Affective lexica and other resources for Italian
An affective lexicon is a database of words (or word senses, phrases, or other kinds of lexical items) in which each item is classified by its content in terms of subjectivity, polarity (positive or negative), capability of evoking specific emotions, and so on. Such resources are used to build automatic systems that analyze natural language (for example, text from websites or social media) and "read" the sentiment expressed in it. This activity is called Sentiment Analysis (or Opinion Mining). It is attracting growing attention from both the scientific community and industry, because it can answer questions like "are customers happy with product X?" or "what type of people approve of policy Y?".
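To make the idea concrete, here is a minimal sketch of how a lexicon can be used to "read" the polarity of a text. It assumes a tiny, made-up lexicon (`TOY_LEXICON`, not taken from any of the resources listed below) that maps words to scores in [-1, 1], and it simply averages the scores of the words it recognizes. The function names and the 0.1 threshold are hypothetical choices for illustration only.

```python
import re

# Hypothetical toy lexicon with scores in [-1, 1]; illustrative only,
# not extracted from any real Italian affective lexicon.
TOY_LEXICON = {
    "felice": 0.8,    # happy
    "ottimo": 0.9,    # excellent
    "buono": 0.5,     # good
    "triste": -0.7,   # sad
    "pessimo": -0.9,  # terrible
}


def polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of the tokens found in the lexicon.

    Returns 0.0 when no token of the text is covered by the lexicon.
    """
    # Lowercase and split on word characters (Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def label(score, threshold=0.1):
    """Map a numeric score to a coarse polarity label (threshold is arbitrary)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["Il prodotto è ottimo, sono felice",
                     "Servizio pessimo, cibo buono"]:
        s = polarity(sentence)
        print(f"{sentence!r}: {s:.2f} ({label(s)})")
```

This bag-of-words average ignores negation, word senses and inflection (for instance, "ottima" would not match "ottimo"), which is why real systems typically lemmatize the input and rely on richer resources, such as sense-level or multiword lexica, of the kind collected below.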
Italian is a somewhat poorly represented language in the panorama of language resources. This is true for affective lexica too, but thanks to a vibrant community, things are changing rapidly. I conducted a quick survey by asking the mailing list of the Italian Association for Computational Linguistics about affective lexica for Italian. I received many replies, which I compiled in the list below. Some entries are lexica, others are different kinds of resources and methods, either in Italian or otherwise linked to the Italian NLP community.
- Sentix: affective lexicon, automatically built by aligning MultiWordNet, WordNet and SentiWordNet. Each sense is assigned scores for positive polarity, negative polarity and intensity. Available at http://valeriobasile.github.io/twita/downloads.html. Publication: V. Basile and M. Nissim (WASSA 2013).
- Lexicon created semi-automatically for participation in the EVALITA 2014 shared task SENTIPOLC. Described in Di Gennaro, Rossi and Tamburini (EVALITA 2014).
- Sentiment lexicon developed semi-automatically for the Opener project. It contains 24,293 lexical entries labeled with positive/neutral/negative polarity. Available at https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-73.
- Proprietary sentiment lexicon containing single words, multiword expressions and idiomatic expressions, annotated with polarity, intensity, emotions and domain, and distributed by CELI under a commercial licence. Described in A. Bolioli, F. Salamino, V. Porzionato (ESSEM 2013).
- Polarized word embeddings can be created with the technique described in G. Attardi (IIR 2015) and implemented in DeepNL.
- Database of affective norms for Italian developed for the INCREASE project. Available at https://sites.google.com/view/mariamontefinese/norms-data?authuser=0 (other affective and semantic resources are available on the same Web page). Described in Montefinese, M., Ambrosini, E., Fairfield, B. et al. Behav Res (2014).
- Automatic method to build multilingual opinionated lexicons based on distant supervision, used for participation in the EVALITA 2016 shared task SENTIPOLC. Dictionaries in English and Italian are available at http://sag.art.uniroma2.it/demo-software/distributional-polarity-lexicon/. Described in G. Castellucci, D. Croce, R. Basili (2016) and G. Castellucci, D. Croce, R. Basili (2015).
- SentiWords: high-coverage resource containing roughly 155,000 English words, each associated with a sentiment score between -1 and 1. Available at http://hlt-nlp.fbk.eu/technologies/sentiwords. Described in Gatti L., Guerini M. & Turchi M. (2015).
- SentIta and Doxa: Italian databases and tools for sentiment analysis.
- Affective lexicon developed for EVALITA 2014. Described in P. Basile, N. Novielli, "UNIBA at EVALITA 2014-SENTIPOLC Task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features" (Proceedings of EVALITA, 2014).
A big thank you to all the contributors. If you know of other resources that would fit the list above, feel free to contact me; I'll be happy to update the list.