Sentiment Analysis on Twitter

The Case of Italian Language

Valerio Basile
28/8/2015

> whoami

Wimmics, INRIA

PhD at Groningen, NL

MSc at Bologna, Italy

Roadmap

  1. A collection of Italian tweets.
  2. Sentiment Analysis shared task.
  3. Beyond message-level Sentiment Analysis.

TWITA


Collection of 3+ years of Twitter messages in Italian language.

More than half a billion 784,211,691 tweets, and counting...

Seeds:
vita, Roma, forza, alla, quanto, amore, Milano, Italia, fare, grazie, della, anche, periodo, bene, scuola, dopo, tutto, ancora, tutti, fatto il è come

Second step: langid

 {"contributors": null, "truncated": false,
  "text": "\"Si nasce in un posto, si prende una barca per arrivare dove
  poter nascere ancora, \nsi mettono fiori\u2026
  http://t.co/Cgn1nfOWVZ", "in_reply_to_status_id": null, "id":
  451271120575283200, "favorite_count": 0, "source":
  "Instagram",
  "retweeted": false, "coordinates": null, "entities": {"symbols": [],
  "user_mentions": [], "hashtags": [], "urls": [{"url":
  "http://t.co/Cgn1nfOWVZ", "indices": [102, 124], "expanded_url":
  "http://instagram.com/p/mR8JJ1C6V7/", "display_url":
  "instagram.com/p/mR8JJ1C6V7/"}]}, "in_reply_to_screen_name": null,
  "id_str": "451271120575283200", "retweet_count": 0,
  "in_reply_to_user_id": null, "favorited": false, "user":
  {"follow_request_sent": null, "profile_use_background_image": false,
  "default_profile_image": false, "id": 276029690,
  "profile_background_image_url_https":
  "https://pbs.twimg.com/profile_background_images/774200821/b29332b24409bd98f380fb6356a6e5f7.jpeg",
  "verified": false, "profile_image_url_https":
  "https://pbs.twimg.com/profile_images/2653103820/4daecc232d0c1f50986717f1da2d4465_normal.jpeg",
  "profile_sidebar_fill_color": "EFEFEF", "profile_text_color":
  "333333", "followers_count": 362, "profile_sidebar_border_color":
  "000000", "id_str": "276029690", "profile_background_color": "DE1D1D",
  "listed_count": 2, "is_translation_enabled": false, "utc_offset":
  7200, "statuses_count": 4071, "description": "profilo ufficiale di
  maurizio montanino \r\namo la vita in ogni sua forma d'espressione, la
  musica. cantare: rap/pop/electro funk.  scrivere poesie.",
  "friends_count": 729, "location": "genova-italy/on the road",
  "profile_link_color": "579900", "profile_image_url":
  "http://pbs.twimg.com/profile_images/2653103820/4daecc232d0c1f50986717f1da2d4465_normal.jpeg",
  "following": null, "geo_enabled": false, "profile_banner_url":
  "https://pbs.twimg.com/profile_banners/276029690/1358196165",
  "profile_background_image_url":
  "http://pbs.twimg.com/profile_background_images/774200821/b29332b24409bd98f380fb6356a6e5f7.jpeg",
  "name": "maurizio montanino", "lang": "it", "profile_background_tile":
  true, "favourites_count": 1747, "screen_name": "maui974",
  "notifications": null, "url": "http://BLOG.libero.it/MUSIConBLOG",
  "created_at": "Sat Apr 02 14:11:55 +0000 2011",
  "contributors_enabled": false, "time_zone": "Rome", "protected":
  false, "default_profile": false, "is_translator": false}, "geo": null,
  "in_reply_to_user_id_str": null, "possibly_sensitive": false, "lang":
  "it", "created_at": "Wed Apr 02 08:13:14 +0000 2014", "filter_level":
  "medium", "in_reply_to_status_id_str": null, "place": null}
            

Rich structure.

Sentiment Analysis on TWITA

Based on words and synsets

MultiWordNet ↔ WordNet ↔ SentiWordNet

buono	a	00631391	0.625	0	1.0	0.625
buono	a	00633410	0.625	0.125	0.748668167244	0.637377439199
buono	a	01123148	0.75	0	1.0	0.75
buono	a	01129977	1	0	1.0	1.0
buono	a	01372049	0.625	0	1.0	0.625
buono	a	01800349	0.625	0.125	0.748668167244	0.637377439199
buono	a	01983162	1	0	1.0	1.0
buono	n	04849241	0.875	0	1.0	0.875
buono	n	05142180	0.625	0	1.0	0.625
buono	n	06518068	0.125	0	1.0	0.125
buono	n	10138767	0.375	0	1.0	0.375
buono	n	10138767	0.375	0	1.0	0.375
cattivo	a	01125429	0	0.625	-1.0	0.625
cattivo	a	01131043	0	0.875	-1.0	0.875
cattivo	n	09831856	0	0.25	-1.0	0.25
cattivo	n	09831962	0	0.125	-1.0	0.125
cattivo	n	10753546	0	0.375	-1.0	0.375
cattivo	n	10753546	0	0.375	-1.0	0.375
cattivo	n	10753546	0	0.375	-1.0	0.375
cattivo	n	10753779	0	0.25	-1.0	0.25
cattivo	n	10753779	0	0.25	-1.0	0.25
              

part-of-speech lemmas synsets
noun 52,257 12,228
adjective 3,359 1,805
verb 2,775 1,260
adverb 1,351 750
all 59,742 16,043

Ho fatto un casino, devo rimediare
I screwed up, I have to fix

fare		v	00322847	0.125	0	1.0	0.125
fare		v	01733477	0.125	0	1.0	0.125
fare		v	01060494	0.25	0	1.0	0.25
fare		v	01759326	0.25	0.25	0.0	0.353553390593
fare		v	02109045	0.25	0	1.0	0.25
casino		n	00075618	0	0.5	-1.0	0.5
casino		n	05687338	0	0.25	-1.0	0.25
casino		n	07390945	0	0.25	-1.0	0.25
casino		n	13777509	0	0.25	-1.0	0.25
casino		n	14500047	0	0.125	-1.0	0.125
casino		n	14500047	0	0.125	-1.0	0.125
dovere		v	00065370	0	0.75	-1.0	0.75
dovere		v	00065370	0	0.75	-1.0	0.75
dovere		v	01189113	0	0.25	-1.0	0.25
rimediare	v	02519991	0	0.125	-1.0	0.125
              

Ho fatto un casino, devo rimediare
I screwed up, I have to fix

fare: 0.2/0.05
casino: 0/0.25
dovere: 0/0.58
rimediare: 0/0.125

TOTAL: 0.2/1.01 → Negative

Experiments on a 2,000 tweets gold standard: three-way classification.

All combinations of POS-tags + polypathy filter.

Best accuracy: 55.4% (51.4% on topic-specific set).

Adverbs only + polypathy strategy wins.

V. Basile and M. Nissim
Sentiment Analysis on Italian Tweets
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Sentiment Polarity Classification task

Shared Task at EVALITA 2014

3 subtasks, 2 runs

Subtask 1

Subjectivity

subjective/objective

Subtask 2

Polarity

neutral/positive/negative/mixed

Subtask 3

Irony (pilot task)

ironic/not ironic

subj pos neg iro interpretation
0 0 0 0 objective
1 0 0 0 subjective
1 1 0 0 positive
1 0 1 0 negative
1 1 1 0 mixed polarity
1 1 0 1 positively ironic
1 0 1 1 negative ironic

Gold standard

2 sources: SentiTUT (Turin), TWITA (Groningen)

2 subsets: general, topic-specific (politics)

4,513 (training set) + 1,935 (test set) = 6,448 tweets

Difficult task for humans! (k=0.35 for Task 3)

F-score

team task 1 task 2 task 3
const. uncost. const. uncost. const. uncost.
uniba2930 0.714 0.689 0.677 0.663
UNITOR 0.687 0.689 0.629 0.654 0.575 0.595
IRADABE 0.670 0.646 0.634 0.618 0.541 0.551
UPFtaln 0.649 0.604 0.468
ficlit+cs@unibo 0.597 0.598
mind 0.590 0.534 0.477
SVMSLU 0.582 0.602 0.539
fbkshelldkm 0.559 0.562 0.470
itagetaruns 0.522 0.518 0.492
Itanlp-wafi 0.663*
baseline 0.400 0.371

V. Basile, A. Bolioli, M. Nissim, V. Patti and P. Rosso
Overview of the Evalita 2014 SENTIment POLarity Classification Task
Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'14)

Beyond Message-Level Sentiment Analysis

Entity Linking

il profilo nuovo fa schifo.tutto quel bianco mi fa vomitare.ed รจ la copia di facebook.ma *****o.

Aspect-based Sentiment Analysis

Tristezza, quando vedi ancora gente che odia i tedeschi per il loro "passato". Credevo che l'ignoranza e i pregiudizi ormai fossero estinti

Beyond irony

Figurative language: metaphore, simile, sarcasm, ...

subj pos neg iro opos subj description
0 0 0 0 0 0 objective tweet
1 1 0 0 1 0 subjective tweet, positive polarity, no irony
1 0 0 0 0 0 subjective tweet, neutral polarity, no irony
1 0 1 0 0 1 subjective tweet, negative polarity, no irony
1 0 1 1 1 0 subjective tweet, negative literal polarity, positive overall polarity
1 1 0 1 0 1 subjective tweet, positive literal polarity, negative overall polarity

P. Basile, V Basile, M. Nissim and N. Novielli
Deep Tweets: from Entity Linking to Sentiment Analysis
(unpublished)

Grazie!