Works

Here are some of the things I have built or been involved in, along with software I maintain.

UnToLD

Unsupervised Topic-based Lexical Debias

Website

Hurtlex

Hurtlex is a multilingual lexicon of offensive, aggressive, and hateful words. It was built from an original resource by the linguist Tullio De Mauro and semi-automatically translated into many languages.

Hurtlex repository

KNEWS

KNEWS (Knowledge Extraction With Semantics) is a software tool that combines semantic parsing, word sense disambiguation, and entity linking to produce a unified, LOD-compliant abstract representation of meaning.

KNEWS source code

KNEWS demo

TWITA

A collection of Italian-language tweets from Twitter.

Link

C&C API

RESTful HTTP API for analyzing English text with the C&C tools and Boxer.

Link

Delicious Folksonomy Dataset

A dataset obtained by crawling Delicious, the social bookmarking website.

Link

C&C/Boxer Web Interface

Web interface for the C&C/Boxer linguistic analysis pipeline.

Link

Twitter Crawler

Python module to search and download messages from Twitter.

Link (github)

Listnet

GNU Octave implementation of the Listnet learning-to-rank algorithm.

Link (github)

The Groningen Meaning Bank

A large corpus of semantically annotated English text that anyone can edit.

Link (external)

Wordrobe

A Game With A Purpose for collecting linguistic annotations.

Link (external)

Elephant

Word and sentence boundary detection software (that is, a tokenizer) based on supervised statistical methods.

Link (external)