Works

Here are some of the things I have built or been involved in, along with software I maintain.

TWITA

A collection of Italian-language tweets from Twitter.

Link

C&C API

A RESTful HTTP API for analyzing English text with the C&C tools and Boxer.

Link

Delicious Folksonomy Dataset

A dataset obtained by crawling Delicious, the social bookmarking website.

Link

C&C/Boxer Web Interface

Web interface for the C&C/Boxer linguistic analysis pipeline.

Link

Twitter Crawler

Python module to search and download messages from Twitter.

Link (github)

Listnet

GNU Octave implementation of the Listnet learning-to-rank algorithm.

Link (github)

The Groningen Meaning Bank

A large corpus of semantically annotated English text that anyone can edit.

Link (external)

Wordrobe

A Game With A Purpose for collecting linguistic annotations.

Link (external)

Elephant

Word and sentence boundary detection software (that is, a tokenizer) based on supervised statistical methods.

Link (external)