We recently became fans of an annotation methodology called best-worst scaling, developed by Jordan Louviere (University of Alberta) in 1987, and of its application to linguistic annotation as proposed by Svetlana Kiritchenko and Saif M. Mohammad at the National Research Council Canada. In a nutshell, the method involves asking annotators to select the most and least relevant items from a tuple of instances with respect to the phenomenon under study. This contrasts with the usual annotation procedures, where items are typically presented to the annotator one by one; the best-worst variant has been shown to make annotation easier, which in turn yields more reliable results.
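Once the best/worst picks are collected, each item can be turned into a real-valued score by counting how often it was chosen as best versus worst, relative to how often it appeared. Here is a minimal sketch of that counting step; the function name and the record format (items shown, best pick, worst pick) are my own choices for illustration:

```python
from collections import Counter

def best_worst_scores(annotations):
    """Turn best-worst annotations into scores in [-1, 1].

    `annotations` is a list of (items, best, worst) records, one per
    annotated tuple: the items shown to the annotator, the item picked
    as most relevant, and the item picked as least relevant.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)   # every shown item counts as an appearance
        best[b] += 1
        worst[w] += 1
    # Fraction of appearances chosen as best, minus fraction chosen as worst.
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}
```

For example, an item picked as best every time it appears scores 1.0, one picked as worst every time scores -1.0, and one never picked either way scores 0.0.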
An affective lexicon is a database of words (or word senses, phrases, or other kinds of lexical items) where each item is classified according to its content in terms of subjectivity, polarity (positive or negative), capability of evoking specific emotions, and so on. Such resources are used to build automatic systems that analyze natural language (for example, from websites or social media) and “read” the sentiment expressed in the text. This activity is called Sentiment Analysis (or Opinion Mining), and it is gaining more and more attention from both the scientific community and industry, because it can answer questions like “are customers happy with product X?” or “what type of people approve of policy Y?”.
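To make the idea concrete, here is a toy sketch of what a lexicon entry and a naive lookup-based polarity scorer might look like; the words, values, and field names are invented for illustration, not taken from any real resource:

```python
# A tiny made-up affective lexicon: each item carries a polarity score
# and, optionally, an evoked emotion.
LEXICON = {
    "great":   {"polarity": 1.0,  "emotion": "joy"},
    "awful":   {"polarity": -1.0, "emotion": "disgust"},
    "product": {"polarity": 0.0,  "emotion": None},
}

def polarity(text):
    """Average the polarity of the lexicon words found in `text`."""
    hits = [LEXICON[w]["polarity"] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

Real systems are of course far more sophisticated (handling negation, intensifiers, word senses, and so on), but this is the basic mechanism by which a lexicon lets software “read” sentiment.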
Every time someone at a conference asks me this question, I have trouble coming up with a simple, concise answer. In academia, the research interests of the members of one team are hardly ever the same. Sometimes there is no overlap at all between the interests of members of the same team. Yet their interests must be related somehow, otherwise they wouldn’t be in the same team to begin with.
Recently I set up an experiment to create a dataset of human judgements about objects and places. The subjects are asked to rate how likely an object (taken from Wikipedia) is to be found in a given place (also from Wikipedia), on a scale from unusual to usual. The goal is to create a gold standard against which we can evaluate our AI algorithms.
Back in December, I (successfully) defended my PhD at the University of Groningen (NL). It’s a special ceremony there; the curious reader can take a look here.
Periodically, I find myself dealing with the problem of publishing my list of publications online. There are many services that crawl websites and databases to build per-author bibliographies (Google Scholar is a good example), but I have a couple of issues with them. For one, they work automatically, so integrating and polishing the data still requires a variable amount of manual work. My main problem, however, is having to (or wanting to) publish the list of publications on my website in a format I have control over. I’m OK with managing the list of my publications manually, but only if it’s done in a centralized way.
I recently cleaned and packaged up a series of scripts that were lying around as a result of several projects involving Twitter data. This software downloads potentially massive amounts of tweets based on either a list of keywords or a list of Twitter usernames.
My adventure in Groningen (NL) is coming to an end and I will soon need a new place, both physically and on the Web.