candcapi - HTTP API to access the C&C/Boxer pipeline.

The C&C tools are a suite of software for linguistic analysis of English, including a tokenizer, several taggers and a parser. Boxer is a tool for deep semantic analysis that takes as input the output of the C&C parser. Together, the C&C tools and Boxer form a pipeline that performs a complete analysis of English text. Here is an example:

$ curl -d 'John loves Mary.' 'http://127.0.0.1:7778/raw/pipeline'
sem(1,[1001:[tok:'John',pos:'NNP',lemma:'John',namex:'I-PER'],1002:[tok:loves,pos:'VBZ',lemma:love,namex:'O'],1003:[tok:'Mary',pos:'NNP',lemma:'Mary',namex:'I-PER'],1004:[tok:'.',pos:'.',lemma:'.',namex:'O']],merge(drs([[]:B,[]:C],[[1003]:named(B,mary,per,0),[1001]:named(C,john,per,0)]),drs([[]:D],[[]:rel(D,B,patient,0),[]:rel(D,C,agent,0),[1002]:pred(D,love,v,0)]))).

The main entry point of the C&C/Boxer API is:

$CANDCAPI/$FORMAT/pipeline

$CANDCAPI is the base URL of the API installation, and $FORMAT is either raw or json, so possible entry points include:

http://my.installation.of.candcapi.net/raw/pipeline
http://my.installation.of.candcapi.net/json/pipeline
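As a minimal Python sketch, the entry point can be assembled from the two variables above. The helper name pipeline_url and the local base URL are illustrative assumptions, not part of the API.

```python
# Hypothetical helper: build $CANDCAPI/$FORMAT/pipeline.
FORMATS = ("raw", "json")


def pipeline_url(candcapi: str, fmt: str = "raw") -> str:
    """Return the pipeline entry point for an API base URL and output format."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    # Strip a trailing slash so "http://host/" and "http://host" give the same URL.
    return f"{candcapi.rstrip('/')}/{fmt}/pipeline"


print(pipeline_url("http://127.0.0.1:7778"))          # http://127.0.0.1:7778/raw/pipeline
print(pipeline_url("http://127.0.0.1:7778/", "json"))  # http://127.0.0.1:7778/json/pipeline
```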

The text to analyze must be sent as the body of an HTTP POST request. Boxer's command-line options are passed as URL parameters.

For example, setting the option semantics to fol returns a first-order logic formula:

$ curl -d 'Every man loves a woman' 'http://127.0.0.1:7778/raw/pipeline?semantics=fol'
fol(1,not(some(A,and(n1man(A),not(some(B,some(C,and(r1patient(B,C),and(r1agent(B,A),and(v1love(B),n1woman(C))))))))))).
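The same request can be made from Python with the standard library alone. This is a minimal sketch, assuming the server reads the raw POST body exactly as curl -d sends it; build_request and analyze are hypothetical helper names, and Boxer options are given as keyword arguments that end up in the query string.

```python
import urllib.parse
import urllib.request


def build_request(candcapi: str, text: str, fmt: str = "raw", **options: str) -> urllib.request.Request:
    """Build a POST request to $CANDCAPI/$FORMAT/pipeline.

    The text goes in the request body; Boxer options (e.g. semantics="fol")
    go in the query string, as in the curl example.
    """
    url = f"{candcapi.rstrip('/')}/{fmt}/pipeline"
    if options:
        url += "?" + urllib.parse.urlencode(options)
    # curl -d sends this content type by default, so we mirror it here.
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def analyze(candcapi: str, text: str, fmt: str = "raw", timeout: float = 60.0, **options: str) -> str:
    """Send the text to the API and return the response body as a string."""
    req = build_request(candcapi, text, fmt, **options)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# With a running server:
# print(analyze("http://127.0.0.1:7778", "Every man loves a woman", semantics="fol"))
```

Keeping request construction separate from sending makes it easy to inspect the exact URL before contacting the server.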

For a more extensive description of Boxer's options, see the official Boxer documentation.

Output formats

The API can return either raw text or JSON. The raw text version corresponds to the standard output of the C&C/Boxer pipeline. The JSON version is a simple JSON object containing both the standard output and the standard error:

{"err": "standard error", "out": "standard output"}
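A client using the json format only needs to decode this object and pick the two fields. The Python sketch below does that; the helper names are hypothetical, and treating an empty "out" field as a failure is an assumption, since the description above does not say how errors are reported.

```python
import json


def split_json_output(body: str) -> tuple[str, str]:
    """Return (standard output, standard error) from a /json/... response body."""
    payload = json.loads(body)
    return payload["out"], payload["err"]


def require_output(body: str) -> str:
    """Return the standard output, or raise if it is empty.

    Assumption: an empty "out" means the pipeline failed and "err"
    holds the tools' diagnostics.
    """
    out, err = split_json_output(body)
    if not out.strip():
        raise RuntimeError(f"C&C/Boxer produced no output; stderr was: {err!r}")
    return out


out, err = split_json_output('{"err": "standard error", "out": "standard output"}')
print(out)  # standard output
print(err)  # standard error
```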

Other URLs

Each tool can also be accessed separately through the following URLs:

$CANDCAPI/$FORMAT/t
$CANDCAPI/$FORMAT/candc
$CANDCAPI/$FORMAT/boxer

The tokenizer, t, takes plain text as input. The parser, candc, takes tokenized text as input, i.e. a list of words separated by whitespace. The boxer endpoint takes as input the Prolog output of the C&C parser.
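The single tools can be chained by hand, passing each tool's raw output as the next tool's POST body. This Python sketch assumes that t, candc and boxer accept POST bodies in the same way as pipeline; the helper names http_post and run_stepwise are hypothetical. The HTTP call is passed in as a function so that the chaining logic can be exercised without a server.

```python
import urllib.request
from typing import Callable

# A "post" function sends a body to one tool and returns its output.
Post = Callable[[str, str], str]


def http_post(candcapi: str) -> Post:
    """Return a post function for $CANDCAPI/raw/<tool>.

    The raw format is used so that each response can be passed
    unchanged to the next tool.
    """
    def post(tool: str, body: str) -> str:
        url = f"{candcapi.rstrip('/')}/raw/{tool}"
        req = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
    return post


def run_stepwise(text: str, post: Post) -> str:
    """Run t, candc and boxer one after the other, feeding each output to the next tool."""
    tokens = post("t", text)       # plain text -> whitespace-separated tokens
    parse = post("candc", tokens)  # tokens -> Prolog output of the C&C parser
    return post("boxer", parse)    # parser output -> Boxer output


# With a running server:
# print(run_stepwise("John loves Mary.", http_post("http://127.0.0.1:7778")))
```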

For convenience, the API also exposes combinations of consecutive steps of the pipeline:

$CANDCAPI/$FORMAT/tcandc
$CANDCAPI/$FORMAT/candcboxer

These run, respectively, the tokenizer followed by the parser, and the parser followed by Boxer.
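The single and combined endpoints differ only in what they consume and produce, so a sequence of calls is valid when each output matches the next input. The Python sketch below encodes the inputs and outputs described above; the labels (text, tokens, ccg, sem) and the treatment of pipeline as t, candc and boxer in sequence are assumptions made for illustration.

```python
# Hypothetical labels: "text" = plain text, "tokens" = tokenized text,
# "ccg" = Prolog output of the C&C parser, "sem" = Boxer output.
ENDPOINTS = {
    "t": ("text", "tokens"),
    "candc": ("tokens", "ccg"),
    "boxer": ("ccg", "sem"),
    "tcandc": ("text", "ccg"),
    "candcboxer": ("tokens", "sem"),
    "pipeline": ("text", "sem"),
}


def check_chain(endpoints: list[str], start: str = "text") -> str:
    """Return the kind of output a sequence of endpoint calls ends with.

    Raises ValueError if an endpoint is unknown or if a step's expected
    input does not match the previous step's output.
    """
    current = start
    for name in endpoints:
        if name not in ENDPOINTS:
            raise ValueError(f"unknown endpoint: {name}")
        expected, produced = ENDPOINTS[name]
        if expected != current:
            raise ValueError(f"{name} expects {expected}, got {current}")
        current = produced
    return current


print(check_chain(["t", "candcboxer"]))  # sem
print(check_chain(["tcandc", "boxer"]))  # sem
```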

To see the version of C&C/Boxer used by the API:

$CANDCAPI/$FORMAT/version

Graphical output

Discourse Representation Graph (DRG) is a semantic formalism described in the paper V. Basile, J. Bos (2011): Towards Generating Text from Discourse Representation Structures (http://www.let.rug.nl/basile/papers/BasileBos2011ENLG.pdf). The C&C/Boxer API provides an entry point that generates a PNG image of the DRG of a given text:

$CANDCAPI/drg

This URL accepts the same GET parameters as the pipeline entry point.
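A Python sketch for saving the DRG image to a file. It assumes that, as with pipeline, the text is sent as the POST body and there is no $FORMAT segment in the URL; the helper names are hypothetical, and the check against the standard 8-byte PNG file signature is an illustrative safeguard, not something the API requires.

```python
import urllib.parse
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def build_drg_request(candcapi: str, text: str, **options: str) -> urllib.request.Request:
    """Build a POST request to $CANDCAPI/drg with optional URL parameters."""
    url = f"{candcapi.rstrip('/')}/drg"
    if options:
        url += "?" + urllib.parse.urlencode(options)
    return urllib.request.Request(url, data=text.encode("utf-8"), method="POST")


def is_png(data: bytes) -> bool:
    """True if the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def save_drg(candcapi: str, text: str, path: str, **options: str) -> None:
    """Fetch the DRG image for the text and write it to path."""
    req = build_drg_request(candcapi, text, **options)
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
    if not is_png(data):
        # Show the start of the body, which is likely an error message.
        raise RuntimeError("expected a PNG image, got: " + data[:200].decode("utf-8", "replace"))
    with open(path, "wb") as f:
        f.write(data)


# With a running server:
# save_drg("http://127.0.0.1:7778", "John loves Mary.", "drg.png")
```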