NLL2RDF Dataset

The NLL2RDF dataset is composed of 37 licenses. It comprises all the licenses used for data in the Linked Data cloud (such as all the Creative Commons licenses), software licenses (such as the Mozilla Public License and the Microsoft License), and additional licenses for other material on the Web (such as the UK Open Government Licence and the New Free Documentation License).
As a second step, we manually "translated" the textual version of each license into RDF, adopting the following vocabularies: CC REL for the Creative Commons licenses and ODRL for all the other licenses.
Given, for instance, the following textual fragment of the ODC Open Database License (ODbL):
“You are free: To Share: To copy, distribute and use the database. To Create: To produce works from the database. To Adapt: To modify, transform and build upon the database. As long as you: Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database. Share-Alike: If you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL. [...]”

we manually built the machine-readable version of the license as follows:


@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix : <http://example/licenses#> .

:licODBL a odrl:Set ;
    odrl:permission [
        a odrl:Permission ;
        odrl:action odrl:derive ;
        odrl:action odrl:share
    ] ;
    odrl:duty [
        a odrl:Duty ;
        odrl:action odrl:attribute ;
        odrl:action odrl:shareAlike
    ] .

We use this machine-readable version of the licenses as a gold standard. It is the reference against which NLL2RDF's output is compared, in order to evaluate the system's ability to generate correct RDF from the license texts.
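The section does not specify how NLL2RDF's output is scored against the gold standard. The following is a minimal sketch. It assumes each license is reduced to a set of (relation, value) pairs such as (odrl:Permission, odrl:share), and it computes precision, recall and F1 over those pairs. The function name and the sample system output are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a license is represented as a set of (relation, value) pairs.
Pair = Tuple[str, str]


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Hypothetical scorer: compare predicted deontic pairs with the gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold pairs for the ODbL fragment, read off the hand-built Turtle.
gold_odbl = {
    ("odrl:Permission", "odrl:derive"),
    ("odrl:Permission", "odrl:share"),
    ("odrl:Duty", "odrl:attribute"),
    ("odrl:Duty", "odrl:shareAlike"),
}
# Made-up system output: misses the share-alike duty and adds a
# hypothetical prohibition that is not in the gold standard.
predicted_odbl = {
    ("odrl:Permission", "odrl:derive"),
    ("odrl:Permission", "odrl:share"),
    ("odrl:Duty", "odrl:attribute"),
    ("odrl:Prohibition", "odrl:commercialize"),
}
p, r, f = precision_recall_f1(predicted_odbl, gold_odbl)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.75 R=0.75 F1=0.75
```

With three of four pairs correct on each side, precision, recall and F1 all come out at 0.75. Whether partial credit should be given per action or per whole rule is a design choice this sketch does not settle.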
As a third step in the creation of the reference dataset, we annotated the textual version of each license to train our system. We marked the sentences containing the lexicalization of the ontological relations, i.e., the sentences whose meaning corresponds to an ontological relation. For instance, in the ODbL example above, we annotated:
- the sentence "You are free: To Share the database" with the ODRL relation odrl:Permission and the value odrl:share;
- the sentence "You are free: To produce works from the database" with the ODRL relation odrl:Permission and the value odrl:derive;
- the sentence "As long as you: Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL" with the ODRL relation odrl:Duty and the value odrl:attribute;
- the sentence "As long as you: Share-Alike: If you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL" with the ODRL relation odrl:Duty and the value odrl:shareAlike.
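The link between these sentence-level annotations and the hand-built RDF can be illustrated with a small helper. It groups the (relation, value) pairs by ODRL rule and prints Turtle shaped like the ODbL example. It covers the ODRL case only, not CC REL. The function name, the RULE_PROPERTY mapping and the example namespace are assumptions for illustration, not part of NLL2RDF.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from an ODRL rule class to the property that attaches
# the rule to the policy, mirroring the hand-written ODbL example.
RULE_PROPERTY = {
    "odrl:Permission": "odrl:permission",
    "odrl:Duty": "odrl:duty",
    "odrl:Prohibition": "odrl:prohibition",
}


def annotations_to_turtle(license_id: str, annotations: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical helper: build an ODRL Turtle policy from (relation, value) pairs."""
    # Group action values under their rule class, keeping first-seen order
    # and dropping duplicate values.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for relation, value in annotations:
        if value not in grouped[relation]:
            grouped[relation].append(value)

    lines = [
        "@prefix odrl: <http://www.w3.org/ns/odrl/2/> .",
        "@prefix : <http://example/licenses#> .",
        f":{license_id} a odrl:Set",
    ]
    for relation, values in grouped.items():
        lines[-1] += " ;"  # another predicate follows the previous statement
        actions = " ;\n    ".join(f"odrl:action {v}" for v in values)
        # RULE_PROPERTY[relation] raises KeyError for unknown relation labels.
        lines.append(f"  {RULE_PROPERTY[relation]} [\n    a {relation} ;\n    {actions}\n  ]")
    lines[-1] += " ."  # terminate the policy description
    return "\n".join(lines)


odbl_annotations = [
    ("odrl:Permission", "odrl:share"),
    ("odrl:Permission", "odrl:derive"),
    ("odrl:Duty", "odrl:attribute"),
    ("odrl:Duty", "odrl:shareAlike"),
]
print(annotations_to_turtle("licODBL", odbl_annotations))
```

Repeating odrl:action inside one blank node is valid Turtle and matches the hand-built example. An object list (odrl:action odrl:derive, odrl:share) would be an equivalent, more compact choice.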
The same annotation task was carried out on the Creative Commons licenses, adopting the CC REL vocabulary.

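For the dataset annotation we adopted the CoNLL IOB format, commonly used in the NLP community for the CoNLL (Conference on Computational Natural Language Learning) shared tasks. In this format, a B- tag marks the first token of an annotated span, an I- tag marks the following tokens inside it, and O marks tokens outside any span. We first tokenized and POS-tagged the sentences using the Stanford Parser. We then added two columns: the first for the annotation of the relation, and the second for its value, as follows: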


#id-004
1 You PRP B-PERMISSION DERIVE
2 are VBP I-PERMISSION
3 free JJ I-PERMISSION
4 : : O
[...] O
5 To TO I-PERMISSION
6 produce VB I-PERMISSION
7 works NNS I-PERMISSION
8 from IN I-PERMISSION
9 the DT I-PERMISSION
10 database NN I-PERMISSION
11 . . O
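To make the column layout concrete, the following sketch reads annotated sentences in this format and groups the tokens into (relation, value, tokens) chunks. It assumes whitespace-separated columns: index, token, POS tag, IOB tag, and an optional value. Following the example, it also assumes that an O token such as the colon does not close an open span, so only a B- tag starts a new chunk. The function read_chunks is hypothetical.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, Optional[str], List[str]]


def read_chunks(conll: str) -> List[Chunk]:
    """Hypothetical reader: group IOB-tagged tokens into (relation, value, tokens)."""
    chunks: List[Chunk] = []
    for raw in conll.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and sentence ids such as "#id-004"
        if len(fields) < 4:
            continue  # skip short lines such as the elision marker "[...] O"
        _index, token, _pos, tag = fields[:4]
        value = fields[4] if len(fields) > 4 else None
        relation = tag[2:]
        if tag.startswith("B-"):
            chunks.append((relation, value, [token]))  # B- always opens a new chunk
        elif tag.startswith("I-"):
            if chunks and chunks[-1][0] == relation:
                chunks[-1][2].append(token)  # continue the current chunk
            else:
                chunks.append((relation, value, [token]))  # I- with no matching chunk
        # "O" tokens are skipped and, by assumption, do not close the open chunk
    return chunks


sample = """#id-004
1 You PRP B-PERMISSION DERIVE
2 are VBP I-PERMISSION
3 free JJ I-PERMISSION
4 : : O
[...] O
5 To TO I-PERMISSION
6 produce VB I-PERMISSION
7 works NNS I-PERMISSION
8 from IN I-PERMISSION
9 the DT I-PERMISSION
10 database NN I-PERMISSION
11 . . O
"""
for relation, value, tokens in read_chunks(sample):
    print(relation, value, " ".join(tokens))
# PERMISSION DERIVE You are free To produce works from the database
```

Under strict IOB2 semantics an O token would end the span, and "To" would open a second chunk without a value. The choice above keeps the whole sentence as one permission, as the example annotation suggests.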

The dataset was annotated and independently verified by two annotators, with complete agreement on the annotations. As introduced before, at this stage NLL2RDF considers only the basic deontic components of licenses, on which human annotators agree in almost all cases.
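The section reports complete agreement but does not name an agreement measure. As an illustration only, token-level agreement between two annotators could be quantified with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label distribution. The label sequences below are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two aligned label sequences: (p_o - p_e) / (1 - p_e)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # p_o: fraction of tokens on which the two annotators assign the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement from each annotator's own label distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere: kappa is 0/0,
        # and we treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["B-PERMISSION", "I-PERMISSION", "I-PERMISSION", "O"]
annotator_2 = ["B-PERMISSION", "I-PERMISSION", "I-PERMISSION", "O"]
print(cohens_kappa(annotator_1, annotator_2))  # 1.0
```

Complete agreement yields κ = 1. A single disagreeing token in this short sequence (I-PERMISSION vs. O on the third token) would already drop κ to 7/11 ≈ 0.64.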


Authors:

  • Elena Cabrio and Serena Villata (INRIA Sophia Antipolis, France)
  • Alessio Palmero Aprosio (Machine Linking Srl, Italy)