Source code released

The Java source code used to create the datasets and the Esperanto DBpedia is finally available on Bitbucket. The project is called airpedia (wow!).

The documentation for the whole package will soon be available on the wiki.

Esperanto DBpedia released

We are pleased to announce an important piece of news: the first automatically-created edition of DBpedia is out!

Esperanto is a constructed language spoken by about 2M people worldwide. The Esperanto Wikipedia is – with its 200K+ articles – one of the biggest Wikipedias for which a corresponding DBpedia chapter has not yet been created. It is ranked 34th in the list of Wikipedias by number of articles, and it is constantly updated by a community of almost 100K users.

Go to the website

Paper accepted at ESWC 2014

The paper “These are your rights: A Natural Language Processing Approach to Automated RDF Licenses Generation” has been accepted to the research track of ESWC 2014. We will present it in Anissaras (Crete, Greece) in May 2014.

Abstract

In recent years, the Web has seen an increasing interest in legal issues concerning the use and re-use of online published material. In particular, several open issues affect the terms and conditions under which the data published on the Web is released to the users, and the users' rights over such data. Though the amount of licensed material on the Web is increasing considerably, the problem of generating machine-readable license information is still unsolved. In this paper, we propose to adopt Natural Language Processing techniques to extract in an automated way the rights and conditions granted by a license, and we return the license in a machine-readable format using RDF, adopting two well-known vocabularies to model licenses. Experiments over a set of widely adopted licenses show the feasibility of the proposed approach.
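
To make the target output concrete, here is a minimal sketch, using Apache Jena, of what a machine-readable license description might look like. ODRL is used purely as an example of a license vocabulary (the abstract does not name the two vocabularies adopted), and the extracted permission, the URIs, and the overall shape are illustrative assumptions, not the paper's actual output format.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class LicenseToRdf {
    public static void main(String[] args) {
        // ODRL 2 namespace, used here only as an example vocabulary.
        String ODRL = "http://www.w3.org/ns/odrl/2/";
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("odrl", ODRL);

        // Hypothetical output: suppose the NLP pipeline recognised a
        // "distribute" permission in the license text.
        Resource license = model.createResource("http://example.org/license/1",
                model.createResource(ODRL + "Policy"));
        Resource permission = model.createResource()
                .addProperty(model.createProperty(ODRL, "action"),
                             model.createResource(ODRL + "distribute"));
        license.addProperty(model.createProperty(ODRL, "permission"), permission);

        model.write(System.out, "TURTLE");
    }
}
```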

Paper accepted at NLP & DBpedia workshop (ISWC)

The paper “Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia” has been accepted to the 1st workshop on NLP & DBpedia (co-located with ISWC). We will present it in Sydney in October 2013.

Abstract

DBpedia is a Semantic Web project aiming to extract structured data from Wikipedia articles.
Due to the increasing number of resources linked to it, DBpedia plays a central role in the Linked Open Data community.
Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, a set of subject-attribute-value triples that represents a summary of the Wikipedia page.
These infoboxes are manually compiled by the Wikipedia contributors, and in more than 50% of the Wikipedia articles the infobox is missing.
In this article, we use the distant supervision paradigm to extract the missing information directly from the Wikipedia article, using a Relation Extraction tool trained on the information already present in DBpedia.
We evaluate our system on a data set consisting of seven DBpedia properties, demonstrating the suitability of the approach in extending the DBpedia coverage.
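
As a rough illustration of the distant-supervision idea (not the actual pipeline, which relies on a trained Relation Extraction tool), a sentence of the subject's Wikipedia article that mentions the object of an existing DBpedia triple can be taken as a noisy positive training example for that property. All names and data below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class DistantSupervision {

    // A labelled training example: a sentence paired with the DBpedia
    // property it is assumed to express.
    record Example(String sentence, String property) {}

    // Every sentence of the subject's article that mentions the object of a
    // known triple becomes a (noisy) positive example for the property.
    static List<Example> label(String objectLabel, String property,
                               List<String> articleSentences) {
        List<Example> examples = new ArrayList<>();
        for (String sentence : articleSentences) {
            if (sentence.contains(objectLabel)) {
                examples.add(new Example(sentence, property));
            }
        }
        return examples;
    }

    public static void main(String[] args) {
        // Hypothetical triple: (Trento, dbo:region, Trentino-Alto Adige)
        List<String> sentences = List.of(
                "Trento is a city in Trentino-Alto Adige.",
                "It lies in the Adige valley.");
        label("Trentino-Alto Adige", "dbo:region", sentences)
                .forEach(e -> System.out.println(e.property() + " <- " + e.sentence()));
    }
}
```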

Paper accepted at ISWC conference

The paper “Towards an automatic creation of localized versions of DBpedia” has been accepted to the research track of ISWC 2013. We will present it in Sydney in October 2013.

Abstract

DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes into the DBpedia ontology.
Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia.
Subsequently, the same procedure has been applied to create the localized versions of DBpedia.
However, the number of accomplished mappings is still small and limited to the most frequent infoboxes.
Furthermore, mappings need maintenance due to the constant and quick changes of Wikipedia articles.
In this paper, we focus on the problem of automatically mapping infobox attributes to properties in the DBpedia ontology, both to extend the coverage of the existing localized versions and to build from scratch versions for languages not covered in the current release.
The evaluation has been performed on the Italian mappings: we compared our results with the current mappings on a random sample re-annotated by the authors.
We report results comparable to the ones obtained by a human annotator in terms of precision, but our approach leads to a significant improvement in recall and speed.
Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available; the resource is made available in an open format.
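
One simplistic way to picture the attribute-to-property alignment (a toy heuristic, not the algorithm evaluated in the paper): an infobox attribute is aligned to the DBpedia property whose values most often coincide with the attribute's values on the same entities. The data and names below are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeAlignment {

    // Counts, over a sample of entities, how often each DBpedia property's
    // value coincides with the infobox attribute's value, and returns the
    // property with the most matches (null if there are none).
    static String align(Map<String, String> attributeValueByEntity,
                        Map<String, Map<String, String>> dbpediaValuesByEntity) {
        Map<String, Integer> matches = new HashMap<>();
        attributeValueByEntity.forEach((entity, value) ->
            dbpediaValuesByEntity.getOrDefault(entity, Map.of())
                .forEach((property, propertyValue) -> {
                    if (propertyValue.equals(value)) {
                        matches.merge(property, 1, Integer::sum);
                    }
                }));
        return matches.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        // Values of the Italian attribute "luogo di nascita" (hypothetical data)
        Map<String, String> attribute = Map.of("Dante_Alighieri", "Firenze");
        Map<String, Map<String, String>> dbpedia = Map.of(
                "Dante_Alighieri", Map.of("dbo:birthPlace", "Firenze",
                                          "dbo:deathPlace", "Ravenna"));
        System.out.println(align(attribute, dbpedia)); // prints dbo:birthPlace
    }
}
```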

New version of Italian DBpedia includes Airpedia dataset

As of today, the new version 3.2 of the Italian DBpedia is online and browsable at it.dbpedia.org.

The rdf:type information in this release is enriched with types from Airpedia. This allows resources with unmapped templates to show an estimated type, derived with a machine learning algorithm; those resources are marked with the property http://airpedia.org/ontology/is_estimated_type (set to “true” or absent). Moreover, confidence information is provided through triples whose predicate has the form http://airpedia.org/ontology/type_with_conf#X, where X ranges from 6 to 10 (10 meaning that other international DBpedias have a similar page with a manually mapped type).
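
For instance, the estimated types and their confidence buckets could be inspected with a SPARQL query; the sketch below uses Apache Jena, and both the endpoint URL and the exact typing of the “true” flag are assumptions:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class EstimatedTypes {
    public static void main(String[] args) {
        // Assumed endpoint; the post only says the data is browsable at it.dbpedia.org
        String endpoint = "http://it.dbpedia.org/sparql";
        // Resources flagged as having an estimated type, together with the
        // type and the confidence predicate (the bucket 6..10 is its fragment).
        String query =
                "PREFIX air: <http://airpedia.org/ontology/>\n"
              + "SELECT ?resource ?type ?confProp WHERE {\n"
              + "  ?resource air:is_estimated_type ?flag ;\n"
              + "            ?confProp ?type .\n"
              + "  FILTER(STRSTARTS(STR(?confProp),\n"
              + "      \"http://airpedia.org/ontology/type_with_conf#\"))\n"
              + "} LIMIT 10";
        try (QueryExecution exec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("resource") + " rdf:type " + row.get("type")
                        + " (" + row.get("confProp") + ")");
            }
        }
    }
}
```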

In addition, the Italian DBpedia now includes owl:sameAs triples linking to the corresponding Wikidata pages, the collaborative knowledge base of the Wikimedia Foundation; these links are also provided by Airpedia.

New resource available: entity types in 31 languages

Following the work already presented at the ESWC conference, we enhance the coverage of DBpedia for pages lacking an infobox. The resource contains 10M computed entity types. It is available in RDF format and can be downloaded here. This new version of the dataset contains articles extracted from Wikipedia chapters in 31 languages: Albanian, Belarusian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hungarian, Icelandic, Indonesian, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Turkish, Ukrainian.
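
Assuming the dump is serialized as N-Triples (the filename below is hypothetical), it can be streamed with Apache Jena without loading all 10M triples into memory, for example to count how many entities receive each type:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.jena.graph.Triple;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDFBase;

public class CountTypes {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        // Stream the (hypothetical) N-Triples dump triple by triple; each
        // triple is expected to be an rdf:type edge, so we count the objects.
        RDFParser.source("airpedia-types.nt")
                 .lang(Lang.NTRIPLES)
                 .parse(new StreamRDFBase() {
                     @Override
                     public void triple(Triple t) {
                         counts.merge(t.getObject().toString(), 1, Integer::sum);
                     }
                 });
        counts.forEach((type, n) -> System.out.println(type + "\t" + n));
    }
}
```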

Paper accepted at I-KNOW conference

The paper “Automatic mapping of Wikipedia templates for fast deployment of localized DBpedia datasets” has been accepted at the I-KNOW conference. We will present our work in Graz (Austria) in September 2013.

Abstract

DBpedia is a Semantic Web resource that aims at representing Wikipedia in RDF triples. Due to the large and growing number of resources linked to it, DBpedia has become central to the Semantic Web community. The English version currently covers around 1.7M Wikipedia pages; however, the English Wikipedia contains almost 4M pages, which means there is a substantial coverage problem (even bigger in other languages). The coverage slowly increases thanks to the manual effort of various local communities, aimed at mapping Wikipedia templates into DBpedia ontology classes and then running the open-source software provided by the DBpedia community to extract the triples. In this paper, we present a resource obtained by automatically mapping templates in 25 languages. We also describe the approach used, which starts from the existing mappings in other languages and extends them using the cross-lingual information available in Wikipedia. We evaluate our system on the mappings of a set of languages already included in DBpedia (but not used during the training phase), demonstrating that our approach can replicate the human mappings with high precision and recall, while producing an additional set of mappings not included in the original DBpedia.
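
The cross-lingual intuition can be sketched as a toy majority vote (this is not the Airpedia implementation): a template in a new language inherits the DBpedia class to which most of its interlanguage-linked counterparts are already mapped. All template names below are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TemplateMapping {

    // The template inherits the class that most of its interlanguage-linked
    // counterparts are already mapped to; returns null if none is mapped.
    static String mapTemplate(List<String> linkedTemplates,
                              Map<String, String> existingMappings) {
        Map<String, Integer> votes = new HashMap<>();
        for (String template : linkedTemplates) {
            String dbpediaClass = existingMappings.get(template);
            if (dbpediaClass != null) {
                votes.merge(dbpediaClass, 1, Integer::sum);
            }
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        Map<String, String> existing = Map.of(
                "en:Infobox_settlement", "dbo:Settlement",
                "de:Infobox_Ort", "dbo:Settlement");
        // A new-language template cross-linked to the two templates above
        System.out.println(mapTemplate(
                List.of("en:Infobox_settlement", "de:Infobox_Ort"), existing));
    }
}
```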

Paper under review at ISWC conference

We just submitted a paper titled “Towards an automatic creation of localized versions of DBpedia” to the ISWC conference. In this work, we implement an automatic system that maps Wikipedia infobox attributes to properties in the DBpedia ontology.

Abstract

DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes into the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia. Subsequently, the same procedure has been applied to create the localized versions of DBpedia. However, the number of accomplished mappings is still small and limited to the most frequent infoboxes. Furthermore, mappings need maintenance due to the constant and quick changes of Wikipedia articles. In this paper, we focus on the problem of automatically mapping infobox attributes to properties in the DBpedia ontology, both to extend the coverage of the existing localized versions and to build from scratch versions for languages not covered in the current release. The evaluation has been performed on the Italian mappings: we compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to the ones obtained by a human annotator in terms of precision, but our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available; the resource is made available in an open format.

Paper submitted to I-KNOW conference

Continuing our work on enriching DBpedia, we propose a fast method to automatically map Wikipedia templates to DBpedia ontology classes. The dataset is available on this page.

Abstract

DBpedia is a Semantic Web resource that aims at representing Wikipedia in RDF triples. Due to the large and growing number of resources linked to it, DBpedia has become central to the Semantic Web community. The English version currently covers around 1.7M Wikipedia pages; however, the English Wikipedia contains almost 4M pages, which means there is a substantial coverage problem (even bigger in other languages). The coverage slowly increases thanks to the manual effort of various local communities, aimed at mapping Wikipedia templates into DBpedia ontology classes and then running the open-source software provided by the DBpedia community to extract the triples. In this paper, we present a resource obtained by automatically mapping templates in 25 languages. We also describe the approach used, which starts from the existing mappings in other languages and extends them using the cross-lingual information available in Wikipedia. We evaluate our system on the mappings of a set of languages already included in DBpedia (but not used during the training phase), demonstrating that our approach can replicate the human mappings with high precision and recall, while producing an additional set of mappings not included in the original DBpedia.