The paper “Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia” has been accepted to the 1st workshop on NLP & DBpedia (co-located with ISWC). We will present it in Sydney in October 2013.
DBpedia is a Semantic Web project aiming to extract structured data from Wikipedia articles.
Due to the increasing number of resources linked to it, DBpedia plays a central role in the Linked Open Data community.
Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, each of which is a set of subject-attribute-value triples that summarizes a Wikipedia page.
These infoboxes are compiled manually by Wikipedia contributors, and more than 50% of Wikipedia articles have no infobox.
In this article, we use the distant supervision paradigm to extract the missing information directly from the Wikipedia article, using a Relation Extraction tool trained on the information already present in DBpedia.
We evaluate our system on a data set consisting of seven DBpedia properties, demonstrating the suitability of the approach in extending the DBpedia coverage.
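To make the distant supervision idea concrete, here is a minimal Python sketch of the labeling step, not the system described in the paper. It assumes that a sentence from a subject's article that mentions the value of a known triple can be treated as a (noisy) positive training example for that property. The names `Triple`, `Example`, and `label_sentences`, the simple substring match, and the toy data are all illustrative assumptions.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """A known subject-attribute-value fact (hypothetical structure)."""
    subject: str
    prop: str
    value: str


class Example(NamedTuple):
    """A sentence labeled with the property it is assumed to express."""
    sentence: str
    prop: str
    subject: str
    value: str


def label_sentences(
    triples: List[Triple],
    sentences_by_subject: Dict[str, List[str]],
) -> List[Example]:
    """Label article sentences using already-known triples.

    Assumption: a sentence in the subject's own article that contains
    the triple's value (case-insensitive substring match) expresses the
    triple's property. Real systems would use entity linking rather
    than plain string matching.
    """
    examples = []
    for triple in triples:
        for sentence in sentences_by_subject.get(triple.subject, []):
            if triple.value.lower() in sentence.lower():
                examples.append(
                    Example(sentence, triple.prop, triple.subject, triple.value)
                )
    return examples


# Toy data for illustration only.
TRIPLES = [Triple("Ada Lovelace", "birthPlace", "London")]
SENTENCES = {
    "Ada Lovelace": [
        "Lovelace was born in London in 1815.",
        "She worked with Charles Babbage.",
    ]
}

EXAMPLES = label_sentences(TRIPLES, SENTENCES)
for example in EXAMPLES:
    print(example.prop, "<-", example.sentence)
```

The labeled sentences would then serve as training data for a relation extraction model, which can be applied to articles that lack an infobox. The matching heuristic is deliberately naive, so some labels will be wrong; that noise is the usual trade-off of distant supervision.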