Wikidata:Requests for comment/Can we reuse anything from DBpedia?
An editor has requested the community to provide input on "Can we reuse anything from DBpedia?" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- No significant community participation in several months. This page can still count as a deep analysis of the internal DBpedia structure. Any further discussions should happen in Wikidata:Requests for comment/DBpedia import process. --Ricordisamoa 00:38, 11 February 2014 (UTC)
Current state: Discussion | Opening date of discussion: 27 August 2013 | Closing date of discussion: 3 September 2013
It has been suggested that DBpedia has already done a lot of development on converting infoboxes into semantic data and we could profit by studying what they have done and seeing what can be reused.
The DBpedia ontology is described in their wiki. This wiki describes the various entities used on DBpedia. These include:
Instances
DBpedia instances correspond to Wikidata items. DBpedia has over 2,350,000 instances.
Classes
These are similar to Wikidata classes, except that they seem to be hard-coded into the ontology. They are listed here. By my count there are 578 different classes, nested up to 5 deep as shown on that page.
The arrangement of classes in practice can differ between the two projects. For instance:
- 'Administrative Region' on DBpedia is a subclass of Region -> Populated Place -> Place -> Thing.
- 'City' is not a subclass of 'Administrative Region'. It is a subclass of Settlement -> Populated Place -> Place -> Thing.
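The two chains above can be written out as a parent map so that the subclass relationship can be checked mechanically. This is only a minimal sketch: the dictionary holds just the classes named in the two examples (copied by hand, not loaded from DBpedia), and the helper names `ancestors` and `is_subclass_of` are made up for illustration.

```python
# Parent links for the classes named above (a hand-copied fragment,
# not the full DBpedia ontology).
PARENT = {
    "AdministrativeRegion": "Region",
    "Region": "PopulatedPlace",
    "City": "Settlement",
    "Settlement": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "Place": "Thing",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, other):
    """True if other appears somewhere above cls in the hierarchy."""
    return other in ancestors(cls)
```

With this fragment, `is_subclass_of("City", "AdministrativeRegion")` is false, while both classes share `PopulatedPlace` as a common ancestor.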
Classes have an important function in defining the constraints on the use of properties.
- Comment It's like in Manchester Syntax (see the RfC I wrote about classes and instances). A class is defined in text, not in the data, and it can be defined (among other things) by an expression on the values of certain properties and by subclassing. For example, the class Woman can be defined as a subclass of <Human being> whose instances have the value <female> for the property <sex>. I think we can do similar things in Wikidata, either with certain properties applied to class items that define constraints a bot could check, or with constraints on a class item written on its discussion page, as we have constraints on properties now. TomT0m (talk)
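To make the Woman example concrete, here is a rough sketch of how a bot might check such a class definition. Everything in it is an assumption for illustration: the `ClassDef` record, the item dictionaries, and the key names are hypothetical and do not correspond to real Wikidata properties or to any existing bot.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """A class defined by a superclass plus required property values (hypothetical)."""
    name: str
    subclass_of: str
    required_values: dict = field(default_factory=dict)


# The example from the comment: Woman = Human being with sex = female.
WOMAN = ClassDef("Woman", "Human being", {"sex": "female"})


def satisfies(item, class_def):
    """Check that an item belongs to the superclass and has every required value."""
    if class_def.subclass_of not in item.get("classes", ()):
        return False
    return all(item.get(prop) == value
               for prop, value in class_def.required_values.items())
```

An item such as `{"classes": ["Human being"], "sex": "female"}` would pass the check, and one with a different or missing value for `sex` would be reported as a constraint violation.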
Datatypes
Datatypes are arranged in a "hierarchy" by the thing being measured (e.g. Length, Area). By my count there are 381 datatypes listed on DBpedia. Most of these are what Wikidata considers to be dimensions for the Wikidata 'number with dimension' datatype, including a datatype for each currency and numerous datatypes for measurement units.
DBpedia also uses all XSD types, including the following date/time datatypes:
- Xsd:date
- Xsd:dateTime
- Xsd:gDay
- Xsd:gMonth
- Xsd:gMonthDay
- Xsd:gYear
- Xsd:gYearMonth
- Xsd:time
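For readers unfamiliar with the XSD date/time types listed above, the sketch below shows what literals of each type look like. The patterns are deliberately simplified: time zones, fractional seconds, negative years and range checks (such as month 13) are all left out, so this is an illustration of the lexical forms and not a validator. The names `XSD_PATTERNS` and `matching_types` are invented here.

```python
import re

# Simplified lexical patterns for the XSD date/time types listed above.
# Assumption: time zones, fractional seconds and negative years are ignored.
XSD_PATTERNS = {
    "xsd:date": r"\d{4}-\d{2}-\d{2}",
    "xsd:dateTime": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",
    "xsd:gDay": r"---\d{2}",
    "xsd:gMonth": r"--\d{2}",
    "xsd:gMonthDay": r"--\d{2}-\d{2}",
    "xsd:gYear": r"\d{4}",
    "xsd:gYearMonth": r"\d{4}-\d{2}",
    "xsd:time": r"\d{2}:\d{2}:\d{2}",
}


def matching_types(literal):
    """Return the XSD types whose simplified pattern matches the whole literal."""
    return [name for name, pattern in XSD_PATTERNS.items()
            if re.fullmatch(pattern, literal)]
```

For example, `"2013-08-27"` matches only `xsd:date`, `"2013"` only `xsd:gYear`, and `"--08-27"` only `xsd:gMonthDay`.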
Note that where DBpedia reuses a datatype or property from another ontology, it adds a prefix to identify where it has come from.
DBpedia datatypes seem to work very differently from those on Wikidata. I doubt it is practical to reuse any of their work here on Wikidata.
Properties
By my count DBpedia has 2547 properties.
The following are defined for each property:
- rdfs:label@en - the label of the property in English (some properties have labels in other languages as well)
- rdfs:comment@en - a description of the property in English.
- rdfs:domain - the class of objects (= Wikidata items) that can have this property
- rdfs:range - the class of objects (= Wikidata items) or the datatype that this property can link to
- rdf:type - Not sure. Doesn't seem to be used much.
- rdfs:subPropertyOf - Not sure. Doesn't seem to be used much.
- owl:equivalentProperty - Not sure. Doesn't seem to be used much.
- owl:propertyDisjointWith - Not sure. Doesn't seem to be used much.
As far as I can see, these are largely modelled around the Wikipedia infoboxes. As the infoboxes are oriented towards presentation rather than data representation, this leads to some oddities, such as about a hundred properties named 'numberoffoo' rather than counting the actual number of foos, as Wikidata seems to prefer to do.
Note that DBpedia does not seem to have qualifiers, which presumably means it needs more properties.
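The per-property fields listed above (label, comment, domain, range) can be pictured as a small record, with the domain acting as a constraint on which items may carry the property. The following is a sketch under stated assumptions: `PropertyDef`, `domain_allows` and the example property values are written for illustration and are not taken from the DBpedia wiki.

```python
from dataclasses import dataclass


@dataclass
class PropertyDef:
    label: str        # rdfs:label@en
    comment: str      # rdfs:comment@en
    domain: str       # rdfs:domain: class of items that can have the property
    value_range: str  # rdfs:range: class or datatype the property links to


# Illustrative example property; the field values are assumptions.
population_total = PropertyDef(
    label="population total",
    comment="Number of inhabitants of a place.",
    domain="PopulatedPlace",
    value_range="xsd:nonNegativeInteger",
)


def domain_allows(prop, subject_classes):
    """True if the property's domain is among the subject's classes.

    subject_classes is expected to include superclasses as well.
    """
    return prop.domain in subject_classes
```

A city whose classes are `{"City", "Settlement", "PopulatedPlace"}` would be allowed to use this property, while an item that is only a `Person` would not.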
Mappings
A mapping is a link from a 'template property' in a Wikipedia template (mostly infoboxes, but some other templates too) to the corresponding DBpedia property. By my count DBpedia is working on 440 templates from the English Wikipedia, plus many more on other language Wikipedias.
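As a rough picture of what such a mapping does, the sketch below translates the parameters of one template call into (property, value) pairs. The template name, parameter names and property names are hypothetical placeholders, and the real DBpedia mappings are richer than a flat lookup table.

```python
# Hypothetical mapping table: template name -> {template parameter: property}.
MAPPINGS = {
    "Infobox settlement": {
        "population_total": "populationTotal",
        "area_km2": "areaTotal",
    },
}


def extract(template_name, template_params):
    """Translate template parameters into (property, value) pairs.

    Parameters with no mapping (e.g. purely presentational ones) are skipped.
    """
    mapping = MAPPINGS.get(template_name, {})
    return [(mapping[key], value)
            for key, value in template_params.items()
            if key in mapping]
```

Called with `{"population_total": "1000", "image": "x.jpg"}` for the example template, this keeps only the population value; a template with no mapping yields an empty list.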
Comments
I am sure that DBpedia has many properties that will also be relevant to Wikidata. On the other hand, since DBpedia has a different goal (extraction from existing templates) and a different data model (no references, no qualifiers, different data types), it is also clear that many properties would be better done differently in Wikidata. So in the end one always needs to look at every single property to decide if and how it would fit into Wikidata. I think one just cannot make such a decision in general for some 2500 properties across all domains without consulting the experts in each domain individually. (I still think it is important to record this discussion somewhere on this wiki, so thanks for launching this RfC.) --Markus Krötzsch (talk) 17:39, 20 September 2013 (UTC)