Wikidata:Requests for comment/Harvesting of Wikipedia Infoboxes for Wikidata. Proposal for extension of Harvest Templates

An editor has requested the community to provide input on "Harvesting of Wikipedia Infoboxes for Wikidata. Proposal for extension of Harvest Templates" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.

If you have an opinion regarding this issue, feel free to comment below. Thank you!

So far, we have done a lot of technical work in the GlobalFactSync project, including testing the infobox parsers, extracting the references, matching each reference to the right parameter value, and analysing the frequency of references to find the most frequently used sources.

The result is still quite technical and consists of many generic APIs. To make it more usable, we brainstormed features and improvements that we could push into Harvest Templates. We marked the possible features in the mockup. Some improve usability and some provide more and better data (functionality). Feedback would help us focus on the most wanted features first, and may point out other possibilities we have not thought of.

Results of the team-internal discussion: other potential features for Harvest Templates

  • option to select multiple (Wikipedia) sources to be considered for import (not limited to a single source)
  • if conflicting candidate values from different sources occur, select by default only those with a high-popularity link
  • write a log file of posted requests and of suggested vs. selected values, so that they can be analysed later
  • open a popup/link for candidate statements with information from the GFS browser
  • guided Harvest Templates: a Google-like search bar with autocompletion for templates; once a template is selected, show the available parameters and how well they are mapped in Wikidata
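
The first three points above (multiple sources, default selection on conflicts, logging of suggested vs. selected values) could be sketched roughly as follows. This is only an illustration under assumptions: the `Candidate` record, the numeric `popularity` score, the rule "preselect the most popular value on conflict" and the JSON log format are all hypothetical and are not part of Harvest Templates or the GFS APIs.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate statement harvested from one Wikipedia edition."""
    source: str      # e.g. "enwiki" (assumed naming)
    value: str       # harvested infobox value
    popularity: int  # assumed popularity score of the source link


def default_selection(candidates, selected_sources):
    """Return the candidates that would be preselected by default.

    Only candidates from the chosen sources are considered. If all of them
    agree on the value, every one is preselected. If they conflict, only
    the candidates with the highest popularity score are preselected.
    """
    considered = [c for c in candidates if c.source in selected_sources]
    if not considered:
        return []
    if len({c.value for c in considered}) == 1:
        return considered  # no conflict between sources
    best = max(c.popularity for c in considered)
    return [c for c in considered if c.popularity == best]


def log_line(item_id, property_id, suggested, selected):
    """Build one JSON log line recording suggested vs. selected values."""
    record = {
        "item": item_id,
        "property": property_id,
        "suggested": [asdict(c) for c in suggested],
        "selected": [asdict(c) for c in selected],
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Illustrative data only; the popularity numbers are made up.
    candidates = [
        Candidate("enwiki", "1879-03-14", 120),
        Candidate("dewiki", "1879-03-14", 80),
        Candidate("frwiki", "1879-03-15", 5),
    ]
    chosen = default_selection(candidates, {"enwiki", "dewiki", "frwiki"})
    print(log_line("Q937", "P569", candidates, chosen))
```

Writing one JSON object per line would keep the log easy to append to and to analyse later, but any structured format would serve the same purpose.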

Feature 4 & 5

  • use direct wikilinks to get infobox names from other languages, or take them from the DBpedia mappings
  • advanced multilingual property suggestions (?)


Mockup images: top section with feature markup; bottom part.