Wikidata:Lexicographical data/How to help

There are lots of things you can do to improve lexicographical data on Wikidata!

The suggestions below focus on specific lists and tools. Which tasks suit you best will depend on how well-resourced your language is.

First steps, regardless of language

Check what lexemes already exist in your language

Before you take up one of the tasks further down this page, check that the lexemes you plan to create do not already exist. You can do this with Ordia and Hangor (হাঙ্গর).

Check any language-specific documentation for your language

Some languages have particular ways of modeling their lexemes. Look through the list of language-specific documentation pages; if your language is listed there, read what has already been described so that you can keep it in mind when performing one of the tasks further down this page.

(If the documentation subpage for your language seems incomplete, consider adding more information to it! If your language isn't in that list, you might find it helpful to create a language-specific documentation subpage describing how you're modeling lexemes in your language.)

Tasks for all languages

Add pronunciation audio to existing lexemes

You can use Lingua Libre to add audio files for the pronunciation of specific lexeme forms!

Find more specific issues to resolve with Wikidata queries

If you feel comfortable going through a list of query results to fix specific issues, there are lots of maintenance queries that you can adapt for your own language (usually, but not always, by changing a single language item ID within the query)!
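
As a rough sketch of what adapting such a query can look like, the Python snippet below builds a SPARQL query that lists lexemes in one language with no senses, taking the language item ID as its only required parameter. The function name `lexemes_without_senses_query` and its `limit` parameter are made up for illustration, and the query is not copied from the maintenance query list. It assumes the prefixes that the Wikidata Query Service predefines (`wd:`, `dct:`, `wikibase:`, `ontolex:`).

```python
import re


def lexemes_without_senses_query(language_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for lexemes in one language that have no senses.

    Hypothetical helper for illustration; relies on the prefixes that the
    Wikidata Query Service predefines.
    """
    # Guard against typos such as "1860" or "q1860" before building the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", language_qid):
        raise ValueError(f"not a Wikidata item ID: {language_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        "          wikibase:lemma ?lemma .\n"
        "  FILTER NOT EXISTS { ?lexeme ontolex:sense ?sense }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Q1860 is the item for English; swap in your own language's item ID.
print(lexemes_without_senses_query("Q1860", limit=10))
```

The printed query can be pasted into the Wikidata Query Service. Changing `"Q1860"` to another language's item ID is the "single language item ID" change described above.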

Make it easier to add specific types of lexemes

If a particular type of lexeme in your language (e.g. 'feminine noun', 'imperfective verb', '4th conjugation adjective') is expected to have a particular set of forms, each with a particular set of grammatical features, you can add a template to the Wikidata Lexeme Forms tool for that lexeme type! These templates can then be used to fill in those forms for new lexemes more readily.
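
To make the idea of "a set of forms, each with a set of grammatical features" concrete, here is a minimal Python sketch of what such a template might hold for English nouns. The dictionary keys and the `check_template` helper are assumptions chosen for illustration, not the Wikidata Lexeme Forms tool's actual template format; consult the tool's own documentation before contributing a real template.

```python
# Illustrative only: key names are hypothetical, not the tool's real schema.
english_noun_template = {
    "label": "English noun",
    "language_item_id": "Q1860",          # English
    "lexical_category_item_id": "Q1084",  # noun
    "forms": [
        {"label": "singular", "grammatical_features_item_ids": ["Q110786"]},
        {"label": "plural", "grammatical_features_item_ids": ["Q146786"]},
    ],
}


def check_template(template: dict) -> list[str]:
    """Return a list of problems found in a template (empty if none)."""
    problems = []
    seen = set()
    for form in template.get("forms", []):
        # Sort so that the same features in a different order still match.
        features = tuple(sorted(form.get("grammatical_features_item_ids", [])))
        if not features:
            problems.append(f"form {form.get('label')!r} has no grammatical features")
        elif features in seen:
            problems.append(f"form {form.get('label')!r} repeats another form's features")
        seen.add(features)
    if not template.get("forms"):
        problems.append("template has no forms")
    return problems


print(check_template(english_noun_template))  # prints []
```

A check like this catches two easy mistakes when drafting a template: a form with no grammatical features, and two forms that carry exactly the same features.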

Add usage examples to existing lexemes

If your language has a Wikisource, then you can use Luthor to add examples of lexemes being used within texts on that Wikisource! (You also have the option to add senses to those lexemes using that tool if they do not already have senses.)

See if text can be generated using the lexemes in your language

First find your language in this list of lexeme counts. If the "# of lexemes w/o senses" is exactly 0, read on; otherwise, first use Orthohin (অর্থহীন) to add senses in your language and bring that number down to 0.

In advance of the launch of Abstract Wikipedia, it is necessary not just to have lots of lexemes, but to ensure that they are developed enough that text can meaningfully be generated from abstract content using them. (Read more about Abstract Wikipedia here and here.)

Elemwala (এলেমওয়ালা) is a proof-of-concept interface that allows you to input abstract content and get natural language text in a given output language. There may well be errors with particular inputs, and the text may not be quite as natural as you might expect, but that's where your improvements to your language's lexemes, other Wikidata items, and the tool's source code come in!

Right now only a couple of languages and a limited set of constructions are supported; get in touch with User:Mahir256 if you want to help add your language or a specific construction to that tool!

Tasks for higher-resourced languages

Add external identifiers to existing lexemes

Do you speak Ancient Greek, Arabic, Balochi, Cantonese, Danish, Dutch, Egyptian Arabic, English, Finnish, French, German, Greenlandic, Hebrew, Hindustani (specifically Urdu), Japanese, Korean, Middle French, New Persian (either Persian or Dari), Pashto, Polish, Russian, Sanskrit, Sindhi, Slovene, Standard Mandarin, Swedish, Torwali, or Ukrainian?

Then there is at least one external identifier property for lexemes in your language, and a catalog of possible values for that property has been added to Mishramilan (মিশ্রমিলন)! With this tool, you can match catalog entries to existing lexemes in your language, or create entirely new lexemes for those entries!

Add senses to lexemes that currently don't have them

Do you speak Ancient Greek, Aragonese, Czech, Danish, English, Estonian, French, German, Hebrew, Italian, Latin, Modern Greek, Russian, Spanish, Swedish, or Ukrainian?

Then there are lots of lexemes in your language that don't have senses yet. You can use Orthohin (অর্থহীন) to add senses one at a time to such lexemes. (If the lexeme you're looking at has external identifiers, perhaps you can use the linked sources to get an idea of what that lexeme means!)

Tasks for lower-resourced languages

Add lexemes for concepts commonly encountered in linguistics

If you don't know what lexemes you should create for your language, you can try looking at this list of concept sets in the Concepticon for ideas!

Note, though, that the Concepticon concept list is merely a suggestion list; if your language does not have a word/phrase for a concept in that list, you should not try to forcibly create one!

Add lexemes for concepts from the weekly Lexemes Challenge

If the Concepticon concept list seems like a lot (or you've already added lexemes for as many concepts in that list as possible!), you can start contributing more slowly by adding lexemes for concepts in User:Envlh's weekly Lexemes Challenge!

(All done with this week's list? Try going through the challenges from prior weeks!)

As with the Concepticon list, however, if your language does not have a word/phrase for a concept in a past or present Lexemes Challenge, you should not try to forcibly create one!