Announcing Glottolog/Langdoc, a knowledge base of 175k references for (mostly) underdescribed languages

We are happy to announce Glottolog/Langdoc, a comprehensive knowledge base of 104k languoids and 175k references for the Semantic Web.

In linguistics as well as in the Semantic Web world, it is important to clearly identify the concepts one is talking about. Glottolog/Langdoc takes this insight as a starting point and provides 104k Uniform Resource Identifiers (URIs) for languoids and 175k for references to descriptive literature focusing on underdescribed languages.

Glottolog/Langdoc provides URIs for languages, dialects and families (languoids). Danish for instance has the URI; Dani from New Guinea has

Languoids are annotated with their names, codes, macroareas, geocoordinates, genealogical affiliation, and, most importantly, their references. Links are provided to sites that offer further information (Ethnologue, ISO 639-3, WALS, Multitree, OLAC, LL-Map, ODIN, and Wikipedia).

References have URIs of the type and contain bibliographical information such as author, year and title, but also information about the document type (grammar, dictionary, ...) and the object and meta languages. For some references, we have an index of words as well. References can be downloaded as BibTeX, HTML, txt, or with Zotero.

The knowledge base can be queried in a number of ways

Furthermore, an areally and genealogically balanced sample can be drawn.
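
To illustrate what a balanced sample might look like, here is a minimal sketch. It is not the procedure Glottolog/Langdoc uses, which is not described here; it assumes simple stratified sampling in which one languoid is drawn per combination of macroarea and family. The record layout, the function name `balanced_sample`, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(languoids, per_stratum=1, seed=0):
    """Draw up to `per_stratum` languoids from every (macroarea, family) pair.

    Assumption: each languoid is a dict with "id", "macroarea" and "family"
    keys. This layout is illustrative, not the Glottolog/Langdoc data model.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    strata = defaultdict(list)
    for languoid in languoids:
        strata[(languoid["macroarea"], languoid["family"])].append(languoid)

    sample = []
    for key in sorted(strata):  # sorted keys give a deterministic order
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical toy records, invented purely for illustration.
TOY_LANGUOIDS = [
    {"id": "lang_a", "macroarea": "area_1", "family": "family_x"},
    {"id": "lang_b", "macroarea": "area_1", "family": "family_x"},
    {"id": "lang_c", "macroarea": "area_1", "family": "family_y"},
    {"id": "lang_d", "macroarea": "area_2", "family": "family_z"},
]

toy_sample = balanced_sample(TOY_LANGUOIDS)
```

With the toy records above, the sample contains one languoid for each of the three (macroarea, family) combinations, so a large family cannot dominate the result.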

One core principle of Glottolog/Langdoc is that two authors using the same name do not necessarily refer to the same thing. This means that 'Dravidian' as seen by Dryer and 'Dravidian' as seen by Zvelebil are treated as different entities with distinct IDs, drav1245 and drav1246, respectively.

This separation leads to conceptual clarity. Often, of course, the differences are negligible. Where such cases could be identified, the entities are linked via skos:closeMatch.
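
As a rough sketch of what such a link looks like in RDF, the snippet below writes skos:closeMatch statements in N-Triples syntax. The base URI `http://example.org/languoid/` is a placeholder, not the real Glottolog/Langdoc URI scheme, and the helper name `close_match_triples` is hypothetical. The two Dravidian IDs are used only as sample input; whether these two entities are actually linked is not stated here.

```python
# Property URI from the W3C SKOS vocabulary.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Placeholder base URI (an assumption, not the real Glottolog/Langdoc scheme).
BASE_URI = "http://example.org/languoid/"


def close_match_triples(id_a, id_b, base=BASE_URI):
    """Return N-Triples lines linking two languoid IDs via skos:closeMatch.

    SKOS defines closeMatch as symmetric, so both directions are written out.
    """
    subject = f"<{base}{id_a}>"
    obj = f"<{base}{id_b}>"
    predicate = f"<{SKOS_CLOSE_MATCH}>"
    return [
        f"{subject} {predicate} {obj} .",
        f"{obj} {predicate} {subject} .",
    ]


triples = close_match_triples("drav1245", "drav1246")
for line in triples:
    print(line)
```

Note that skos:closeMatch is deliberately weaker than skos:exactMatch and is not transitive, which fits the idea that two authors' concepts may be nearly, but not necessarily fully, the same.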

This brings us to the integration of Glottolog/Langdoc in the Semantic Web world: all URIs support content negotiation and can be viewed as XHTML or RDF. Dumps of references are available as a very large .bib file and as an RDF/XML dump.
Glottolog/Langdoc content is made available under CC-BY-NC. Intercultural issues upstream unfortunately prevent us from releasing the content under a more permissive license.
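
Content negotiation means that a client asks for a representation through the HTTP Accept header. The sketch below builds such requests with Python's standard library without sending them. The URI is a placeholder on example.org, the helper name `negotiated_request` is hypothetical, and the media types `application/rdf+xml` and `application/xhtml+xml` are assumptions about what the server accepts.

```python
import urllib.request


def negotiated_request(uri, media_type):
    """Build a GET request that asks the server for a given media type."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


# Placeholder URI: substitute an actual languoid or reference URI.
PLACEHOLDER_URI = "http://example.org/languoid/drav1245"

rdf_request = negotiated_request(PLACEHOLDER_URI, "application/rdf+xml")
xhtml_request = negotiated_request(PLACEHOLDER_URI, "application/xhtml+xml")

# To actually fetch a representation, pass the request to urlopen:
#     with urllib.request.urlopen(rdf_request) as response:
#         body = response.read()
```

The same URI is used in both requests; only the Accept header differs, which is what lets one identifier serve both human readers and RDF consumers.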

Glottolog/Langdoc is part of the Linguistic Linked Open Data cloud and uses the DCMI, BIBO, FRBR, and ISBD ontologies to provide an interoperable resource. Some concepts not contained elsewhere are provided in the Glottolog ontology. We are working on a SPARQL endpoint, which will probably be made available in June this year.

Comments are welcome and can be sent to


