Review of Glottolog 2.0

Back in February 2012, Sebastian Nordhoff (MPI-EVA) announced on Cyberling the launch of Glottolog/Langdoc, a comprehensive database of bibliographic data on the world’s languages.[1] Unfortunately, given the ephemeral nature of Web resources, links in that announcement, such as "Give me all works about Zulu", now return 404 error messages. However, these broken links are due to the release of the new Glottolog 2.0, and they are sure to be fixed, if they haven’t been already.[2]

The most inviting feature of Glottolog 2.0 is its new interface. Built on the increasingly popular website framework Bootstrap[3] and implemented by Robert Forkel (developer of WALS Online), Glottolog’s frontend is clean and very easy to navigate. The site offers two entry points to its contents, which are based largely on documentation and genealogical hypotheses aggregated by Harald Hammarström.

The first is the "Languoid Catalog", a comprehensive index of the world’s languages, dialects and language families.[4] It can easily be searched by language name, ISO 639-3 code or country. The search results include an interactive map of language points and a list of names, families and links to bibliographic references in the "Langdoc" portion of Glottolog.

Langdoc contains nearly 200k (non-unique) references, which are searchable, sortable and downloadable.[5] Among the references there are some mistakes, e.g. Roger Bleach instead of Roger Blench,[6] and some, probably many, duplicates.[7] Mistakes are easy to report, and corrections are listed on the errata page. As for duplicates, one could argue this is a feature and not a bug. Personally, I would rather encounter duplicate hits than no hits at all, seeing as it is hard to catch duplicates automatically without also getting false positives. The Glottolog editors are also happy to receive additional bibliographies, in any format, from the linguistics community, so we can expect the number of references to keep growing.[8]
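To make the duplicate trade-off concrete, here is a minimal Python sketch of a naive duplicate check. It is not how Glottolog handles duplicates (that isn't described here); the record fields `authors`, `title` and `year` and all function names are hypothetical. The idea is to reduce each record to a normalized key, so that, for example, the same person listed with different surname orders (as in footnote 7) produces the same key.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Reduce an author name to a crude, order-independent key (hypothetical helper)."""
    # Strip accents so that "Hammarström" and "Hammarstrom" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Split on commas, periods and whitespace, then sort the tokens so that
    # "Doe, Jane" and "Jane Doe" yield the same key.
    tokens = [t for t in re.split(r"[\s,.]+", ascii_only.lower()) if t]
    return " ".join(sorted(tokens))


def dedup_key(record: dict) -> tuple:
    """Build a comparison key from authors, title and year (assumed fields)."""
    authors = tuple(sorted(normalize_name(a) for a in record["authors"]))
    # Collapse punctuation and case differences in the title.
    title = re.sub(r"\W+", " ", record["title"].lower()).strip()
    return (authors, title, record.get("year"))


def find_duplicates(records: list) -> list:
    """Return groups of record indices that share the same key."""
    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(dedup_key(rec), []).append(i)
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"authors": ["Doe, Jane"], "title": "A Grammar of Examplish", "year": 1999},
    {"authors": ["Jane Doe"], "title": "A grammar of Examplish.", "year": 1999},
    {"authors": ["Roe, Richard"], "title": "Examplish phonology", "year": 2005},
]
print(find_duplicates(records))  # [[0, 1]]
```

The same token sorting that merges "Doe, Jane" and "Jane Doe" also merges "Li Wei" and "Wei Li", who may well be two different people. That is exactly the kind of false positive that makes aggressive automatic deduplication risky, and why tolerating some duplicates can be the safer choice.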

In the future, I think we can expect even more from the editors and developers of Glottolog,[9] as they continue to add references and genealogical information on the world’s languages. It would be really great to see some synergy between, say, the contents of Glottolog and the contents of the Open Language Archives Community.[10] Both communities are leveraging technologies towards "cyberinfrastructure", and both are fine tools for finding information on lesser-known languages.

[6] The languages of the Tasmanians and their relation to the peopling of Australia.
[7] For example, I’m listed twice with different surname orders for the same reference.
[8] I recently sent them a dump of all references to resources included in my own phonological typology database, hoping that the time I spent collecting these citations will make it easier for others to find obscure records on lesser-studied languages. Extracting bibliographic references is made simple. A useful tool for anyone developing a typological database.


Dead links to Glottolog

In transitioning to Glottolog 2.0, we had to make a couple of important decisions. One of these was to reduce the scope of the data available on Glottolog to a set that we actually have a chance to maintain.

In terms of classifications, this means we stick with the one curated by Harald Hammarström and no longer store alternative classifications, such as those from MultiTree. The rationale behind this cut was as follows: over time, the languoids would have split into those that get updated, linked to bibliographical records and displayed in search results, and orphaned languoids that no one curates. In our experience this leads to confusion and possibly frustration, because one part of the site would encourage feedback and actually incorporate it into new revisions, while the other part would not.

Anyway, I admit "404 Not Found" is probably not the best way to signal these decisions to our users, although it is a good way to signal them to search engines. So we have now switched to responding with "410 Gone" to requests for languoid pages we no longer have, and hope the explanation given here will spread :)
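For readers curious what this distinction looks like in code, here is a minimal WSGI sketch using only the Python standard library. It is not Glottolog's actual implementation; the URL prefix `/resource/languoid/id/`, the identifier sets and the function names are assumptions for illustration. Pages that still exist get "200 OK", IDs that were deliberately dropped get "410 Gone" (the HTTP status meaning the resource was intentionally and permanently removed), and anything never seen gets "404 Not Found".

```python
# Hypothetical identifier sets; real Glottolog IDs are not listed here.
CURRENT_LANGUOIDS = {"abcd1234"}
REMOVED_LANGUOIDS = {"wxyz9876"}  # e.g. languoids from dropped classifications

# Assumed URL layout for languoid pages.
PREFIX = "/resource/languoid/id/"


def languoid_app(environ, start_response):
    """Plain WSGI app distinguishing live (200), removed (410) and unknown (404) pages."""
    path = environ.get("PATH_INFO", "")
    if path.startswith(PREFIX):
        languoid_id = path[len(PREFIX):]
        if languoid_id in CURRENT_LANGUOIDS:
            status, body = "200 OK", f"Languoid {languoid_id}"
        elif languoid_id in REMOVED_LANGUOIDS:
            # 410 tells clients and crawlers the removal is deliberate and permanent.
            status, body = "410 Gone", "This languoid is no longer maintained."
        else:
            status, body = "404 Not Found", "Unknown languoid."
    else:
        status, body = "404 Not Found", "Not found."
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]


def request_status(path):
    """Call the app directly with a minimal environ and return its status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    languoid_app({"PATH_INFO": path}, start_response)
    return captured["status"]


print(request_status(PREFIX + "wxyz9876"))  # 410 Gone
```

To serve it for real, the same `languoid_app` can be passed to `wsgiref.simple_server.make_server`; keeping the lookup as a plain WSGI callable makes it easy to test without opening a socket.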

Glottolog and OLAC

As of August 23, Glottolog is a registered archive at OLAC. All Glottolog languoid pages will become available as OLAC archive records.

It is still not clear to me whether this is the kind of integration linguists want. It would also be possible to make all bibliographical records from Glottolog available on OLAC, but this would probably make OLAC less usable, because adding 200k records could marginalize the smaller archives.

So I think the workflow for "getting all information about language x" would be to look the language up on OLAC, then surf to the language's page on Glottolog to look up bibliographical references.
