data provenance

Crowdsourcing WALS using Linked Data

The World Atlas of Language Structures (WALS) project is one of the landmarks of digital linguistics. It contains 192 features for 2678 languages. However, the resulting data matrix is very sparse: of the 514176 possible datapoints, only about 68000, or 13%, are filled.
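The sparsity figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the teaser; the variable names are illustrative, and the filled count is the post's approximation ("about 68000"), not an exact value.

```python
# Figures as quoted in the post; names are illustrative only.
features = 192
languages = 2678
filled = 68_000  # approximate count given in the post

# Size of the full feature-by-language matrix.
possible = features * languages

# Share of cells that actually hold a value.
density = filled / possible

print(f"possible datapoints: {possible}")
print(f"filled share: {density:.1%}")
```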

Announcing Glottolog/Langdoc, a knowledge base of 175k references for (mostly) underdescribed languages

We are happy to announce Glottolog/Langdoc, a comprehensive knowledge base of 104k languoids and 175k references for the Semantic Web.

In linguistics as well as in the Semantic Web world, it is important to clearly identify the concepts one is talking about. Glottolog/Langdoc takes this insight as a starting point and provides 104k Uniform Resource Identifiers (URIs) for languoids and 175k for references to descriptive literature focusing on underdescribed languages.

NSF announces Building Community and Capacity for Data-Intensive Research in the SBE Sciences

The NSF Directorates for Social, Behavioral & Economic Sciences (SBE) and Education & Human Resources (EHR), together with the Office of Cyberinfrastructure (OCI), recently announced a solicitation for Building Community and Capacity for Data-Intensive Research, with a proposal deadline of 2012-05-22. Here are some snippets from the solicitation.

Liberman on Open Access and the three-legged stool

This post over at Language Log is highly recommended. A quick excerpt:

“reproducible research” [...] requires three things: (1) the data sets that serve as input; (2) the programs needed to run the experiment; and (3) a comprehensible account of what the experiment does, why it matters, and what the results are.

This is from Mark Liberman's abstract for his talk at the Berlin 9 Open Access Conference taking place in Maryland (not Berlin).

Conference on Science and the Internet 2012

From the call for papers:

Online media have brought about numerous changes in scholarly practices, including, but not limited to gathering data, finding relevant literature, making research and results accessible, organising collaboration, communicating with colleagues and students as well as creating fruitful learning environments.

Adapting a Scientific Workflow Infrastructure to Linguistics

In linguistics (and similar social sciences), there are no standard 'workflow workbenches' that non-programmers can use to develop, use, and share their workflows. However, as linguistics becomes increasingly data-intensive, computational linguists are using computational pipelines in their research to facilitate their main work.

Changing the Conduct of Science in the Information Age

The National Science Foundation has posted a workshop report entitled Changing the Conduct of Science in the Information Age. While it doesn't appear to contain direct input from linguists, many of the issues it discusses will be familiar to those interested in promoting a cyberlinguistics infrastructure.

From the executive summary:

"Linked Data in Linguistics" at DGfS 2012

Linked Data in Linguistics
Linguists from all disciplines produce more and more data and share the challenge of how to make this data accessible to other researchers in their field and beyond. This concerns not only the general availability of data, but also the representation of the structure of the data. Linked Data is one paradigm which can be employed to tackle this task.
We are happy to announce the workshop "Linked Data in Linguistics" at the annual meeting of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS) taking place March 7-9, 2012 in Frankfurt a.M., Germany.
