curation

Adapting a Scientific Workflow Infrastructure to Linguistics

Submitted by Richard Littauer on Thu, 09/29/2011 - 11:10

In linguistics (and similar social sciences), there are no standard 'workflow workbenches' that non-programmers can use to develop, run, and share their workflows. However, as linguistics becomes an increasingly data-intensive science, computational linguists are turning to computational pipelines to support their main research work.

Changing the Conduct of Science in the Information Age

Submitted by jcgood on Tue, 08/30/2011 - 15:26

The National Science Foundation has posted a workshop report entitled Changing the Conduct of Science in the Information Age. While it doesn't appear to contain direct input from linguists, many of the issues it discusses will be familiar to those interested in promoting a cyberlinguistics infrastructure.

From the executive summary:

Access to lexical databases: discussion

Submitted by ebender on Tue, 05/17/2011 - 20:10

Claire Bowern has started a discussion on her blog, Anggarrgoon, about access to aggregated lexical data: how to protect the rights of the various stakeholders while encouraging as much sharing as possible. I enjoyed her tongue-in-cheek suggestion that linguist-contributors should, in game-theoretic fashion, get access to data in proportion to the data they share.

Data provenance and data aggregation

Submitted by jcgood on Mon, 04/25/2011 - 18:34

Peter Austin, over at Endangered Languages and Cultures, has started a discussion on citation practices (in which James McElvenny has also participated). The discussion was prompted, at least in part, by some data I helped process as part of the LEGO project.

LRTS Sharing Workshop at IJCNLP 2011

Submitted by ebender on Thu, 03/10/2011 - 13:10

FLaReNet, Language Grid and META-SHARE are co-hosting the Workshop on Language Resources, Technology and Services in the Sharing Paradigm at IJCNLP 2011. From the call for papers:

The Workshop aims at addressing (some of the) technological, market and policy challenges posed by the “sharing and openness paradigm”, the major role that language resources can play and the consequences of this paradigm on language resources themselves.

Beyond the PDF?

Submitted by jcgood on Mon, 01/24/2011 - 16:51

While looking for something on this blog, http://cameronneylon.net/category/blog/ (which I recommend in general), I stumbled on the fact that an interesting workshop entitled Beyond the PDF recently took place. The workshop's goal is described as follows:

Open Data and corpora for (computational) linguistic research

Submitted by ebender on Tue, 01/18/2011 - 15:52

I recommend this guest post by Nancy Ide on the Open Knowledge Foundation Blog. Ide gives a brief history of the ANC (American National Corpus) and describes the Creative Commons licensing and copyright issues that arise when textual data are repurposed for linguistic and computational linguistic research.