Advances in computing technology over the past few decades, including general communications technology such as the World Wide Web as well as specific advances in computational linguistics, have opened up the possibility of a cyberinfrastructure for linguistics. Such an infrastructure would advance the field by allowing linguists to analyze data and test hypotheses against much larger data sets, collaborate with more people across greater distances, and, as a result, ask questions that were not previously answerable.
We envision a research climate in which data such as audio and video recordings, transcriptions, interlinear glossed text, dictionaries, acceptability judgments, typological classifications, psycholinguistic results, and language acquisition data, among others, are available for virtually all of the world's languages through web-based portals. Such a climate depends on data that are encoded in standardized formats and annotated with standardized metadata, so that they are discoverable, searchable, and aggregatable. This maximizes researchers' ability both to find particular data sets and examples of interest and to test hypotheses against large quantities of data.
However, to realize the promise of a cyberinfrastructure, the field needs to solve three problems: (1) the culture change problem, (2) the design problem, and (3) the funding problem. Regarding (1), we need to establish a field-wide culture of publishing and sharing data and annotations, and of expecting hypotheses to be tested against available data sets. Regarding (2), we need to identify existing standards and software that can contribute to a general cyberinfrastructure and plan how to build on them. Finally, regarding (3), we need to develop a funding model that will sustain not only research contributions by linguists and computational linguists, but also software development (including user interface work) by software engineers.
These problems cannot be solved by isolated research projects; they require widespread communication, participation, and buy-in from the field. In July 2009, the Cyberling 2009 Workshop (held in conjunction with the LSA Linguistic Institute at UC Berkeley) brought together researchers from diverse subfields of linguistics (as well as some non-linguists) interested in issues of cyberinfrastructure. The results of those conversations are documented in the workshop wiki. The workshop was a wonderful opportunity to discuss cyberinfrastructure for linguistics from many different perspectives, and we would like to continue that conversation and collaboration online, without waiting for the next opportunity to meet face-to-face. The goal of this blog is to provide a site for that online collaboration. We hope the "breaking news" aspect of a blog will bring people back to the site regularly to see updates and participate in the discussion, while tags on the posts will organize the information so that the blog also becomes a useful repository.
The Cyberling 2009 Workshop and the initial development of this blog were funded by the National Science Foundation under grant number BCS-0936577. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.