LSA Data Sharing Resolution

Submitted by jcgood on Mon, 01/11/2010 - 08:21

At the recently concluded Annual Meeting of the Linguistic Society of America (LSA) in Baltimore, the following resolution on Data Sharing was passed by those present at the Business Meeting. It will soon be sent to the whole membership of the Society for a vote. The resolution was put forth by the LSA's Technology Advisory Committee.

--------------------------------------------
Whereas modern computing technology has the potential of advancing linguistic science by enabling linguists to work with datasets at a scale previously unimaginable; and

Whereas this will only be possible if such data are made available and standards ensuring interoperability are followed; and

Whereas data collected, curated, and annotated by linguists forms the empirical base of our field; and

Whereas working with linguistic data requires computational tools supporting analysis and collaboration in the field, including standards, analysis tools, and portals that bring together linguistic data and tools to analyze them,

Therefore, be it resolved at the annual business meeting on 8 January 2010 that the Linguistic Society of America encourages members and other working linguists to:

make the full data sets behind publications available, subject to all relevant ethical and legal concerns;

annotate data and provide metadata according to current standards and best practices;

seek wherever possible institutional review board human subjects approval that allows full recordings and transcripts to be made available for other research;

contribute to the development of computational tools which support the analysis of linguistic data;

work towards assigning academic credit for the creation and maintenance of linguistic databases and computational tools; and

when serving as reviewers, expect full data sets to be published (again subject to legal and ethical considerations) and expect claims to be tested against relevant publicly available datasets.

--------------------------------------------

The resolution passed in the Business Meeting by a comfortable enough margin that no vote count was required. Some members of the Society expressed reservations about the resolution, including: (i) the logically separate points it brings together, (ii) the overall framing of the resolution towards users of data rather than producers of data, and (iii) the relatively limited mention of "ethical" issues.

My own sense is that some of these points could be addressed in revisions to the resolution that would probably be acceptable to both its original authors and those with objections. However, the LSA's somewhat antiquated resolution process does not make it easy to make such revisions on anything less than a full-year cycle. So, for now, the above resolution is the one that will move forward to the membership.

After the resolution was presented at the Business Meeting, the LSA Ethics Committee decided it would discuss the resolution on its Ethics Discussion Blog in the near future, specifically to address what ethical issues it raises.

Comments

"full data sets"

Submitted by jcgood on Tue, 01/12/2010 - 14:46.

I realized I forgot to mention one other comment/criticism that came up when the resolution was presented: People wondered what "full" data sets were. Someone feared it could mean, for example, all of someone's field data even if they only published one article. I don't think that's what was intended, but it's not as clear as it could be.

More on full data sets

Submitted by ebender on Fri, 01/15/2010 - 22:17.

Another suggestion, for field linguists with this hesitation, would be to deposit the full data set in an archive at or near the time of publication, while retaining the ability to restrict who has access to the data (initially only the field researcher), and of course to upload further revisions to the data later. That way, the data is already in the archive and ready for sharing when the linguist is ready.

"full data sets"

Submitted by ebender on Wed, 01/13/2010 - 00:22.

The intention there was to encourage people to publish the data they are generalizing over for each article, not every piece of data they have collected for the language. It would be nice if we had a chance to clarify such things in the resolution language.

LSA resolutions

Submitted by ebender on Tue, 01/12/2010 - 13:39.

It really is unfortunate that the resolution process doesn't have more flexibility for incorporating feedback, but nonetheless it is exciting to see this resolution going forward! I look forward to the discussion I hope it will engender among the LSA membership.