Stable identifiers for genebank accessions still a dream?

by Luigi Guarino on June 17, 2013

Natural history collections and herbaria contain many millions of specimens that are used for research. When scientists publish their results they cite which specimens they used so that other scientists can both check the work and build on what has been achieved.

Institutions that hold specimens are publishing increasing amounts of data about (and images of) their specimens on-line. We need to have a way for scientists to reference specimens so that someone reading research results can simply click a link to see the supporting data and perhaps an image. To make this happen we need stable web links to the specimens that the holding institutions commit to maintain for the long term and that are implemented in a similar way across many institutions. Once this mechanism is widely adopted machines will be able to exploit the links to specimens to help do entirely new kinds of research.

This meeting was about establishing a consistent mechanism that will work across institutions.

Yes! And genebank accessions? When can we have some movement on that?


Dag Endresen June 17, 2013 at 6:22 pm

Full support for making the dream of stable and persistent identifiers come true, now. Stable URLs created from the holding institute's web domain and the catalog number (accession number) will do just fine to get this new paradigm started. My personal preference would be to create a globally unique identifier for specimens (such as genebank accessions) as a UUID, and to think of the prepended http://… part as the URL of the resolver (persistent identifier = resolver URL + UUID). I would further design the resolver URL so that the resolution service can be redirected in the future (e.g. using PURL) to a global service such as GBIF (or perhaps GeneSys). BUT stable identifier URLs now (based on genebank web domains and catalog numbers) are far more valuable than an eternal discussion on how the identifiers should be designed. So all thumbs up!

The pro-iBiosphere wiki has an interesting discussion on stable identifier URI patterns.
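Dag's scheme (persistent identifier = resolver URL + UUID, behind a resolver that can be redirected later) fits in a few lines of Python. This is a minimal sketch under stated assumptions, not any genebank's actual implementation. The resolver base `https://purl.example.org/accession/`, the `/accession/` path layout and the helper names `mint_pid`, `catalog_url` and `redirect_target` are all hypothetical.

```python
import uuid
from urllib.parse import quote

# Hypothetical PURL-style resolver base. The identifiers minted under it never
# change; only the service the resolver forwards to may change over time.
RESOLVER_BASE = "https://purl.example.org/accession/"


def mint_pid(resolver_base: str = RESOLVER_BASE) -> str:
    """Persistent identifier = resolver URL + random (version 4) UUID."""
    return resolver_base + str(uuid.uuid4())


def catalog_url(institute_domain: str, catalog_number: str) -> str:
    """Interim stable URL from the genebank's web domain and accession number.

    The "/accession/" path is a made-up convention. The catalog number is
    percent-encoded so that spaces or slashes cannot break the URL.
    """
    return f"https://{institute_domain}/accession/{quote(catalog_number, safe='')}"


def redirect_target(pid: str, current_service: str) -> str:
    """Where the resolver would forward a PID today; the PID itself is unchanged.

    Mimics a prefix redirect: the part after RESOLVER_BASE (the UUID) is
    appended to whichever service currently hosts the data.
    """
    if not pid.startswith(RESOLVER_BASE):
        raise ValueError("not an identifier issued under this resolver")
    return current_service + pid[len(RESOLVER_BASE):]


if __name__ == "__main__":
    pid = mint_pid()
    print(pid)
    print(catalog_url("genebank.example.org", "NGB 1234"))
    print(redirect_target(pid, "https://global.example.net/accession/"))
```

The design choice is that a citation always uses the PID string; moving the data from a genebank's own server to a global portal only changes the resolver's forwarding rule, not the identifiers already published in papers.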


Ruaraidh June 19, 2013 at 8:21 am

Stable identifiers for accessions? Beware – these are living collections, not like herbarium specimens. Genebank managers do their best to stop evolution, but can’t. And when samples get into the hands of users who don’t follow genebank standards, change is rapid. You can be sure that your copy of my accession is not the same as mine.

The key is to document the provenance of your sample. The Multi-Crop Passport Descriptors say: identify your sample, say where you got it from, and say where it originally came from.
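Ruaraidh's three rules suggest a simple record: each sample carries its own identifier, the identifier of the sample it was obtained from, and, for originally collected material, where it came from. The sketch below follows those links back towards the origin. It is a hypothetical illustration only; the field names are invented and are not actual MCPD descriptor codes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Sample:
    sample_id: str                  # "identify your sample"
    obtained_from: Optional[str]    # "where you got it from" (None if collected directly)
    collecting_site: Optional[str]  # "where it originally came from"


def provenance(sample_id: str, registry: Dict[str, Sample]) -> List[str]:
    """Follow obtained_from links from a sample back towards its origin."""
    chain: List[str] = []
    current: Optional[str] = sample_id
    while current is not None:
        if current in chain:
            raise ValueError(f"provenance loop at {current}")
        chain.append(current)
        record = registry.get(current)
        if record is None:
            break  # link leads outside the registry: provenance known only this far
        current = record.obtained_from
    return chain


def origin(sample_id: str, registry: Dict[str, Sample]) -> Optional[str]:
    """Collecting site of the oldest sample found in the registry, if recorded."""
    for sid in reversed(provenance(sample_id, registry)):
        record = registry.get(sid)
        if record is not None:
            return record.collecting_site
    return None


if __name__ == "__main__":
    registry = {
        "FIELD-0042": Sample("FIELD-0042", None, "hypothetical collecting site"),
        "GENEBANK-A-17": Sample("GENEBANK-A-17", "FIELD-0042", None),
        "MY-COPY-3": Sample("MY-COPY-3", "GENEBANK-A-17", None),
    }
    print(provenance("MY-COPY-3", registry))
    print(origin("MY-COPY-3", registry))
```

Each copy gets its own identifier rather than reusing the source accession's, which matches the point that "your copy of my accession is not the same as mine".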


Luigi Guarino June 19, 2013 at 11:55 am

Why can’t all samples get stable identifiers?


Dag Endresen June 19, 2013 at 2:57 pm

Simple accession numbers (catalog numbers) are good for talking about accessions with collaborators who are aware of the context. If you want to talk about accessions on the Internet, you find yourself in a much larger context, where the identity of most accessions identified only by catalog numbers will be ambiguous to somebody. Stable identifiers (persistent identifiers) are meant to enable people to talk about accessions in larger contexts such as the Internet. As Ruaraidh mentioned, if someone starts using the same identifier string for something that is not (or is no longer) the same accession, this will cause problems regarding the true identity of the accession identified by that persistent identifier. But surely relying only on locally unique catalog numbers will cause more ambiguity about the true identity of an accession…?

The stability of germplasm accessions and the persistence of the identifiers used to identify them are two different challenges, and both are very valid.
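The ambiguity of bare catalog numbers can be made concrete. Assuming a few hypothetical genebanks that each number their accessions independently, bare numbers collide, while identifiers qualified by the holding institute stay unique. The institute codes and the `institute:number` format are illustrative assumptions, not an established convention.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ambiguous_numbers(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Catalog numbers used by more than one institute.

    records are (institute_code, catalog_number) pairs.
    """
    holders: Dict[str, Set[str]] = defaultdict(set)
    for institute, number in records:
        holders[number].add(institute)
    return {number: insts for number, insts in holders.items() if len(insts) > 1}


def qualified_id(institute: str, number: str) -> str:
    """Unique across institutes, assuming institute codes never contain ':'."""
    if ":" in institute:
        raise ValueError("institute code must not contain ':'")
    return f"{institute}:{number}"


if __name__ == "__main__":
    records = [("GB-NORTH", "1001"), ("GB-SOUTH", "1001"), ("GB-NORTH", "1002")]
    print(ambiguous_numbers(records))
    print([qualified_id(i, n) for i, n in records])
```

Qualifying the number removes the ambiguity between genebanks, but it says nothing about whether two samples carrying the same identifier are still the same material; that is the separate stability problem.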

