Following Brassica into Genebank Database Hell

Scientists at The Genome Analysis Centre (TGAC) have released the first web repository for Brassica (mustard plants) trait data to tackle reproducibility, user-controlled data sharing and analysis worldwide. Scoring the versatile crop’s beneficial traits will assist Brassica breeders in improving their crop yields, increasing nutritional benefits and reducing our carbon footprint through biofuel production.

Very worthy, of course. But also, alas, an opportunity missed. How so? Come with me to Genebank Database Hell.

Let’s start with a random germplasm line from the Brassica portal: DEU146_BRA_02028. That’s a weird but somewhat familiar name. People in the know will recognize DEU146 as the code for the German national genebank, IPK. But the organization is given in the portal as CGN, the Dutch national genebank. What’s going on? Stay with me, don’t panic. The portal does provide the following metadata for the material in question:

Provenance: Brassica.xls file downloaded from http://documents.plant.wur.nl/cgn/pgr/brasedb/, March 3rd 2010
Comments: Line name concatenated from resource collection code and genetic resource collection “accession” number; associated data available from European Brassica Database of Genetic Resource Collections
Entered by: graham.king@bbsrc.ac.uk
Entry date: 2010-03-03

One’s first instinct of course is to look for the BRA_02028 bit of the name among the DEU146 material in Genesys, but that would be too easy. You have to strip out the assorted underscores, and indeed the leading zero, and that gets you to the right accession, which happens to be from Ethiopia. Breathe.
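For anyone who would rather not do that by hand, the untangling can be sketched in a few lines of Python. This is only an illustration based on the single line name above: the function name `split_line_name` and the assumed layout (institute code, underscore, letter prefix, underscore, zero-padded number) are my own guesses, not anything documented by the portal or by Genesys.

```python
import re


def split_line_name(line_name: str) -> tuple[str, str]:
    """Split a portal line name into (institute code, accession number).

    Assumed layout (hypothetical, inferred from one example):
    <institute code>_<letter prefix>_<zero-padded number>
    """
    match = re.fullmatch(r"([A-Z]{3}\d+)_([A-Za-z]+)_0*(\d+)", line_name)
    if match is None:
        raise ValueError(f"unrecognised line name: {line_name!r}")
    institute, prefix, number = match.groups()
    # Drop the underscores and leading zeros; join prefix and number with a space.
    return institute, f"{prefix} {number}"


print(split_line_name("DEU146_BRA_02028"))
```

Other line names in the portal may well follow a different pattern, in which case the function raises an error rather than guessing.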

You could also Google the European Brassica Database of Genetic Resource Collections, as per the metadata, which is hosted by CGN, hence the reference to that organization in the portal. If you search for BRA 2028 you get to the same thing as in Genesys, and eventually to the original record at IPK.

So, to recap: a British guy entered into the Brassica portal some data hosted (as part of a European project) by the Dutch genebank, pertaining to an accession in the German genebank collected in Ethiopia and originally conserved in the old West German national genebank. The actual URL quoted in the metadata returns a 404 error.

Look, I’ve said it before, and no doubt I’ll say it again. It’s great that gene-jockeys like the ones at TGAC build their own databases with all kinds of fancy genotypic and phenotypic data for breeders and other researchers to use. It’s really great, I mean it. It’s what’s going to get the stuff in genebanks used, and we all want that. But please, please, make sure that those breeders and researchers don’t have to go through what I’ve just described to actually get their hands on the seeds. Because I’m pretty sure they won’t. Go through it, I mean. They have better things to do.
