Genomics has a data problem, according to Nature. Not perhaps as big as YouTube’s, but…
Nevertheless, Desai says, genomics will have to address the fundamental question of how much data it should generate. “The world has a limited capacity for data collection and analysis, and it should be used well. Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can’t easily address questions like this,” he says. Other resource-intensive disciplines, such as high-energy physics, are more centralized; they “require coordination and consensus for instrument design, data collection and sampling strategies”, he adds. But genomics data sets are more balkanized, despite the recent interest of cloud-computing companies in centrally storing large amounts of genomics data.
…
Astronomers and high-energy physicists process much of their raw data soon after collection and then discard them, which simplifies later steps such as distribution and analysis. But genomics does not yet have standards for converting raw sequence data into processed data.
Leave aside for a minute that last sentence, which is generating some heat on Twitter…
Calling bullshit on "But genomics does not yet have standards for converting raw sequence data into processed data." http://t.co/g8zAK8bjj4
— Mick W@tson ↙️ (@BioMickWatson) July 8, 2015
…it is certainly worthwhile highlighting the balkanization of genomics datasets. But then, why not mention that in at least one area, crop diversity, there are some useful initiatives underway, like DivSeek? Which Nature knows about.