How would you PageRank genebank accessions?

Various friends have sent me, over the past few days, different takes on a recent paper which used the Google PageRank algorithm to identify the most “important” species in food webs, perhaps because they know I’m a sucker for examples of cross-pollination between disciplines. The BBC had its say, and also ScienceDaily, among others. I posted the ScienceDaily article on Facebook, as I am wont to do when I think something is interesting — maybe even have a gut feeling it might be relevant to agrobiodiversity conservation — but don’t know quite what to make of it. Sure enough, someone left a comment that he thought the algorithm was a secret, which was also my understanding: Google don’t want people to manipulate the rank of their web pages. But then someone else came in and said that the basics of how the thing works are in the public domain.

To prove it, he provided a link to an American Mathematical Society article entitled How Google Finds Your Needle in the Web’s Haystack. Which is why I love social networking, but that’s another story. Now, that article is definitely NSFW, unless you work at the American Mathematical Society, so think twice before clicking, but here’s the lede:

Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you’d like to find it in a matter of seconds. How would you go about doing it?

And I thought to myself: just change that 25 billion, which of course refers to the number of pages on the internet, to 6.5 million or 7.2 million or whatever, and the guy could just as easily be talking about accessions in the world’s genebanks.

Now, basically we search for the germplasm we need by starting with a big dataset and applying filters: wheat, awnless wheat, awnless wheat with such and such resistance, awnless wheat with such and such resistance from areas with less than x mm of rainfall per annum, and so on. Would it make any sense to rank the accessions in that initial big dataset? On what basis would one do that anyway? That is, what is the equivalent of hyperlinks for accessions? Because the essence of PageRank is that important pages receive lots of hyperlinks from important pages. So, numbers of requests? Amount of data available on the accession? But wouldn’t that just mean that only the usual suspects would get picked all the time? Genetic uniqueness, perhaps, then? That would be turning the algorithm on its head. Looking for lack of connections rather than connections to other accessions. You could in fact have different ranking criteria for different purposes, I suppose.
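For what it's worth, the core of the idea fits in a few lines. What follows is only a minimal sketch of the publicly described power-iteration version of PageRank, not Google's production algorithm. The accession names and the links between them are entirely hypothetical: an arrow from one accession to another stands for whatever "hyperlink" one eventually settles on (shared requests, pedigree, co-citation), which is exactly the open question above. The damping factor of 0.85 is the value commonly quoted for the original algorithm.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Each node passes its rank on, split equally among the nodes it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop once the ranks have effectively stopped changing.
        if sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical accessions and hypothetical links, purely for illustration.
toy_links = {
    "ACC-1": ["ACC-3"],
    "ACC-2": ["ACC-3"],
    "ACC-3": ["ACC-4"],
    "ACC-4": ["ACC-3"],
    "ACC-5": [],
}
ranks = pagerank(toy_links)
for accession, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{accession}: {score:.3f}")
```

In this toy example the accession that collects the most links, including one from a well-linked neighbour, comes out on top, while the unlinked ones share the bottom. Ranking by genetic uniqueness instead would mean redefining the links, or simply sorting the other way.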

Ok, now my brain hurts. This cross-pollination stuff can be fun, but it is hard work.

Nibbles: Chickens, Peppers, Treaty, Breadfruit, Preservation, Food systems, Adaptation, Yam multiplication

Nibbles: Chicory symbolism, Watermelon disease, Olive documentation, Camassia quamash, Pig maps

Nibbles: Svalbard, Consumers, Seed law, Fragrant rice, Five Farms on radio, Invasive plant used, Genetic diversity and latitude, Coffee and tea in history, Coconut disease

Chinese interdependence

A paper just out in Agricultural Science in China reminded me that I wanted to say something about one of the great meta-narratives of plant genetic resources: interdependence, the old no-country-is-self-sufficient-in-PGR mantra. Which, unlike some other meta-narratives, is generally recognized as being true: witness the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA). And that despite the fact that measuring interdependence is not by any means easy, and has not often been done.

The paper which caught my eye is not really primarily about interdependence. [1] It just shows that cultivars of winter oilseed rape (canola) from China are very distinct from European ones, on the basis of molecular markers. Which presumably means that yield gains could be had from cross-breeding between the two groups. Which does say something about interdependence, but not very forcefully.

However, that paper reminded me about two others that a colleague had recently sent me, along with the thought that they should be enough, in a perfect world, for China to ratify the ITPGRFA.

The first is about soybean. [2] It shows, using molecular markers again, that a couple of elite Chinese cultivars benefited greatly from the fact that US and Japanese germplasm was involved in their development, rather than just Chinese stuff. The benefit showed up both in specific traits and in their difference from previous Chinese cultivars (that is, the genetic base of the crop as a whole was broadened).

The second paper makes the interdependence point even more strongly by quantifying the contribution of foreign maize germplasm to production in China, rather than just genetic diversity. [3] It turns out that a 1% contribution by US material (based on the coefficient of parentage) translates to an additional 0.01 t/ha (0.2%), and a 1% contribution by CIMMYT germplasm to an additional 0.025 t/ha.
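To put those numbers side by side, here is a back-of-the-envelope calculation using only the figures quoted above. It rests on two assumptions of mine, not statements from the paper: that the 0.2% is the US gain expressed relative to a mean yield, and that the same baseline applies to the CIMMYT figure. The variable names are just for illustration.

```python
# Figures quoted in the post, per 1% contribution by coefficient of parentage.
us_gain_t_per_ha = 0.01
us_gain_fraction = 0.002       # the "0.2%" quoted alongside the US figure
cimmyt_gain_t_per_ha = 0.025

# Assumption: the 0.2% is the US gain relative to a mean yield.
implied_baseline_t_per_ha = us_gain_t_per_ha / us_gain_fraction

# Assumption: the same baseline applies to the CIMMYT figure.
cimmyt_gain_fraction = cimmyt_gain_t_per_ha / implied_baseline_t_per_ha

print(f"Implied baseline yield: {implied_baseline_t_per_ha:.1f} t/ha")
print(f"CIMMYT gain per 1% contribution: {cimmyt_gain_fraction:.2%}")
print(f"CIMMYT gain relative to US gain: {cimmyt_gain_t_per_ha / us_gain_t_per_ha:.1f}x")
```

If those assumptions hold, the quoted figures imply a baseline of about 5 t/ha, with the CIMMYT contribution worth roughly two and a half times the US one per percentage point of parentage.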

The conclusion: “The extensive utilization of US and CG germplasm improved maize yield potential in China… The government should provide funds to support research on germplasm introduction…” And, we could add, it should ratify the ITPGRFA. No country is self-sufficient in PGRFA. Not even the largest.