Making that haystack smaller

Germplasm collections can be very large, and that can put off potential users. What breeder really wants to screen thousands of accessions when only a dozen might end up being useful? It’s not surprising, therefore, that people have looked for shortcuts. One approach is to make a “core collection.” You use the available data on the collection to select a subset which you hope will contain most of the original genetic diversity in a fraction (20%, say) of the total number of accessions. Then you evaluate that subset, rather than the whole collection, and use the results to delve back into the remaining 80% of the material, hopefully with a better chance of finding what you’re looking for.

That’s been done for lots of large collections now, with a certain amount of success in increasing their use — and usefulness. But breeders are not really satisfied. They want to shorten the odds even more. And the application of Geographic Information Systems (GIS) technology in something called the Focused Identification of Germplasm Strategy (FIGS) provides a potentially effective way of doing just that.

Jeremy described recently over at Bioversity how FIGS was used to increase the chances of finding a needle in a haystack by “start[ing] with a smaller haystack.” The haystack was 16,000 wheat accessions. The needle was resistance to powdery mildew.

It works like this: take 400 genebank samples known to have some resistance to powdery mildew, and use the geographical locations where they evolved and were collected to determine the environmental profile that can be associated with resistance. Then apply that profile, as a template, to a further 16,089 samples with location data, to identify those collected in places that share the conditions associated with resistance. The result is a group of 1,320 wheat varieties, mostly from Turkey, Iran and Afghanistan. This much more manageable subset was then screened against diverse strains of powdery mildew. About 16% of the samples (211 of 1,320) showed some resistance.
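The profile-matching step just described can be sketched as a simple climate envelope. This is a minimal illustration only, not the FIGS team’s actual method, which the post doesn’t spell out: it assumes each accession carries a few environmental variables for its collection site, takes the per-variable (min, max) range across the known-resistant accessions as the “profile,” and keeps any candidate whose values all fall inside that range. The class, function and variable names and all the numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Accession:
    accession_id: str
    # Environmental variables at the collection site (hypothetical names and units).
    env: dict[str, float]


def build_profile(resistant: list[Accession]) -> dict[str, tuple[float, float]]:
    """Per-variable (min, max) envelope over the sites of known-resistant accessions."""
    if not resistant:
        raise ValueError("need at least one resistant accession to build a profile")
    variables = resistant[0].env.keys()
    return {
        v: (min(a.env[v] for a in resistant), max(a.env[v] for a in resistant))
        for v in variables
    }


def matches(acc: Accession, profile: dict[str, tuple[float, float]]) -> bool:
    """True if every profiled variable at this accession's site lies inside the envelope."""
    return all(lo <= acc.env[v] <= hi for v, (lo, hi) in profile.items())


def focused_subset(resistant: list[Accession], candidates: list[Accession]) -> list[Accession]:
    """Keep only candidates collected in environments resembling the resistant ones."""
    profile = build_profile(resistant)
    return [a for a in candidates if matches(a, profile)]


# Toy data, invented purely for illustration.
resistant = [
    Accession("R1", {"annual_rain_mm": 350.0, "mean_temp_c": 12.0}),
    Accession("R2", {"annual_rain_mm": 500.0, "mean_temp_c": 15.0}),
]
candidates = [
    Accession("C1", {"annual_rain_mm": 420.0, "mean_temp_c": 13.5}),  # inside the envelope
    Accession("C2", {"annual_rain_mm": 900.0, "mean_temp_c": 14.0}),  # too wet
    Accession("C3", {"annual_rain_mm": 400.0, "mean_temp_c": 20.0}),  # too warm
]
subset = focused_subset(resistant, candidates)
print([a.accession_id for a in subset])  # ['C1']
```

A hard min/max box is about the crudest profile possible: a single unusual resistant site can stretch it a long way. Real work would likely use many more variables and a more robust similarity or modelling method.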

The 211 resistant varieties then moved to the next phase: molecular screening for different alleles of the Pm3 gene. More than half (111 of the 211) had Pm3-based resistance, some in previously unknown forms. In the end, the group isolated and identified seven new functional Pm3 alleles. It had taken scientists 100 years to find the first seven; FIGS doubled the number in a fraction of the time.
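The screening funnel can be checked with a few lines of arithmetic. The counts are the ones quoted in the post; the code just recomputes the proportion retained at each stage relative to the one before (the stage labels are illustrative).

```python
# Counts as quoted in the post for the FIGS search for powdery mildew resistance in wheat.
stages = [
    ("accessions with location data", 16_089),
    ("FIGS-selected subset", 1_320),
    ("resistant in mildew screening", 211),
    ("with Pm3-based resistance", 111),
]


def stage_rates(stages: list[tuple[str, int]]) -> list[tuple[str, int, float]]:
    """Fraction of each stage's count relative to the previous stage's count."""
    return [
        (name, count, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, count, rate in stage_rates(stages):
    print(f"{name}: {count} ({rate:.1%} of previous stage)")
# FIGS-selected subset: 1320 (8.2% of previous stage)
# resistant in mildew screening: 211 (16.0% of previous stage)
# with Pm3-based resistance: 111 (52.6% of previous stage)
```

The “about 16%” and “more than half” figures in the text are the second and third rates; the first shows that the focused subset was roughly 8% of the location-referenced collection.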

Very good. But is it always going to work? Another recent paper — in fact, a series of papers — counsels caution.

Researchers at USDA-ARS in Madison, Wisconsin, and at the International Potato Center (CIP), who collectively sit on the largest collection of wild potatoes in the world, have been looking for some time at how geography can help them make better use of their very diverse 2,500 or so accessions of about 190 species.

In contrast to the wheat powdery mildew example, previous work with these wild potatoes has found only weak associations between climatic variables and traits such as frost tolerance and resistance to a couple of fungal diseases. The latest paper looks at resistance to the Colorado potato beetle (Jansky, S. H., Simon, R. & Spooner, D. M. 2009. A Test of Taxonomic Predictivity: Resistance to the Colorado Potato Beetle in Wild Relatives of Cultivated Potato. Journal of Economic Entomology 102(1): 422–431). It again finds little predictive power in environmental variables:

Resistance is not concentrated enough to provide guidance regarding geographic localities likely to contain a high proportion of populations containing Colorado potato beetle resistance.

Which species an accession belonged to was a much better clue to finding resistance to the pest than any combination of the 38 temperature, rainfall, altitude and latitude variables used in the analysis.

Now, one can argue about the ecological validity of these climatic variables. Or about whether some other factor — soils, say — would have fared better. Or about the completeness and representativeness of the geographic coverage of the collection. Or about whether the results would have been different if the technique had been applied to each species individually, rather than to the genepool as a whole. But this series of papers does suggest that, important as it undoubtedly often is, the use of location data may not be the universal panacea that some of us were hoping for.

Looks like we’ll need a diversity of strategies to find those needles.

9 Replies to “Making that haystack smaller”

  1. Yes, the case of potatoes is very interesting indeed, but is it typical? There are numerous papers showing that using ecogeography as a predictor of patterns of genetic diversity does not always work, but ecogeography is still widely used because, in the absence of clear genetic diversity or characterisation/evaluation data, there is practically no alternative. Predictive characterisation using the FIGS approach will, I feel, be a similar case: although it may not be perfect, it is an excellent pragmatic tool.

  2. All good points. The FIGS approach is not intended to be a universal panacea. It should mainly be considered for finding adaptive trait variation. Unlike the core collection concept, it does not try to concentrate all the available genetic variation in a 5–10% sub-sample of the original collection. The approach has been shown to be effective for powdery mildew, cereal cyst nematode (CCN) tolerance, boron toxicity tolerance, Sunn pest and Russian wheat aphid (RWA), and is showing initial promise for salinity tolerance, all in bread wheat. We should simply think of FIGS as a strategy for linking breeders, and other users, to ‘candidate’ accessions in ex situ collections: those with a good chance of carrying the genetic variation they are looking for. At the same time, we should expect other strategies (or methods), existing or under development, to build further on Vavilov’s (1957) concept of “starting with the right material” to ensure success in plant improvement.

    A general observation after about 30 years in the game: common sense seems to be more effective than rocket science in exploiting plant genetic resources. Perhaps this might stimulate some interesting discussion?

    Vavilov, N.I., 1957. World resources of cereals, grain leguminous crops and flax and their utilization in plant breeding. Agroecological survey of the principal field crops. Izdatel’stvo Akademii Nauk SSSR, Moskva, Leningrad, 463 p.

  3. Populations of wild potatoes and their pests are biological entities and, as such, are subject to evolutionary processes driven by the selection pressures their environments place on them. FIGS seeks to use environmental parameters to predict where the selection pressures that would favor sought-after traits occur.

    In developing FIGS, we used geographic information to define environments where we had found resistance before, and then looked in similar environments for new sources. Alternatively, we defined environments that would favor high population densities of the pest. This proved more successful than using a randomly selected set or a core set.

    The fact that the studies by Jansky et al. did not find a clear correlation between the distribution of resistance to some pests and environmental variables could be a function of the algorithms and the environmental data they used. Other niche modeling exercises have found that the BIOS parameters within the WorldClim suite of surfaces seem to be of more significance than …

    Further, what level of statistical significance should be used to decide whether something is correlated with a particular trait is open to debate. The 5% or even 10% thresholds, while rigorous, are arbitrary and may rule out certain possibilities. Thus, when applying these models in a germplasm selection context, we may need to experiment with less stringent significance thresholds if we are to maximize our chance of assembling a set of material that contains a higher proportion of the sought-after trait.

    Finally, FIGS is an approach that has the potential to evolve into a pragmatic tool, given further intellectual input.

  4. The same authors, in a recent Crop Science paper, tested taxonomic and biogeographic associations with 10,738 disease and pest evaluations, derived from the literature and genebank records, of 32 pests and diseases in five classes of organisms (bacteria, fungi, insects, nematodes, and viruses). The data showed that ratings for only the Colorado potato beetle [Leptinotarsa decemlineata (Say)] and one pathogen (Potato virus M, a carlavirus) are reliably predicted by both host taxonomy and climatic variables.
    http://tech.groups.yahoo.com/group/CropWildRelativesGroup/message/490

  5. To me, the potato results mean that we shouldn’t be looking at selection, but at migration and drift.

    Selection reduces diversity; migration and drift determine where it goes.

  6. The recent examples at ICARDA of using the FIGS approach to better target sources of valuable traits are a first sign of its relevance. I am sure the approach can be further refined, and ICARDA is pursuing it with partners. Its success will depend, in addition to the available environmental layers, on information about the distribution of stresses and, for biotic stresses, on the virulence spectra of the pest or pathogen populations. Subsets established for a given biotic stress could differ from one region to another, depending on the virulence of the local populations.

  7. I can see a debate brewing here that shouldn’t need to happen. It should be simple: we give each user what (s)he needs. Users studying diversity need core collections. Users trying to identify genes for a trait need diversity for that trait and uniformity for everything else. Users needing a particular trait need just that: trait-specific subsets if the phenotyping data are available, or subsets based on predictions such as those provided through FIGS. The only problem with core collections is that they’ve been pushed for purposes they weren’t intended for.
