An item in Monday’s Brainfood prompted Brian Ford-Lloyd to revisit the concept of core collections. The paper looked at “similarity groups” in genebank holdings.
One important question they addressed was ‘why identify similarity groups?’ (not to be confused with duplicates), and only time will tell whether their expectations will be met.
However, other issues occurred to me. One is the relationship with ‘core collections’ (which are not mentioned in the article), of which there are now many, even for a single crop such as rice, and which have proved to be of considerable use (see: Genetic resources and conservation challenges under the threat of climate change, Ford-Lloyd, Engels and Jackson – in Plant genetic resources and climate change – Jackson, Ford-Lloyd and Parry, 2014) (sorry for the plug!). So, having identified similarity groups, is it now necessary to go back and redesign core collections? This seems unlikely, but it would perhaps be worthwhile checking core collections to see how often ‘similar’ accessions occur in them. The value of doing so would not necessarily lie in ensuring maximised diversity within core collections. Rather, it might be useful to look for accessions similar to those that have already proved to be of value within core collections, possibly revealing similarly adapted accessions of even greater value.
Core collections (which are generally fixed) or core selections (which can be selected based on the requirements of the user in terms of size and material to be represented) aim at sampling a set of germplasm that is as diverse as possible. We create these cores to maximise the chance that the user finds what (s)he is looking for in cases where we do not have better information to use in the selection.
From the concept it follows that similar material will be sampled at a lower density than diverse material. In other words, similarity groups, as they are called in the article, will be sampled at a low frequency in any core selection strategy. If passport data are used, these groups will turn up based on similarity of origin; if marker data are used, based on genetic similarity.
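To make the point concrete, here is a minimal sketch of one possible diversity-maximising strategy, a greedy "max-min" selection. It is an illustration only, not the method behind any particular core collection. The function name `select_core`, the two-dimensional coordinates and the use of Euclidean distance are all assumptions made for the example; in practice the distances would come from passport or marker data.

```python
from math import dist


def select_core(accessions, size):
    """Greedy max-min selection: return indices of up to `size` accessions.

    `accessions` is a list of numeric coordinate tuples (hypothetical
    trait or marker scores). Each step adds the accession that lies
    farthest from everything already chosen.
    """
    if size <= 0 or not accessions:
        return []
    chosen = [0]  # arbitrary starting accession (an assumption)
    # Distance from every accession to its nearest chosen accession.
    nearest = [dist(accessions[0], p) for p in accessions]
    while len(chosen) < min(size, len(accessions)):
        # Pick the accession least well represented by the current core.
        nxt = max(range(len(accessions)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, dist(accessions[nxt], p))
                   for d, p in zip(nearest, accessions)]
    return chosen


# Made-up data: indices 0-2 form a tight "similarity group",
# indices 3 and 4 are distinct accessions.
accessions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (10.0, 0.0)]
core = select_core(accessions, 3)
```

With these made-up data, only one member of the tight group (index 0, the starting point) ends up in the core, while both distinct accessions are included. This is the low sampling frequency for similarity groups described above.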
As regards the selection of core collections, a more urgent question is how the strategies will deal with genomic information: how will we use knowledge of the actual sequences to optimise the set selected for use? As the diversity in different parts of the genome appears to differ considerably, and as we start to be able to recognise the functional parts, we’ll need new optimising strategies.