It’s germplasm evaluation, Jim, but not as we know it

Next generation sequencing (NGS) holds the promise for a more efficient approach to germplasm evaluation whereby a carefully selected subset of accessions can be sequenced and phenotyped in detail; associations discovered between genotypes and phenotypes in this subset could be used to predict the phenotype of other accessions based on sequence data alone.
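
The idea in the quoted passage is to fit marker effects on a sequenced and phenotyped subset, then predict other accessions from genotype alone. It can be sketched as ridge regression on marker dosages, which is similar in spirit to RR-BLUP. This is a minimal illustration under stated assumptions, not the method proposed in the appraisal. Genotypes are coded as 0/1/2 allele dosages, the shrinkage value `lam` is arbitrary, and the toy data are invented.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(genotypes: Matrix, phenotypes: List[float], lam: float):
    """Estimate a mean and one additive effect per marker.

    Solves (X'X + lam*I) beta = X'(y - mean(y)) on centred marker dosages;
    lam > 0 keeps the system solvable even with more markers than accessions.
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    x = [[g[j] - means[j] for j in range(p)] for g in genotypes]
    mu = sum(phenotypes) / n
    y = [v - mu for v in phenotypes]
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    return mu, means, solve(xtx, xty)


def predict(model, genotype: List[float]) -> float:
    """Predict the phenotype of an unphenotyped accession from markers alone."""
    mu, means, beta = model
    return mu + sum(b * (g - m) for b, g, m in zip(beta, genotype, means))


# Invented toy data: 6 phenotyped accessions x 3 markers (0/1/2 dosages).
train_g = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [0, 2, 1], [2, 1, 2], [1, 1, 0]]
train_y = [8.0, 11.0, 14.0, 9.0, 12.0, 12.0]
model = fit_ridge(train_g, train_y, lam=0.01)
print(round(predict(model, [2, 0, 0]), 2))  # an accession never phenotyped
```

In practice the marker matrix would hold many thousands of SNPs. `lam` would be tuned, for example by cross-validation within the training subset.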

Ah, “the promise.” Always the promise. But actually, in this document, “Technical appraisal of strategic approaches to large-scale germplasm evaluation,” (pdf) some of the practicalities are spelled out, and in quite a lot of detail. You be the judge of whether the vision outlined in that opening quote is of a far-away, Star Trek world, or something that’s really just around the next corner. You could comment on the document itself, but some people have encountered problems with that, so do it here instead. We will collate and feed back all comments.

LATER: And here’s a PDF of the presentation made by the author at this year’s PAG.

Brian Scheffler December 17, 2012 at 12:24 pm

Brian Scheffler
USDA ARS Genomics and Bioinformatics Research Unit
Stoneville MS

Overall the document is fairly thorough. Naturally, the monetary aspects, sequencing capacity and read lengths are significantly out of date, but that is to be expected given the rate of change in the market and when the document was first developed. The same is true of other platforms mentioned, such as the BeadXpress, which will be an outdated system by the time this project is implemented. These aspects need to be updated before discussions take place.

While cassava is a decent model system, given the parameters and the available resources, an evaluation probably needs to be made comparing analyses with and without a reference genome for the same species.

Given the decreasing cost of a MiSeq run and increasing read lengths, I think including 60X coverage of whole genome sequence (WGS) might be warranted within the plan, especially for key species. This would be followed by a rough draft assembly, and the final coverage depth could be debated or examined. In 2013 the MiSeq will generate 2 × 300 bp reads, about 15 Gbp per run, at what I would guess is about $1,300 per run. The long reads and depth of coverage would provide a better template for mapping RRL sequences and identifying gene space.
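
The coverage argument is simple arithmetic: runs needed = coverage × genome size ÷ yield per run. The sketch below uses the per-run yield (15 Gbp) and cost (about $1,300) guessed in this comment. The 750 Mbp genome size in the usage line is a hypothetical placeholder, not a figure from the appraisal. The calculation counts raw bases only and ignores losses to quality filtering or duplicates.

```python
import math


def runs_needed(genome_size_bp: float, coverage: float, yield_per_run_bp: float) -> int:
    """Whole sequencing runs required to reach the target average coverage."""
    total_bases = genome_size_bp * coverage  # raw bases, before any filtering
    return math.ceil(total_bases / yield_per_run_bp)


def wgs_cost(genome_size_bp: float, coverage: float = 60,
             yield_per_run_bp: float = 15e9, cost_per_run: float = 1300.0):
    """Return (runs, total cost) using the per-run figures guessed above."""
    runs = runs_needed(genome_size_bp, coverage, yield_per_run_bp)
    return runs, runs * cost_per_run


# Hypothetical 750 Mbp genome at 60X needs 45 Gbp of sequence.
runs, cost = wgs_cost(750e6)
print(f"{runs} runs, about ${cost:,.0f}")
```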

The issues surrounding polyploids were not adequately covered, in my opinion. Polyploidy is very important and represents a major hurdle for RRL.

Some analysis programs, such as TASSEL, were not designed to take advantage of longer-read technology; in fact, TASSEL is currently set at 64 bp. Long reads are presently essential for polyploids.

While cloud computing resources such as iPLANT are highly desirable, a “data upload site” would also benefit the scientific community. Moving large datasets over the Internet is not an option for many institutes, so data are often shipped to other locations on hard drives by courier. If there is to be a central cloud, a location where data can be physically sent and then uploaded to the cloud would be highly desirable. An upload center would have the further advantage of encouraging greater scientific cooperation on data sharing.
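
A back-of-the-envelope calculation shows why shipping drives can beat the network. The dataset size and link speeds below are hypothetical examples chosen for illustration. The sketch assumes only a fraction of nominal bandwidth is sustained (`efficiency`), and real transfers often fall short even of that.

```python
def transfer_days(dataset_tb: float, link_mbps: float, efficiency: float = 0.5) -> float:
    """Days needed to move a dataset over a network link.

    dataset_tb: dataset size in terabytes (10**12 bytes).
    link_mbps: nominal link speed in megabits per second (10**6 bits/s).
    efficiency: assumed fraction of the nominal speed actually sustained.
    """
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400  # seconds per day


# Hypothetical 5 TB dataset over three link speeds.
for mbps in (10, 100, 1000):
    print(f"5 TB over {mbps} Mbit/s: {transfer_days(5, mbps):.1f} days")
```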

It should be noted that, at least for GBS, more enzymes and barcodes are needed. There has also been significant work from Jesse Poland’s group (USDA/ARS, Manhattan, Kansas) on polyploids (especially wheat) and on the use of multiple restriction enzymes. In addition, I have been told that it is best to run a whole lane of Illumina sequencing with one GBS experiment and one species at a time. For economic reasons, then, it is undesirable to analyze a small number of samples. Additions to a germplasm collection would therefore have to accumulate to 96 to 384 new samples before an additional RRL experiment is run.
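
The lane economics can be sketched as a fixed lane cost and read yield spread over however many samples are multiplexed. The lane cost and read yield below are hypothetical placeholders, not vendor quotes. Only the 96 to 384 sample range comes from the comment.

```python
def per_sample(lane_cost: float, reads_per_lane: float, samples: int):
    """Return (cost per sample, reads per sample) when one lane is shared by `samples`."""
    if samples < 1:
        raise ValueError("need at least one sample per lane")
    return lane_cost / samples, reads_per_lane / samples


# Hypothetical lane: $1,500 and 150 million reads (placeholders, not quotes).
for n in (8, 96, 384):
    cost, reads = per_sample(1500.0, 150e6, n)
    print(f"{n:>3} samples: ${cost:,.2f} and {reads / 1e6:.2f} M reads each")
```

A small batch pays far more per sample and also over-sequences each sample. That is why additions tend to wait until a full plate or more has accumulated.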

Automatic, open pipelines for converting RRL sequences into SNP assays for different detection platforms (GoldenGate, Fluidigm, etc.) are needed. These would let biologically relevant SNPs be converted into usable MAS markers quickly and independently of vendors.
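
One small step in such a pipeline is formatting a SNP found in an RRL tag as a flanking-sequence string with the two alleles in brackets, e.g. `ACGT[A/G]TTCA`. Assay design tools commonly accept this notation. The function below is a hypothetical illustration, not any vendor's tool. Real design submissions have platform-specific requirements, such as flank length and masking of neighbouring SNPs, that should be checked against each vendor's documentation.

```python
def snp_design_string(tag: str, pos: int, alt: str, flank: int = 50) -> str:
    """Format a SNP in an RRL tag as LEFT[REF/ALT]RIGHT.

    tag: consensus sequence of the tag (reference allele at `pos`).
    pos: 0-based position of the SNP within the tag.
    alt: alternative allele (a single base).
    flank: maximum number of bases kept on each side of the SNP.
    """
    tag = tag.upper()
    alt = alt.upper()
    if not 0 <= pos < len(tag):
        raise ValueError("SNP position outside the tag")
    ref = tag[pos]
    if alt == ref or alt not in {"A", "C", "G", "T"}:
        raise ValueError(f"invalid alternative allele {alt!r} for reference {ref!r}")
    left = tag[max(0, pos - flank):pos]
    right = tag[pos + 1:pos + 1 + flank]
    return f"{left}[{ref}/{alt}]{right}"


print(snp_design_string("tgcagGATTACAcct", 8, "g", flank=4))
```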

User-friendly pipelines are desperately needed so that curators and breeders can do the analysis themselves. This will absolutely be required if techniques like Genomic Selection are to be used in applied breeding and curation. Data generation without utilization is the worst possible scenario, as it will negatively affect future research and investment dollars. iPLANT is currently trying to provide this kind of backbone. However, it is not clear that it can keep up with the present rapid change in software, as many platforms are not fully mature before public release. Workshops on Genomic Selection and RRL for breeders and curators are desperately needed. These are new approaches, and most of the world’s breeders (even in highly developed countries) are not familiar with them, much less have the knowledge base to take advantage of them.

Nucleic acid isolation of sufficient quality for NGS for some plant species will not be trivial.

Best practices need to be developed and shared now, so that curators and breeders can start preparing material for eventual genomic characterization. These could include procedures for selecting the best material to characterize. They could also include purifying genetic stocks, starting now if required, as this may take numerous generations. Any information on the protocols used to isolate DNA, as well as on LD or general diversity, should be collected. The level of LD will greatly affect the methods used, or at least the sequencing depth needed for characterization. Also, for polyploid crops, determine whether a diploid can be found and used as a reference genome.
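
Because the level of LD drives the marker density and sequencing depth needed, a first-pass diversity survey would typically report it. A standard pairwise measure is r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. The sketch below assumes phased haplotypes coded 0/1 at two biallelic SNPs. For heterozygous or outcrossing material, obtaining those phased haplotypes is itself a separate step.

```python
from typing import Sequence


def ld_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Pairwise LD (r^2) between two biallelic SNPs from phased haplotypes.

    snp_a, snp_b: allele (0 or 1) carried by each haplotype at the two SNPs.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("need two equal-length, non-empty haplotype vectors")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


print(ld_r2([0, 0, 1, 1, 1], [0, 0, 1, 1, 0]))
```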

The timeline for the proposed project means that technology and knowledge will change greatly before a second phase or full implementation. The pilot project, while not invalid, will therefore not represent the true state of the art. This does not mean the project should not go forward, but rather that it needs the funds and flexibility to adapt to scientific progress.

The Trust has a crop priority list (http://www.croptrust.org/sites/default/files/documents/files/Annex1crops_0.pdf). This list needs to be evaluated in light of two questions: which crops are being handled by the world community, and which are most suitable for analysis with today’s technology and its known limits. A list of limitations for the other crops then needs to be made. Only by this method, together with other relevant information, can the Trust pick the best crop for a pilot project and set a priority list for future phases.

Major Goodman December 17, 2012 at 12:26 pm

I have several specific comments, but the glaring failure here is not running almost any of these ideas past private plant breeders who have tried to make use of these technologies, mostly to no avail. Nor do I see input from people who have tried to maintain and study germplasm accessions, who could at least comment on the feasibility of the fieldwork. This seems to be an in-house, pat-yourself-on-the-back effort by and for NGS-enthralled scientists. In fact, I see almost no input from any real plant breeder or germplasm expert.

The one sub-study I am most familiar with is CIMMYT’s disastrous top-cross study using single plants from widely variable accessions. Interesting, but with no practically usable output, since the males used are now extinct (they were not selfed!). Using any type of genome-wide or association mapping with accessions, at least in maize, is confounded by founder effects that are extremely difficult to trace or characterize. And we don’t have the pedigrees, either published or inferred from data, that Buckler’s inbred lines have. The assumption that inbred strains can easily be derived by SSD from many maize accessions, or even from some races, is false. Try Salpor or Imbricado or Jala or Cuzco Gigante or any of a dozen others. Good luck! And once achieved, such inbreds are to be discarded, only to be re-done when the next technology comes along. Please! Yes, selfing is cheap, but it is not cheap with perhaps 25% to 50% of the maize races.

I am also not at all certain that all the calls for standardization of procedures and data are well thought out. Many things are center-specific or crop-specific. Frankly, standardizing things that are still very much experimental may be effort spent on something that isn’t worthwhile, or isn’t yet mature enough to merit standardization. What works nicely for dairy cows, or even for elite inbred crosses in the US, may be totally inappropriate for a maize germplasm bank in Mexico.

The first sentence on p. 20 virtually ignores GxE.

Major Goodman

Elhan Ersoz December 19, 2012 at 3:55 am

With all due respect, I do not agree with Major about a lack of private-sector interest in these technologies. I am a former Buckler Lab post-doc and currently work for a major seed company. Needless to say, I am a fan of NGS-based technologies and am actively developing methodologies that can use them in private breeding programs. It took a while to convince traditional breeders of the value of such high-density genotyping approaches, but I think we are making progress on that front. Statistical models developed from GWAS- and GS-type studies actually allow what is called predictive breeding. This takes marker-assisted breeding to the next level: we predict, in silico, the phenotypes of crosses that are not feasible or practical to make because of heterotic grouping issues. Simply by applying a QTL model in combination with a background model, we can generate an expected performance for such hypothetical crosses. In my experience, the approach generates reasonably accurate predictions when compared with actual field observations from real crosses, based on historical data. The more we characterize the background diversity and the QTL architecture of a trait, the more accurate the predictions become.
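
As a rough illustration of predicting a cross that was never made, the sketch below scores a hypothetical F1 between two inbred parents. It uses a simple additive-plus-dominance marker model, chosen here for illustration; the comment does not specify the models actually used. The marker effects are invented. Treating the first marker as a large-effect QTL and the rest as background, all in one list, is a simplification.

```python
from typing import List


def predict_f1(mu: float, additive: List[float], dominance: List[float],
               parent1: List[int], parent2: List[int]) -> float:
    """Predicted value of the F1 between two inbred (homozygous) parents.

    Parents are coded per marker as -1 or +1 (homozygous for either allele).
    The F1 then has additive code (p1 + p2) / 2 and is heterozygous (h = 1)
    wherever the parents differ, which is where dominance effects enter.
    """
    if not (len(additive) == len(dominance) == len(parent1) == len(parent2)):
        raise ValueError("effect and genotype vectors must have equal length")
    value = mu
    for a, d, g1, g2 in zip(additive, dominance, parent1, parent2):
        if g1 not in (-1, 1) or g2 not in (-1, 1):
            raise ValueError("inbred parents must be coded -1 or +1")
        value += a * (g1 + g2) / 2 + d * (1 if g1 != g2 else 0)
    return value


# Invented effects for 4 markers; the first is treated as a large-effect QTL.
mu = 100.0
additive = [3.0, 0.5, -0.4, 0.2]
dominance = [1.0, 0.3, 0.3, 0.1]
elite_female = [1, -1, 1, -1]
exotic_male = [-1, -1, -1, 1]
print(predict_f1(mu, additive, dominance, elite_female, exotic_male))
```

With all dominance effects set to zero, this reduces to the mid-parent value. The dominance terms are what let such a model say anything about heterosis.
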
Major’s main criticism, that GWAS-type studies are confounded by founder effects and population structure, is absolutely valid. But strategies based on nested association mapping (NAM) actually circumvent these issues.
There is another big problem, which is especially acute in hybrid maize breeding. There, genetic divergence in breeding material is expected, and is in fact encouraged in order to improve the chances of heterosis. As Major indicates, developing inbred lines from what is considered exotic material is costly. So is introgressing multiple favorable alleles from exotic backgrounds into elite backgrounds. Either process requires substantial investment of time and resources, and a whole lot of patience, and even then it is likely to fail. Major himself has released material generated through such wide crosses: the NC maize inbreds, which are basically tropically adapted temperate lines. As students of the field, we all understand and appreciate the effort it takes to create such varieties.
However, with the resources available to a private seed company, we can actually address and alleviate some of the issues that make wide crosses unfeasible.
For instance, we can create a large library of isogenic introgression lines on multiple commercially feasible male and female backgrounds, using a set of donors that captures a reasonably high fraction of haplotype diversity.
The isolines can then be used to stack favorable alleles from multiple exotic donors at their respective QTL locations in a clean iso-elite female or iso-elite male background. These stacked lines are in fact used as donors to carry the exotic alleles into the elite germplasm pool.
The company I work for has actually created such isoline libraries with multiple exotic introgressions and is making them available to public breeding programs as well.
See here https://pag.confex.com/pag/xx/webprogram/Paper4485.html

So basically, thanks to NGS approaches, GWAS and GS, combined of course with proper experimental designs and statistical analysis methods, we can actually realize our long-time ambitions for predictive breeding and breeding by design.

I appreciate the opportunity to comment.
Thanks,
-Elhan

Elhan Ersoz December 19, 2012 at 4:00 am

I realized I made a mistake in part of my comments.
My statement above describing the NC lines that Major Goodman developed as “tropically adapted temperate lines” should read “temperate adapted tropical lines,” or in other words, temperate lines with tropical introgressions.
I apologize.
