DNA-informed breeding, the integration of DNA-based genetic information into plant breeding programs, can enhance efficiency, accuracy, creativity, and pace of new cultivar development. Most genetic knowledge of key traits for plant breeding has been obtained through QTL analyses. Despite an explosion in QTL discoveries for horticultural crops, very few of those discoveries have been translated into tools for horticultural crop breeding. Trait-predictive DNA tests are an example of such tools with direct application in crop genetic improvement. The translation of a promising QTL to a trait-predictive “DNA test” has five steps: (1) choose target QTL; (2) design assay to target locus; (3) assay individuals; (4) trace inheritance; and (5) disseminate DNA test details. Key information to convey to end users about a DNA test includes the crop and trait(s) addressed, targeted trait locus or loci, and marker type used; trait heritability and genotypic variance explained by the DNA test; allele effects, frequencies, and germplasm distributions; and technical details for running the test. This paper provides instructions for translating promising QTLs into breeder-friendly, trait-predictive DNA tests, based on our experience with tree fruit. Our intent is to accelerate the development of trait-predictive DNA tests and establish a standard framework for reporting them. As scientific understanding of genetic factors controlling breeding-relevant traits continues to expand, systematic and increased DNA test development should help bridge the chasm between academic research and breeding application.
Keywords: DNA-informed breeding; Effective alleles; Markers; Predictiveness; Trait loci; Trait performance predictions; Translational genetics
Horticultural crop production supports many rural communities and contributes to consumer health and well-being. Crops with high productivity and disease resistance, and whose products offer extended availability and excellent eating quality, are sought after by consumers and industry stakeholders in the U.S. and worldwide. Plant breeding is an effective solution for meeting these demands. The efficiency of crop genetic improvement can be increased by integrating DNA information into horticultural crop breeding programs. DNA-informed breeding enables breeders to identify and exploit the genetic potential present in their crops more effectively than decisions based on phenotypic data alone [3,4].
To date, much work has focused on identifying genetic loci underlying trait variation to characterize genetic potential. Since the landmark work that laid the foundation of quantitative trait locus (QTL) analysis in the 1980s [5,6], thousands of QTLs and Mendelian trait loci (MTLs) have been discovered and described for horticultural crops using linkage analysis approaches [7-9]. A genome-wide association study (GWAS) employs a different statistical framework than QTL analysis, but the goal of GWAS for plant breeding programs is similar: to understand the genetic architecture and identify causal loci of traits of interest. Information about these trait loci has been archived in searchable databases such as the Genome Database for Rosaceae, the Citrus Genome Database, and the Sol Genomics Network. While QTL analyses have been helpful for understanding the genetic architecture of traits, the information gained is purely academic to breeding programs until it is converted to practical tools that are used to describe the genetics of breeding germplasm.
Published reports on practical application of DNA markers for crop improvement lag substantially behind published QTL findings [7,14-16] as few QTLs have been translated into assays of genetic potential for breeding program use. This disconnect between research and application has been termed “the chasm”. Large multi-institutional research projects in the U.S. and Europe such as RosBREED and FruitBreedomics have worked to bridge this chasm in horticultural crops [19-21]. Some trait-predictive DNA-based diagnostic tools, arising from previously discovered QTLs, have since been developed to assist in breeding of these crops. Such DNA marker assays have targeted: MTLs such as for skin color in cherry, remontancy in strawberry, and disease resistance in tomato; QTLs with large-effect alleles such as for fruit blush and slow ripening in peach, bacterial wilt resistance in carnation, and powdery mildew resistance in pea; and QTLs best described by a polygenic model of inheritance such as for bud break in apple and fruit weight in mandarin [28,29]. Genetic assays like these are used to choose valuable parents, target inferior seedlings for removal, and advance selections to the next breeding phase [3,8]. Ru et al. reviewed reports of marker-assisted seedling selection for crops of the Rosaceae family and concluded that this technology is underutilized by most breeding programs.
A systematic, step-wise approach is needed to help translate research outputs into practical breeding. Here we describe the steps to translate QTL discoveries into breeder-friendly trait-diagnostic “DNA tests”, based on our experience with tree fruit. We also describe the components we recommend reporting when publishing a DNA test to help ensure that breeding programs use the tool appropriately and successfully. Our aim is to establish a standard format for reporting DNA tests to support the adoption and routine application of DNA-informed breeding for horticultural crops. DNA tests are distinguished here from other types of genetic assays that are not both trait-predictive and locus-specific (Table 1).
|Term||Definition||Trait-predictive?||Locus-specific?|
|DNA test||A locus-specific, trait-predictive DNA-based diagnostic assay of breeding relevance, targeting one or a few trait loci||Yes||Yes|
|DNA fingerprinting panel/set/assay||Several trait-neutral DNA markers used for purposes of identity/relatedness||No||Can be, if not genome-wide|
|DNA profiling assay||Many DNA markers with a known distribution across the genome used for purposes of identity/relatedness and/or trait predictions||Can be||No|
Table 1: Terminology – DNA tests and their counterparts. DNA fingerprinting assays are for identity/relatedness “characterization” applications rather than trait-predictive “evaluation”; DNA profiling assays involve numerous DNA markers that are genome-wide rather than targeting just one or a few specific loci, for characterization or evaluation purposes.
A DNA test consists of four major pieces of information to be assembled for breeding utility. These four parts (below) inform users of what the DNA test targets, how well it does so, and how to run it.
Breeders need to know the context in which an available test is relevant: the crop and trait addressed, the locus target(s), and the marker type used. A lasting name for each test helpfully includes many of these features for clear communication among breeders, allied scientists, and service providers. A single DNA test can address multiple traits and contain multiple markers, and a single trait can be served by multiple DNA tests. For example, both apple skin color (degree of blush coverage) and Type 1 red flesh are addressed by the DNA test Md-Rf-SSR, where Md = Malus × domestica, the Rf locus is a QTL for skin color, and the SSR targets a microsatellite motif within the QTL [3,31]. The apple acidity test Md-Ma×A-Acidity is served by three DNA markers (Md-Ma-indel, Md-LG8a-SSRa and Md-LG8a-SSRb), and multiple DNA tests targeting storability exist for the ACS ethylene biosynthesis gene in apple (Md-ACS1SNPa, Md-ACS1SNPb, and Md-ACS-indel). Furthermore, the same traits might be targeted by similarly named DNA tests in different crops. For example, Md-Rf-SSR, Ppe-Rf-SSR, and Pav-Rf-SSR are used to predict blush coverage of apple, peach, and sweet cherry, respectively [3,22,25].
Further details on the targeted trait locus/loci help define how well the DNA test can be expected to predict trait performance, which informs deployment strategies. Critical parameters are broad-sense heritability of the trait, the one or more trait loci targeted by the test, the predictiveness of the test, and degree of additivity vs. dominance/recessivity. Ru et al. described how a DNA test’s predictiveness (i.e., the proportion of a trait’s genetic variation explained by the DNA test) can be used to determine the deployment strategy that optimizes genetic gain for single traits. DNA tests for which predictiveness is greater than broad-sense heritability of the associated trait are particularly effective for positive selection, in which individuals with the best allelic combination are targeted (parent selection) or retained (seedling selection), while the most beneficial use of DNA tests with a predictiveness lower than the heritability is for culling only the worst allelic combinations. For example, the Md-ACS-indel test explains approximately 10% of the phenotypic variation for fruit firmness after storage across a range of germplasm, corresponding to a predictiveness of roughly 20% (0.10/0.44 ≈ 0.23), as the heritability for fruit firmness in apple has been estimated at 44%. Because the predictiveness of the test is lower than the heritability of the trait, culling only those individuals that carry two negative alleles (the worst allelic combination) is advised.
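The predictiveness calculation and the resulting deployment rule of thumb can be sketched in a few lines of Python. This is a minimal illustration, assuming predictiveness is computed as the phenotypic variance explained divided by broad-sense heritability; the function names are hypothetical, and the handling of the tie case (predictiveness equal to heritability) is an assumption not addressed in the text.

```python
def predictiveness(phenotypic_variance_explained, broad_sense_heritability):
    """Proportion of a trait's genotypic variance explained by a DNA test.

    Both arguments are proportions between 0 and 1.
    """
    if not 0 < broad_sense_heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    if not 0 <= phenotypic_variance_explained <= broad_sense_heritability:
        raise ValueError("variance explained must be in [0, heritability]")
    return phenotypic_variance_explained / broad_sense_heritability


def deployment_strategy(test_predictiveness, broad_sense_heritability):
    """Rule of thumb: positive selection only when predictiveness exceeds heritability.

    Assumption: ties are treated conservatively as culling.
    """
    if test_predictiveness > broad_sense_heritability:
        return "positive selection"
    return "cull worst allelic combinations only"


# Values quoted in the text for Md-ACS-indel and apple fruit firmness.
pred = predictiveness(0.10, 0.44)
print(round(pred, 2))                   # 0.23
print(deployment_strategy(pred, 0.44))  # cull worst allelic combinations only
```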
Describing the particular alleles expected to be revealed by a DNA test spans the final gap between possible and actual. Pertinent information on these “effective alleles” includes their predicted effects on the final trait level alone and in observed combinations, their expected frequency in evaluated germplasm, and genotypes (allelic combinations) of standard or example germplasm individuals. For example, the peach DNA test for fruit skin blush, Ppe-Rf-SSR, is reported to detect five effective alleles: amplicon lengths of 395, 397, 399, 401, and 403 bp, each associated with either high, medium, or low blush coverage in peach fruit. Including the genotypes for established cultivars is helpful for placing DNA test results in context, avoiding duplication of work, and providing examples of experimental controls for labs. Providing information on the germplasm used to calculate the allele effects indicates to users the material to which the DNA test can be applied and the material for which further confirmation is needed. For example, Pav-Rf-SSR was confirmed in germplasm representing U.S. sweet cherry breeding material and could accurately differentiate fruit color in more than 95% of the germplasm evaluated; confirmation of the DNA test’s predictiveness would be needed in European or Chinese breeding germplasm.
Genotyping laboratories need to know enough information to run a DNA test. Key details are the genetic marker type(s), primer or probe sequences, PCR conditions, suitable genotyping platforms, and explanations on how to score results. In published DNA tests [24,27,36-39], this component is one of the most consistently reported aspects. Including additional details, such as the amenability to multiplexing PCR reactions, can also be helpful.
Developing DNA tests
Step 1: Choose target QTL
The first step is to decide which QTL(s) to target, according to breeding relevance of the associated phenotypic contrast (Figure 1). Chosen traits for DNA test development must be priorities of breeding programs. For example, disease resistance and fruit quality traits, such as apple scab, blue mold, and fire blight resistance and fruit acidity and texture, are priorities of U.S. apple breeding programs [40-43]. Further considerations are the broad-sense heritability of the associated trait, the proportion of genotypic variance of the trait explained by the QTL, and the ease of phenotyping the trait. Ideally, QTLs considered for DNA test development explain a reasonably high proportion of the observed genotypic and phenotypic variance [17,33]. QTLs can still be valuable when heritability is low and one or more QTLs explain most of that heritability. As heritability increases, phenotypic data can predict genetic potential more accurately than genotypic data, assuming high correlation between the trait and marker [33,42]. DNA tests can be used as an alternative when the trait is difficult to measure or is only expressed after a long period, e.g., fruit quality traits in trees with a long juvenility period. Another consideration is the QTL’s reliability, determined by the accuracy of the phenotypic data used to detect and characterize it and the QTL’s stability across years, locations, and germplasm. Finally, the germplasm in which the QTL was discovered should be relevant for breeding programs.
Figure 1: Steps to translate a QTL into a trait-predictive DNA test. Development starts by choosing which QTL to target. Candidate assays are created using available sequence information and tested in a small group of individuals. If the developed assay can distinguish the QTL alleles, the assay is tested on a larger set of individuals that represent the target germplasm, and information on allelic variation is obtained. In the final step, DNA test details are disseminated to the user community as a complete breeding tool.
Step 2: Design assay to target locus
The second step is to develop a DNA marker or set of markers that can capture the QTL’s high-value differences in genetic potential. A marker type that suits the genotyping platform of available service providers is chosen. Most DNA tests for rosaceous crops are based on simple PCR-based markers such as simple sequence repeats (SSRs) and sequence-characterized amplified regions (SCARs), although SNP-based tests are becoming popular. The main criterion breeders use to choose a marker type is cost: simple PCR tests (SSRs and SCARs) tend to be cheaper, robust to DNA extracts obtained cheaply and rapidly, and amenable to running DNA tests sequentially, thereby enabling a breeder to avoid paying for many DNA tests run simultaneously. PCR-based markers also allow the detection of more than two alleles whereas SNP-based tests are bi-allelic. Where many alleles exist, each with its specific effect, a single PCR-based test can be developed to distinguish them whereas multiple SNPs are needed to correctly identify the alleles present. For trait loci with a limited number of effect classes, for example disease resistant vs. susceptible phenotypes, one or a few SNPs should be adequate. With the cost of SNP-based assays decreasing, running multiple SNPs can become as cheap as single PCR-based marker DNA tests. Finally, breeders must consider the genotyping platforms offered by their DNA-based diagnostics service provider.
Where SSRs or SCARs are the marker type of choice, DNA sequence data around the locus need to be obtained. For rosaceous crops, such sequences can be downloaded from the Genome Database for Rosaceae. A 100-kb region flanking the QTL is often sufficient to find polymorphisms associated with target phenotypic contrasts. For highly heterozygous crops such as apple, insertion-deletion (indel) sequence variation can be found by comparing alleles of the reference genome or resequence data of other germplasm individuals. SSRs are an alternative, especially where more than two effective alleles are expected. Ideally, microsatellite motifs of two or more nucleotides repeated 10 to 35 times are targeted because they are likely to be polymorphic among germplasm and result in readily distinguishable alleles. Once several indels or microsatellites have been found, primers are designed for multiple such targets to increase the chance that at least one provides the necessary functionality. For example, Sandefur et al. designed 11 primer pairs during the development of Ppe-Rf-SSR. When designing primers, we recommend BLASTing the primer sequences to ensure genomic specificity of amplification [22,25], including a GC clamp of at least 2 bp to improve annealing, and positioning the primers so that amplicon sizes are amenable to multiplexing with existing DNA tests.
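As a rough illustration of the microsatellite screen and the clamp check described above, the following Python sketch scans a sequence for perfect tandem repeats of a 2 to 6 bp motif repeated 10 to 35 times, and tests whether a primer ends in G/C bases. The function names, the 6-bp upper motif length, the example sequences, and the interpretation of the clamp as the last bases at the 3' end are all assumptions for illustration; dedicated SSR-finding and primer-design software would normally be used.

```python
import re


def find_microsatellites(sequence, min_motif=2, max_motif=6,
                         min_repeats=10, max_repeats=35):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for motif_len in range(min_motif, max_motif + 1):
        # A motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AA, AGAG),
            # so each repeat tract is reported once under its shortest motif.
            if motif in (motif + motif)[1:-1]:
                continue
            repeats = len(match.group(0)) // motif_len
            if repeats <= max_repeats:
                hits.append((match.start(), motif, repeats))
    return hits


def has_gc_clamp(primer, clamp_len=2):
    """True if the last clamp_len bases (3' end, by assumption) are all G or C."""
    tail = primer.upper()[-clamp_len:]
    return len(tail) == clamp_len and set(tail) <= {"G", "C"}


# Hypothetical sequence: a 12-repeat AG microsatellite with short flanks.
example = "TTT" + "AG" * 12 + "CCC"
print(find_microsatellites(example))  # [(3, 'AG', 12)]
print(has_gc_clamp("ATTGCACTGAGGC"))  # True (hypothetical primer)
```

The screen only reports perfect repeats; interrupted or compound microsatellites would need a more tolerant search.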
Step 3: Try markers on germplasm
A set of individuals representing the range of QTL alleles of interest should be checked with each candidate DNA test to determine which of its alleles are associated with which QTL alleles. Candidate DNA tests confirmed to readily detect and distinguish target QTL alleles are then run on a larger set of individuals to identify all alleles present, their frequencies, and their distributions in breeding germplasm. For DNA tests obtained from the literature, those alleles present in material relevant to the breeding program should be confirmed. This confirmation on target breeding germplasm ideally uses unselected offspring representing important parents to avoid selection bias [31-33]. The advantage of this strategy, using multiple, pedigree-connected families, is that allele effects can be determined in various genetic backgrounds.
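Once a larger germplasm set has been genotyped, allele frequencies can be tallied directly from the scored genotypes. The sketch below assumes a diploid crop with each individual scored as a pair of allele labels (here, SSR amplicon lengths in bp); the individual names and genotypes are hypothetical and are not data from any published DNA test.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Compute allele frequencies from diploid genotypes.

    genotypes: dict mapping individual name -> (allele_1, allele_2).
    Returns a dict mapping allele -> frequency, ordered by allele label.
    """
    counts = Counter(allele for pair in genotypes.values() for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical genotypes (amplicon lengths in bp) for four seedlings.
genotypes = {
    "seedling_1": (395, 401),
    "seedling_2": (395, 395),
    "seedling_3": (399, 401),
    "seedling_4": (395, 399),
}
print(allele_frequencies(genotypes))  # {395: 0.5, 399: 0.25, 401: 0.25}
```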
Step 4: Trace inheritance
The penultimate step is to estimate the genotypic and phenotypic variance explained by the test and obtain trait predictions for alleles and allelic combinations. For categorical traits controlled by a single locus with one allele having complete dominance, mathematical modeling might not be necessary. Examples include cherry skin color and Mendel’s round vs. wrinkled peas [22,44]. However, most models of genetic inheritance are more complicated, involving many loci and traits that vary quantitatively.
For quantitative traits, fixed-effect linear models like regression and ANOVA can be used for estimating allelic effects [45-48]. However, fixed-effect linear models do not include genetic background, i.e., additional genotypic effects not accounted for by the assayed loci. As a result, caution should be exercised when extrapolating trait predictions from a DNA test to populations with different allelic composition. Mixed models are an alternative that can account for missing data, genetic background, and related populations. In the mixed model, a relationship matrix is constructed to account for relatedness among individuals in the population [49,50]. A variance component capturing non-target genotypic variance is included in the model. The DNA test can be fitted as a fixed effect or a random effect. If it is included as a random effect, the variance of that component is estimated, which is useful for understanding the proportion of phenotypic variance explained by a single DNA test. The random-effects model also allows for the inclusion of other unobserved alleles, a common occurrence when using haplotypes for defining alleles. Incorporating background effects is key to understanding the marginal contribution of a DNA test to the trait performance of an individual.
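To make the fixed-effect option concrete, the sketch below regresses phenotype on allele dosage (0, 1, or 2 copies of a hypothetical favorable allele) by ordinary least squares. The slope is the estimated additive effect per allele copy and R² is the proportion of phenotypic variance explained in this sample. The data are invented for illustration, and the sketch deliberately ignores genetic background and relatedness, which is exactly the limitation noted above; fitting the mixed model would require dedicated statistical software.

```python
def additive_effect(dosages, phenotypes):
    """Ordinary least-squares regression of phenotype on allele dosage.

    Returns (intercept, slope, r_squared).
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and phenotype must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data: allele dosage and a quantitative trait score.
dosages = [0, 0, 1, 1, 2, 2]
scores = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
intercept, slope, r2 = additive_effect(dosages, scores)
print(intercept, slope, round(r2, 3))  # 4.5 2.0 0.914
```

A dominance term (for example, an indicator for heterozygotes) could be added as a second predictor, but that requires multiple regression and is omitted here.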
Step 5: Disseminate DNA test details
The final step is to share DNA tests with the user community. The four components described above are collated and made accessible. The RosBREED project has assembled DNA test components for more than a dozen DNA tests in the form of “DNA test cards”. DNA test cards provide breeders with DNA test details in a consistent, double-sided, handout format that can be readily updated. DNA tests can also be reported as peer-reviewed journal publications: Sandefur et al. [22,25] are two examples, each describing a DNA test including all four information components to support effective test deployment. A list of reported apple DNA tests that included enough information to be counted as DNA tests was collated in Evans and Peace. The equivalent for peach and sweet cherry can be found in Tables 2 and 3, respectively.
|Trait||Locus/loci||Marker type (s)||MTL or QTL||Reference|
|root-knot nematode resistance||Mi||CAPS||MTL|||
|skin blush||Rf||SNP, SSR||MTL|||
|red skin color suppression||H||SSR||MTL|||
Table 2: Locus-specific, trait performance-predictive DNA tests available for peach.
|Trait||Locus/loci||Marker type (s)||MTL or QTL||Reference|
|fruit color||Rf, PavMYB10||SNP||MTL||[22,62]|
|fruit size||Various, PavCNR12||SSR, SNP||QTL||[63,64]|
Table 3: Locus-specific, trait performance-predictive DNA tests available for sweet cherry.
Further steps can be taken, during or after DNA test development, to maximize the positive impact each new DNA test has on breeding programs. Costs of deploying DNA tests, whether using in-house or commercial diagnostics services, can be compared to costs of phenotype-based selection methods. Some crop research communities have online cost-effectiveness tools that provide quick comparisons. Decision-support tools that model the genetic gain achievable from a DNA test’s deployment can also be used to compare alternative deployment strategies.
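A back-of-the-envelope version of such a cost comparison is sketched below. It assumes a deliberately simple model that is not taken from any published tool: every seedling is either raised and phenotyped at a fixed per-seedling cost, or DNA-tested first so that a culled fraction never incurs the raising and phenotyping cost. All function names and figures are hypothetical.

```python
def cost_phenotype_only(n_seedlings, cost_raise_and_phenotype):
    """Total cost when every seedling is raised and phenotyped."""
    return n_seedlings * cost_raise_and_phenotype


def cost_with_dna_test(n_seedlings, cost_dna_test, cull_fraction,
                       cost_raise_and_phenotype):
    """Total cost when all seedlings are DNA-tested and a fraction is culled early."""
    if not 0 <= cull_fraction <= 1:
        raise ValueError("cull_fraction must be in [0, 1]")
    retained = n_seedlings * (1 - cull_fraction)
    return n_seedlings * cost_dna_test + retained * cost_raise_and_phenotype


# Hypothetical figures: 1000 seedlings, 10 per seedling to raise and phenotype,
# 3 per DNA test, and half of the seedlings culled on the test result.
print(cost_phenotype_only(1000, 10))         # 10000
print(cost_with_dna_test(1000, 3, 0.5, 10))  # 8000.0
```

Under this model, DNA testing lowers total cost only when the culled fraction exceeds the ratio of the DNA test cost to the raising and phenotyping cost (0.3 in the example).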
For routine translation of discovered QTLs into practical and accessible DNA tests for plant breeding, we recommend a collaborative approach to assemble and leverage knowledge bases effectively. The areas of expertise most essential to a translational genetics team are:
• Fluency with the conceptual and operational components of breeding for a specific breeding program as well as the crop of interest. As a result, planned deliverables will be based on actual rather than perceived demand. Breeders themselves should be part of the team.
• Familiarity with the current and historical germplasm of the crop, including a working knowledge of close and distant pedigree connections among all individuals.
• Genetics skills in tracing inheritance of alleles and in understanding the key features of discovered QTLs such as the meaning and repercussions of the genotypic variance explained by a DNA test.
• Laboratory skills to conduct the DNA test development steps described earlier. Knowledge of current genotyping platforms and awareness of upcoming technological developments are also required.
QTL discovery does not automatically lead to practical breeding tools. QTLs need to be converted into DNA tests and important components described, including the crop and trait(s) addressed, targeted trait locus or loci, and marker type used; trait heritability and genotypic variance explained by the DNA test; allele effects, frequencies, and germplasm distributions; and technical details for running the test. As scientific understanding of the genetic factors controlling breeding-relevant traits continues to expand, systematic and increased DNA test development, as described here, should help bridge the chasm between academic research and breeding application.
We thank WSU PhD graduate Paul Sandefur for advances made in his project on DNA test development across several tree fruit crops. This work was supported by the Washington Tree Fruit Research Commission, USDA’s National Institute of Food and Agriculture (NIFA)–Specialty Crop Research Initiative projects “RosBREED: Enabling Marker-Assisted Breeding in Rosaceae” (2009-51181-05808) and “RosBREED: Combining Disease Resistance and Horticultural Quality in New Rosaceous Cultivars” (2014-51181-22378), and USDA NIFA Hatch projects 0211277 and 1014919.