APS Online Publications
RESEARCHFree Access icon

Genome-Wide Association Mapping and Genomic Prediction of Fusarium Wilt Race 2 Resistance in the USDA Citrullus amarus Collection

    Affiliations
    Authors and Affiliations
    • Venkata Rao Ganaparthi1
    • Gabriel Rennberger2
    • Patrick Wechter1
    • Amnon Levi2
    • Sandra E. Branham1
    1. 1Coastal Research and Education Center, Clemson University, Charleston, SC
    2. 2U.S. Vegetable Laboratory, USDA-ARS, Charleston, SC 29414

    Published Online:https://doi.org/10.1094/PDIS-02-23-0400-RE

    Abstract

    Fusarium wilt caused by Fusarium oxysporum f. sp. niveum (Fon) race 2 is a serious disease in watermelon and can reduce yields by 80%. Genome-wide association studies (GWAS) are a valuable tool in dissecting the genetic basis of traits. Citrullus amarus accessions (n = 120) from the USDA germplasm collection were genotyped with whole-genome resequencing, resulting in 2,126,759 single nucleotide polymorphic (SNP) markers that were utilized for GWAS. Three models were used for GWAS with the R package GAPIT. Mixed linear model (MLM) analysis did not identify any significant marker associations. FarmCPU identified four quantitative trait nucleotides (QTN) on three different chromosomes (i.e., chromosomes 1, 5, and 9), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK) identified one QTN on chromosome 10 as significantly associated with Fon race 2 resistance. FarmCPU identified four QTN that explained 60% of Fon race 2 resistance, and the single QTN from BLINK explained 27%. Relevant candidate genes were found within the linkage disequilibrium (LD) blocks of these significant SNPs, including genes encoding aquaporins, expansins, 2S albumins, and glutathione S-transferases which have been shown to be involved in imparting resistance to Fusarium spp. Genomic predictions (GP) for Fon race 2 resistance using all 2,126,759 SNPs resulted in a mean prediction accuracy of 0.08 with five-fold cross-validation employing genomic best linear unbiased prediction (gBLUP) or ridge-regression best linear unbiased prediction (rrBLUP). Mean prediction accuracy with gBLUP leave-one-out cross-validation was 0.48. Thus, along with identifying genomic regions associated with Fon race 2 resistance among the accessions, this study observed prediction accuracies that were strongly influenced by population size.

    Fusarium wilt (FW) is one of the most important soilborne diseases of watermelon. The disease is caused by the fungus Fusarium oxysporum f. sp. niveum (Fon), of which four pathogenic races (0, 1, 2, and 3) have been described thus far (Martyn 1987; Martyn and Netzer 1991; Netzer 1976; Netzer and Martyn 1989; Zhou et al. 2010). Race 2 is the most widespread and is responsible for the most economic damage in the United States and in numerous other countries (Martyn 2017). Commercial watermelon cultivars with resistance to races 0 and 1 are available, but no cultivars with resistance to races 2 or 3 have been released (Wechter et al. 2012). The only available management option for watermelon grown in soils infested with Fon race 2 is grafting watermelon scion onto rootstocks of bottle gourd (Lagenaria siceraria), interspecific hybrid squash (Cucurbita maxima × C. moschata; Keinath and Hassell 2014), and more recently onto the resistant citron melon rootstock cultivar ‘Carolina Strongback’ (Citrullus amarus; Keinath et al. 2019; Wechter et al. 2016).

    Fon overwinters in the soil by producing chlamydospores, thick-walled spores specialized in long-term survival, which can remain viable in soil for more than 10 years (Martyn 2012). Due to this pathogen’s persistence in the soil and a lack of chemical control options, discovery and incorporation of genetic resistance to FW represents the most feasible management strategy (Wechter et al. 2012). Like other vascular wilt pathogens, Fon enters the host plant through the root tissue and invades the xylem vessels where it produces microconidia which germinate and promote the spread of the pathogen through the xylem. Once this part of the plant is fully colonized, a mixture of cell wall-degrading enzymes produced by both the host and pathogen leads to the formation of gums and gels inside the xylem vessels. Together with the production of tyloses by the plant, the xylem vessels become blocked. Reduction of the plant’s ability to transport water results in wilting of vines and ultimately the death of the plant (Martyn 2014). Yield reduction of susceptible watermelon cultivars grown in fields infested with Fon race 2 can be as high as 80% (Keinath and Hassell 2014). In Spain, as much as 59% of nongrafted triploid watermelon plants died when transplanted into an infested field leading to yield losses of 42 to 68% (Miguel et al. 2004).

    Repeated and long-term selection for desirable fruit traits in modern watermelon (Citrullus lanatus) cultivars has led to a narrow genetic base with fewer disease resistance loci (Levi et al. 2001), and no resistance has been identified against Fon race 2. In contrast, citron melon (C. amarus) has higher genetic diversity (Guo et al. 2019; Levi et al. 2013), and multiple pathogenicity bioassays of the USDA collection of Citrullus spp. have shown its value as an important reservoir of disease resistance alleles against a wide range of bacterial, fungal, and viral pathogens (Branham et al. 2017, 2019a, 2020; Katuuramu et al. 2022). C. amarus and C. lanatus (edible watermelon) are readily intercrossed, allowing introgression of resistance alleles into cultivated watermelon. Understanding the genetic architecture of Fon race 2 resistance in C. amarus is vital in devising strategies to incorporate this resistance into commercial cultivars. Previous mapping studies of Fon race 2 resistance have been performed in biparental populations (Branham et al. 2017, 2020), but genome-wide association studies are lacking. This study focuses on identifying Fon race 2 resistant genetic resources in the USDA C. amarus Plant Introduction Collection and on elucidating the genetic architecture associated with the resistance trait.

    Materials and Methods

    Plant materials

    One hundred and twenty-three C. amarus accessions were obtained from the USDA Plant Genetic Resource Conservation Unit (PGRCU) in Griffin, GA, and were increased by self-pollination for a minimum of two generations at the United States Vegetable Laboratory in Charleston, SC, to decrease heterogeneity within each accession. Seed availability and germination limited the accessions available for disease screening to 120 accessions for test 1 and 119 for test 2.

    Inoculation and disease evaluation

    Fon race 2 cultures were prepared using a modified procedure optimized by Wechter et al. (2012). Fon race 2 isolate B05-30cvd, obtained through single spore isolation, was grown on Difco potato dextrose agar at 25°C under a 12-h lighting cycle of fluorescent light. After 1 week, five 1-cm disks were cut from the growing margin and added to 500-ml Erlenmeyer flasks with 200 ml of potato dextrose broth and placed on a rotary shaker at 170 rpm for 2 weeks. The spore suspension was filtered through two layers of cheese cloth and a single layer of Miracloth (EMD chemicals, San Diego, CA) to remove hyphae. Spores were quantified with a hemacytometer, and the final concentration was adjusted to 1 × 106 spores per ml with sterile distilled water. Spore solution was added to a 1:1:1 trisoil mix of perlite:vermiculite:metromix 360 potting soil (5 liters of diluted spore suspension to 22 liters of trimix) and were homogenously distributed in an electric concrete mixer. Ten seeds per each accession were seeded into the inoculated trisoil mix in 50-cell propagation trays and were grown in a growth chamber equipped with balanced (four red:one blue) LED lights (G.E. Arize lynk lights, Cleveland, Ohio) at 23 ± 1.5°C. Two separate tests were performed in a randomized complete block design with two replications of five seeds for each genotype planted into the inoculated trimix. Twenty-eight days after inoculation, plants were rated on a 5-point scale (Branham et al. 2019b): dead were rated as 5; completely wilted plants were rated as 4; plants with wilted cotyledons and true leaves but healthy stems were given a rating of 3; plants were rated as 2 if only either cotyledons or true leaves were wilted; and completely healthy plants were rated as 1.

    Whole-genome resequencing and variant discovery

    Variants called from whole genome resequencing of the collection were previously published (Katuuramu et al. 2022). Briefly, paired-end reads were generated by sequencing shotgun genomic libraries on a single lane of a NovaSeq 6000 (Illumina, San Diego, CA). Reads remaining after filtering and removal of low-quality sequence with trimmomatic v.0.38 (Bolger et al. 2014) were aligned to the reference genome using the Burrow-Wheeler Aligner v.0.17 (Li and Durbin 2009). The C. amarus reference genome of USVL246-FR2 v1 was downloaded from the CuGenDBv2 (Yu et al. 2023). The reference genome was indexed with the software SAMtools (Li et al. 2009). Variant discovery and quality control filtering was performed using the GATK v.3.6 best practices workflow (Depristo et al. 2011; McKenna et al. 2010; van der Auwera et al. 2013).

    Analysis of the phenotypic data

    Best linear unbiased estimates (BLUEs), best linear unbiased predictions (BLUPs) and means for each test and across tests were obtained using the lmer function in the R (R Core Team 2022) package lme4 (Bates et al. 2015). Homogeneity of test and replication variances were assessed with the Bartlett test. The model utilized for obtaining BLUEs was

    Y=gi+rj+tk+gi:tk+eij

    where Y represents the BLUEs of each accession, gi is the fixed effect of the ith genotype, rj is the random effect of the jth rep, tk is the random effect of the kth test, gi:tk is the interaction between ith genotype and kth test, and eij is the random error variance. The same model was used to obtain BLUPs, except that genotype was included as a random factor. Broad-sense heritability of Fon race 2 resistance among the accessions in the study was calculated using the following formulae (Piepho and Möhring 2007):

    H2=σg2σp2
    σp2=σg2+σgt2m+σ2rm

    where σg2 is the variance due to genotype, σp2 is the phenotypic variance, σgt2 is the variance of the genotype-by-test interaction, m is the number of trials (experiments), and r is the number of replicates per trial.

    Genome-wide association analysis of Fon race 2 resistance in C. amarus accessions

    Genome-wide association studies (GWAS) were performed with the BLUEs obtained from the phenotypic data. GWAS was performed using the GAPIT function in the GAPIT3 package in R (Wang and Zhang 2021). Three models were used to identify single nucleotide polymorphic (SNP) markers associated with Fon race 2 resistance among the accessions: one single-locus model (mixed linear model [MLM; Lipka et al. 2012]), two multilocus models (fixed and random circulating probability unification [FarmCPU; Liu et al. 2016]), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK; Huang et al. 2019). To select the optimum number of PCs to account for population structure in the GWAS, the argument model selection = TRUE was used, with a maximum of five PCs. The number of PCs with the highest Bayesian information content (BIC) for each trait were utilized as covariates in the GWAS (Wang and Zhang 2021). Model fit was assessed with Q-Q plots. Bonferroni adjusted P values below a significance threshold of 0.05 were determined to be associated with Fon race 2 resistance. Phenotypic variance explained by significant SNPs in each model were obtained by regression on BLUEs using the lme function in the lmer package of R.

    Interspecific polymorphism identification

    To identify interspecific polymorphisms with potential to be developed into KASP markers for future introgression efforts, we identified all polymorphisms between two stable resistant accessions, USVL246-FR2 and USVL252-FR2 (Wechter et al. 2012, 2016), and 14 susceptible diploid cultivars (accession numbers are provided in Table 3) within 100 bp of each significant SNP in the CuGenDBv2 (Yu et al. 2023) database using the genotype tab. SNPs for the stable resistant accessions and susceptible cultivars were called from whole genome resequencing data aligned to the watermelon 97,103 reference genome v2.5 (Yu et al. 2023).

    Identification of candidate genes

    Pairwise linkage disequilibrium (LD) between SNPs within 1,000,000 bp upstream and downstream of each significant SNP was estimated in Vcftools (Danecek et al. 2011). Physical distance at which LD is half the maximum LD is declared as the LD block around each significant SNP (Vos et al. 2017), and this physical region was searched for potential candidate genes within the C. amarus (USVL246-FR2 v1) reference genome annotation (cucurbitgenomics.org) using the CuGenDBv2 (Yu et al. 2023). All genes involved in molecular host-pathogen interactions were considered potential candidate genes.

    Genomic predictions

    Genomic predictions were performed with genomic best linear unbiased prediction (gBLUP; Zhang et al. 2021) and ridge-regression best linear unbiased prediction (rrBLUP; Endelman 2011) in R software (R Core Team 2022). For k-fold cross-validation, 20% of the accessions were randomly chosen as the inference panel while the rest were included in the reference panel, thus indicating a five-fold cross-validation. Along with five-fold validation, leave-one-out validation was also performed with gBLUP, while only k-fold cross-validation was adopted for rrBLUP. Phenotype and marker data from the reference panel was used to obtain genomic prediction for each individual in the inference panel and was calculated using the following formula (Henderson 1984):

    μI=KIRKRR1μR

    where μI is the predicted genomic value of a genotype in the inference panel, KIR is the covariance matrix between the inference and reference panels, KRR1 is the inverse of variance-covariance matrix for all individuals in the reference panel, and μR is the vector of random additive genetic effects from multiple background quantitative trait loci (QTL) for individuals in the reference panel. Correlation between observed phenotype and predicted genomic value was recorded, and the average correlation over 1,000 iterations was reported as the genomic prediction accuracy.

    Genome-wide marker effects for disease severity of Fon race 2 were estimated with the rrBLUP model (Endelman 2011):

    y=μ+Zu+e

    where y is an N × 1 vector of BLUEs for Fon race 2 disease severity for each accession, μ is the overall mean, Z is N × M marker matrix, u is marker effect vector, and e is the residual vector. Genomic selection (GS) was implemented with five-fold cross validation, and genomic estimated breeding values (GEBV) of 20% of samples were predicted. Correlation between predicted GEBV and phenotype values over 1,000 iterations was reported as genomic prediction accuracy.

    Results

    Fon race 2 resistance

    Test and replication variances were homogenous when tested with the Bartlett test (P values were 0.497 and 0.843, respectively). Mean disease severity distribution of accessions across two tests was approximately normal with minimum and maximum disease severity of 1 and 4.8, respectively. Analysis of variance (ANOVA) of disease severity evaluated across two tests showed significant effects of genotype, test, and replication (Table 1). Accessions identified as resistant in earlier studies showed lower mean disease severity in the current test (Wechter et al. 2012). PI_482257, PI_482322, PI_532669, PI_244017, and PI_482331 were the top five accessions in terms of lowest mean disease severity, while PI_542118, PI_542123, USVL-114, PI_542114, and PI_512385 had the highest mean disease severity. Pearson’s correlation of accession disease severity means between test 1 and test 2 was 0.44. Broad sense heritability of Fon race 2 disease severity was 0.39.

    Table 1. Analysis of variance for different factors in Fusarium oxysporum f. sp. niveum (Fon) race 2 phenotyping of 120 Citrullus amarus genotypes evaluated growth chamber studies

    Genome-wide association

    Genome-wide association analyses were performed with disease severity BLUEs and 2,126,759 SNP markers. Model selection determined the optimal number of PCs to be 0 as judged by BIC. MLM is the standard model for identifying genome-wide marker-trait-associations (MTA). Three different models were used to perform GWAS (i.e., BLINK, FarmCPU and MLM). FarmCPU utilizes both a fixed effect model and a random effect model and uses them iteratively (Liu et al. 2016), while BLINK replaces the computationally intensive random effect model with BIC, reducing computing time and improving precision in identifying significant MTAs (Huang et al. 2019). No model identified significant markers using BLUEs or BLUPs obtained from a single test. However, using BLUEs obtained from disease severity across two tests, FarmCPU identified four quantitative trait nucleotides (QTN) on three different chromosomes as significantly associated with Fon race 2 resistance, while only one significant QTN was identified with BLINK (Fig. 1), and none of the SNP markers were significant with MLM. The four markers identified as significant with FarmCPU explained 60% of the variation in Fon race 2 resistance. The significant SNP on chromosome 1 (S1_4713019) explained 26.26% phenotypic variation, while the SNP on chromosome 5 (S5_664891) explained 25.9%. The remaining two significant SNPs on chromosomes 1 (S1_10650359) and 9 (S9_34333445) together explained 8% of variation in disease severity (Table 2). BLINK identified one single marker as significantly associated with Fon race 2 resistance on chromosome 10 (S10_10616298) that explained 26.58% of the phenotypic variation (Table 2). No significant associations were detected with MLM. This could be the combined result of MLM testing each marker association with phenotype, considering kinship among the accessions defined as genetic effects of individuals in the analysis, and small sample size hindering accurate and meaningful estimates of polygenic Fon race 2 resistance (Branham et al. 2017; Yu et al. 2006).

    Fig. 1.

    Fig. 1. A to C, Genome-wide Manhattan plots of the three models included in the genome-wide association studies (GWAS) analyses. The x-axis is the genomic position of the single nucleotide polymorphisms (SNPs) in the genome, and the y-axis is the negative log base 10 of the P values. The solid horizontal line shows the significance threshold at 5% with false discovery rate (FDR) correction.

    Download as PowerPoint

    Table 2. Details of the significant SNPs associated with resistance to Fusarium oxysporum f. sp. niveum (Fon) race 2 resistance across 120 Citrullus amarus genotypes evaluated in growth chamber studies

    Four of the five SNPs identified using FarmCPU and BLINK (except for S9_34333445) were significantly different for mean disease severity with the Welch t test. Mean disease severity of accessions with “TT” genotype at S9_34333445 loci on chromosome 9 was 2.829, while accessions with “CC” alleles was 2.09. Mean disease severity of accessions homozygous for the resistant alleles at the remaining loci ranged from 2.32 to 2.64, while homozygous susceptible means ranged from 3.12 to 4.16 (Fig. 2). Along with the allele effect plots, we looked for polymorphisms between the resistant accessions and susceptible diploid cultivars to determine the possibility of developing KASP markers for the significant SNPs identified. Within a 100-bp region of each significant SNP, there is at least one polymorphic SNP between the two C. amarus-resistant accessions and 14 susceptible C. lanatus cultivars (Table 3). Over 160 candidate genes were identified in the LD blocks of all significant SNPs, with several coding for proteins known to be involved in plant resistance to Fusarium spp. (Supplementary Table S1).

    Fig. 2.

    Fig. 2. Box plots of allelic effects of significant single nucleotide polymorphisms (SNPs) found in genome-wide association studies (GWAS) analyses along with their significant differences.

    Download as PowerPoint

    Table 3. Polymorphic SNPs between susceptible cultivated diploid and resistant accessions within 100-bp region of significant SNPs identified in this study

    Genomic predictions

    Genomic predictions were estimated with gBLUP and rrBLUP. Both leave-one-out and five-fold cross-validations were performed with gBLUP, while only five-fold cross-validation was performed with rrBLUP. With gBLUP, mean prediction accuracy of the reference population is 0.87, and the mean and maximum prediction accuracies of the inference population were 0.081 and 0.47, respectively (Supplementary Table S2), with a standard deviation of 0.014. Mean prediction accuracy with leave-one-out cross-validation was 0.48 (Supplementary Table S2). Minimum and maximum predictive abilities with rrBLUP were −0.61 and 0.73, respectively, with a mean of 0.088 (Supplementary Table S2) and 0.23 standard deviation.

    Discussion

    Wilt caused by Fon race 2 is arguably the most destructive soilborne disease in watermelon. Germplasm screenings have been performed by several researchers, and QTL associated with tolerance to Fon race 2 have been identified using biparental populations (Branham et al. 2017; Meru and McGregor 2016; Wechter et al. 2012). To our knowledge, this is the first attempt to dissect the genetic nature of Fon race 2 resistance in the full USDA watermelon C. amarus Plant Introduction Collection using GWAS. With more than 2,000,000 SNP markers from resequencing of the C. amarus collection, BLINK and FarmCPU models implemented in GAPIT identified five QTN on chromosomes 1, 5, 9, and 10 as significantly associated with Fon race 2 resistance. Resistance QTLs were previously identified on chromosomes 1, 5, 9, and 10 (Branham et al. 2017, 2020; Ren et al. 2015) with biparental populations. QTN identified in this study, while on the same chromosomes reported by other researchers, are at different physical locations, with distances ranging from 2.81 to 56.7 Mb from the previous QTL peaks (Fig. 3).

    Fig. 3.

    Fig. 3. A to C, Quantile-quantile plots of the three models included in the genome-wide association studies (GWAS) analyses.

    Download as PowerPoint

    MLM is the most widely used GWAS model for plant systems (Atwell et al. 2010) but was unable to identify any significant associations in this study. In recent years, several multilocus models have been developed with improved power to detect associations, while decreasing false positive and false negative rates (Huang et al. 2019). Two of these, BLINK and FarmCPU, were implemented here and were able to detect significant associations. These models iteratively incorporate associated markers as covariates with testing markers to eliminate confounding results due to cryptic relationships among the individuals. Associated markers are selected based on bin or on LD, thus controlling both false positives and false negatives (Huang et al. 2019; Liu et al. 2016). Between these two multilocus models, BLINK identified more associated SNPs than FarmCPU in simulation studies of flowering time in maize (Huang et al. 2019). However, in the current study, BLINK identified only one QTN on chromosome 10 which is 16.2 Mb away from a previously identified QTL region, while FarmCPU identified four QTNs which are 2.8 to 56.7 Mb away from previously identified QTL regions (Branham et al. 2017, 2020). While five significant MTAs were identified here, future KASP development efforts should focus on the two MTAs (on chromosomes 1 and 5) with significant allele effects as they would likely be more effective for MAS.

    Over 160 candidate genes were identified in the LD blocks of the significant SNPs with several coding for proteins known to be involved in plant resistance to Fusarium spp. Among the three candidate genes identified in the LD block of S1_4713019, one encodes for an aquaporin-like protein, which was overexpressed upon fungal infection and imparted resistance in a Fusarium-wheat pathosystem (Wang and Zhang 2021). Six candidate genes were identified in the LD block of SNP S1_10650359. Some of these code for HTH myb-type domain-containing proteins, expansin and glutathione s-transferase (GST), and all were over-expressed in susceptible genotypes upon inoculation with Fusarium spp. (Han et al. 2016; Zhang et al. 2022). Eighty-five candidate genes encoding known proteins were identified in the LD block of SNP S5_664891. Of these, some genes code for copper ion and phosphate-kinases which were overexpressed in Fusarium-resistant potato cultivars (Liu et al. 2020; Romeis 2001). Within the LD block of the significant SNP on chromosome 9, the Fusarium growth restricting protein 2S albumin was identified (Agizzio et al. 2003; Romeis 2001). Gene expression, transcriptomic, and proteomic analysis between closely related resistant and susceptible lines would be useful for future validation of candidate genes identified in the current study (Zhang et al. 2022).

    Accuracy of genomic predictions depends on the population size, the genetic nature of the trait, the number of markers employed and their distribution, and relatedness between training and testing populations (Lorenz et al. 2011). Heritability of Fon race 2 resistance in the current study was moderate, and the number of markers employed in genomic predictions were more than 2,00,000, which are abundant for the watermelon genome size (i.e., ∼425 Mb) and LD decay rate (Guo et al. 2019). Lower mean predictions could be explained in part by the fact that the population was composed of accessions collected around the world, and reference and inference populations in each iteration were more likely to be less related. Mean prediction accuracies/prediction abilities with five-fold cross validation were lower than for leave-one-out cross validation, though the number of markers were the same for both methods. Our results support the hypothesis that genomic predictions for Fon race 2 resistance in watermelon are substantially impacted by training population size (training set sizes were 96 and 119 for k-fold and leave-one-out cross validation, respectively) and should include more accessions to have meaningful prediction accuracies as in other crops (Heffner et al. 2011). Apart from the population size, a higher number of markers was shown to negatively impact prediction accuracies in rice (Sousa et al. 2019). A reduction of markers could improve genomic prediction accuracies in watermelon as in rice (Cerioli et al. 2022). Given the nature of Fon race 2 resistance, segregation distortion in interspecific crosses (Ren et al. 2012), multitrait genomic selection (agronomic traits along with the Fon race 2 resistance) could be a promising tool in developing sweet dessert watermelon cultivars (i.e., C. lanatus) with Fon race 2 resistance.

    The author(s) declare no conflict of interest.

    Literature Cited

    Funding: Funding was provided by the National Institute of Food and Agriculture (grant number 2020-51181-32139).

    The author(s) declare no conflict of interest.