TECHNICAL ADVANCEOpen Access icon OPENOpen Access license

Application of Target Enrichment Sequencing for Population Genetic Analyses of the Obligate Plant Pathogens Pseudoperonospora cubensis and P. humuli in Michigan

    Affiliations
    Authors and Affiliations
    • Julian C. Bello1
    • Mary K. Hausbeck1
    • Monique L. Sakalidis1 2
    1. 1Department of Plant, Soil and Microbial Sciences, Michigan State, University, East Lansing, MI 48824, U.S.A.
    2. 2Department of Forestry, Michigan State University, East Lansing, MI 48824, U.S.A.

    Published Online:https://doi.org/10.1094/MPMI-11-20-0329-TA

    Abstract

    Technological advances in genome sequencing have improved our ability to catalog genomic variation and have led to an expansion of the scope and scale of genetic studies over the past decade. Yet, for agronomically important plant pathogens such as the downy mildews (Peronosporaceae), the scale of genetic studies remains limited. This is, in part, due to the difficulties associated with maintaining obligate pathogens and the logistical constraints involved in the genotyping of these species (e.g., obtaining DNA of sufficient quantity and quality). To gain an evolutionary and ecological perspective of downy mildews, adaptable methods for the genotyping of their populations are required. Here, we describe a targeted enrichment (TE) protocol to genotype isolates from two Pseudoperonospora species (P. cubensis and P. humuli), using less than 50 ng of mixed pathogen and plant DNA for library preparation. We were able to enrich 830 target genes across 128 samples and identified 2,514 high-quality single nucleotide polymorphism (SNP) variants. Using these SNPs, we detected significant genetic differentiation (analysis of molecular variance [AMOVA], P = 0.01) between P. cubensis subpopulations from Cucurbita moschata (clade I) and Cucumis sativus (clade II) in the state of Michigan. No evidence of location-based differentiation was detected within the P. cubensis (clade II) subpopulation in Michigan. However, a significant effect of location on the genetic variation of the P. humuli subpopulation was detected in the state (AMOVA, P = 0.01). Mantel tests found evidence that the genetic distance among P. humuli samples was associated with the physical distance of the hop yards from which the samples were collected (P = 0.005). The differences in the distribution of genetic variation of the Michigan P. humuli and P. cubensis subpopulations suggest differences in the dispersal of these two species. The TE protocol described here provides an additional tool for genotyping obligate biotrophic plant pathogens and the execution of new genetic studies.

    Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

    Downy mildew (DM) pathogens (family Peronosporaceae) cause foliar disease in several agronomically important plant species (Gent et al. 2009; Kanetis et al. 2014; Kunjeti et al. 2016; Lane et al. 2005; Rivera et al. 2016; Wallace and Quesada-Ocampo, 2017; Wong and Wilcox 2001). The group is comprised of at least 20 different genera representing a large portion of the described pathogenic species of oomycetes (Choi and Thines 2015; Thines 2014). All DM species are considered obligate biotrophs, growing only in association with living host tissue (Bourret et al. 2018). Over the past decade, several studies have provided key insights into DM biology, including virulence mechanisms (Baxter et al. 2010; Burkhardt and Day 2016; Purayannur et al. 2020; Savory et al. 2012a), host specificity (Choi and Thines 2015; Rivera et al. 2016; Summers et al. 2015; Wallace et al. 2020), and fungicide resistance (Blum et al. 2011; Gisi and Sierotzki 2008). Despite this progress, several unresolved research questions in ecology and evolution remain, many of which could be addressed with emerging genomic and genetic approaches.

    Recent technological advances in high-throughput sequencing have facilitated the genotyping of dozens of samples and the analysis of whole plant-pathogen genomes (Cui et al. 2019; Fletcher et al. 2019; Rahman et al. 2019; Rivera et al. 2016; Savory et al. 2012b; Sharma et al. 2015; Withers et al. 2016). The recent availability of genome-wide sequence information from multiple individuals of the same species has facilitated population genomic studies of several plant pathogens (Carleson et al. 2019; Gent et al. 2019; Grünwald et al. 2017; Tabima et al. 2018). Whole or reduced representation genome sequencing technologies have become increasingly popular to monitor genetic changes in plant-pathogen populations. However, these technologies have not been broadly used to study DM populations due to difficulties in obtaining sufficient amounts of DNA from single obligate biotrophic individuals (Milgroom 2015).

    Traditionally, other genotyping technologies that require significantly lower inputs of DNA, such as Sanger sequencing and microsatellites (simple sequence repeats [SSRs]), are used with DNA extracted from fresh and stored symptomatic tissue to study DM populations (Kitner et al. 2015; Naegele et al. 2016; Quesada-Ocampo et al. 2012; Rivera et al. 2016; Wallace and Quesada-Ocampo 2017; Wallace et al. 2020). However, these approaches are limited by the relatively low number of variants that can be obtained when compared with genome-wide sequencing options. DNA from symptomatic tissue can be used for sequencing with high-throughput sequencing technologies, but the low concentration of pathogen DNA and large amounts of exogenous material (e.g., plant and bacterial DNA) can add further expense and complexity to the sequencing and bioinformatic analysis (Stassen et al. 2012).

    The quality of high-throughput sequencing data depends on the quality and purity of the DNA sample, which means that the target pathogen is ideally present in higher amounts when compared with the host plant before DNA extraction (Jouet et al. 2019). A spore propagation approach using detached susceptible leaves can be used to propagate bulk amounts of DM sporangia from which additional pathogen DNA can be recovered (Gent et al. 2019; Summers et al. 2015; Thomas et al. 2017); however, multiple growing cycles (seven to 10 days per cycle) are needed, making this approach time-consuming and labor-intensive (Ali et al. 2011; Gent et al. 2019). Spore propagation is most successful when using freshly collected symptomatic leaves, but propagating sporangia from samples stored over long periods of time or under poor conditions (e.g., samples stored above −80°C) can be complicated. Additionally, sporangial propagation requires continuous maintenance of fresh sporangia on a highly susceptible host, which may bias the genetic composition of the populations under study due to the selection of genotypes by the host and the propagation conditions (Jones et al. 2014; Thomas et al. 2017). While sporangial propagation can provide large amounts of high-quality (HQ) DNA needed for sequencing, this approach is impractical for population studies requiring a large number of samples collected over time (i.e., years).

    Sequence capture methods may provide a solution for genotyping DM pathogens that excludes nontarget DNA while also facilitating high coverage sequencing of several target loci (Kozarewa et al. 2015; Lim and Braun 2016). These techniques use affinity probes (RNA/DNA) to isolate particular sequences of interest (called target regions) out of a larger pool of DNA fragments (DNA library). Sequence capture methods such as target enrichment (TE) have been used across a variety of genomic studies with model (Clark et al. 2011; Gnirke et al. 2009) and nonmodel organisms (Faircloth et al. 2015; McCormack et al. 2016; Starrett et al. 2017) and have also facilitated the study of museum specimens with low amounts of poor-quality DNA (Cruz-Dávalos et al. 2017). Additionally, sequence capture methods have been used to study the genetic variation of other plant pathogens, such as Phytophthora infestans (Lin et al. 2020), Phytophthora capsici (Thilliez et al. 2019), and the obligate biotroph Albugo candida (Jouet et al. 2019). These techniques could facilitate the study of DM pathogen populations using symptomatic tissue samples, alleviating the need for sporangial propagation and expanding the type and condition of the samples used.

    In this study, we evaluated the genotypic variation of two DM species of genus Pseudoperonospora using TE sequencing. P. cubensis and P. humuli infect cucurbits and hops, respectively, worldwide and are considered the most economically important species of the genus Pseudoperonospora (Choi et al. 2005; Mitchell et al. 2011). P. cubensis (clades I and II) causes foliar blight (DM) of cucumber (Cucumis sativus), melon (Cucumis melo), pumpkin (Cucurbita maxima), watermelon (Citrullus lanatus), and squash (Cucurbita moschata) (Summers et al. 2015; Thomas et al. 2017; Wallace et al. 2020). P. humuli negatively impacts hop (Humulus lupulus) cone yield (Gent et al. 2010). We describe an optimized TE procedure for sequencing DM and a bioinformatic pipeline for population genetic analyses using TE data. Evaluation of the structure, diversity, and reproduction of P. cubensis and P. humuli populations in Michigan was of particular interest (Fig. 1).

    Fig. 1.

    Fig. 1. Number of acres of Cucumis sativus and Humulus lupulus planted in Michigan (adapted from Neufeld 2017). A, C. sativus planted acreage, location, and number of samples collected in Michigan (n = 40). A total of 26 samples were collected outside of Michigan in Ontario Canada (n = 6) and the states of Florida (n = 2), Alabama (n = 1), North Carolina (n = 2), South Carolina (n = 2), California (n = 1), New York (n = 1), Wisconsin (n = 1), Ohio (n = 3), Iowa (n = 1), Indiana (n = 5), and Georgia (n = 1). B, H. lupulus planted, location, and number of samples collected from hop yards in Michigan (n = 33) and Oregon (n = 1). Hop yards with more than 800 plants (H. lupulus) are represented by stars, those with fewer are represented by circles. Hop yards sampled (A to G) are colored in red. C, Population assignment by geographic location of Pseudoperonospora spp. samples in the Midwest (Wisconsin, Indiana, Ohio, Michigan [Mi], and Ontario, Canada).

    Download as PowerPoint

    RESULTS

    Library sequencing and read mapping.

    The sequencing and alignment results of all libraries used in this study are summarized in Table 1. A total of 270 samples containing from 2 to 500 ng of DNA were used for library preparation and enrichment. However, we could not generate enough enrichment products from 110 libraries, including 51 libraries prepared from samples with less than 50 ng of DNA. The remaining libraries (n = 160) were enriched successfully and were prepared from samples containing from 2 to 500 ng of DNA; these libraries included 118 libraries of P. cubensis and 42 libraries of P. humuli that were sequenced using three lanes of HiSeq4000.

    Table 1. Sequencing and alignment results from libraries sequenced using target enrichment, low-coverage whole-genome sequencing (Lc-WGS), and WGS

    Table 2. Plant hosts and the location of the 128 samples of Pseudoperonospora spp. samples used for population analyses

    TE libraries.

    Approximately 800 million paired end reads (2 × 150 bp) were obtained from the sequencing of these libraries (162) generating approximately 10 million reads ± 9.2 million (standard deviation) per library. A range of 13 to 95% of the reads per sample aligned to the target reference genome of P. cubensis. The total number of reads obtained per library was influenced by the hybridization time of the libraries with the capture probes for enrichment and the number of samples sequenced per lane. The enrichment sets 1 and 2 (TES1 and TES2) contained approximately the same number of libraries (62 and 72); however, the libraries of TES2 that hybridized for 48 h generated almost two times more aligned reads (2.64 ± 2.6 million) than the libraries that hybridized for 24 h (1.34 ± 1.0 million) (Table 1). Five percent of the reads per library that did not align to the reference genome of P. cubensis aligned to the C. sativus reference genome.

    Low-coverage whole-genome sequencing (Lc-WGS) libraries.

    Seven libraries of P. cubensis and two libraries of P. humuli were sequenced without enrichment (Lc-WGS), using MiSeq, generating a total of 41 million sequencing reads (2 × 150 bp). Approximately 0.55 ± 0.37 million reads were obtained per library, but only 25% of them aligned to the reference genome of P. cubensis. Twenty-five percent of the remaining reads aligned to the C. sativus reference genome.

    WGS libraries.

    A total of 350 million paired-end reads (2 × 150 bp) were obtained from Bioproject PRJNA360426, which included WGS data of nine libraries of P. cubensis and one library of P. humuli. An average of 15.6 ± 6 million reads per sample were obtained after sequencing. Eighty-nine percent of the total reads aligned to the P. cubensis reference genome and approximately 2% of the reads per sample aligned to the reference genome of C. sativus.

    Genotyping and variant calling.

    Single nucleotide polymorphisms (SNPs) were retrieved from the sequencing reads generated from 160 enriched DNA libraries and the reads obtained from the 10 libraries sequenced using WGS (Bioproject PRJNA360426). Ninety percent of the enriched libraries were prepared from input DNA amounts between 2 to 100 ng (Fig. 2B). After mapping the reads to the target reference genome of P. cubensis, 5,214 biallelic SNPs were identified across 70% of the 170 samples. These SNPs were located within 170 of the regions targeted for enrichment. To retain the maximum number of SNPs, we excluded 42 of the samples prepared using TE, as they contained more than 40% missing data (24 of these were hybridized for 24 h, with the remaining libraries hybridized for 48 h) (Table 1). SNPs with more than 75% of the missing data across the entire data set were also excluded. This resulted in a total of 2,514 HQ SNPs with <7% missing data for downstream analysis. All HQ SNPs were located within 107 of the 830 genes targeted, 66% of the SNPs were found within secreted and effectors proteins. The remaining 34% of the SNPs were contained within esterases (16.3%), glucanasas (3%), lyases (3%), SSRs (4%), and other (7%) genes.

    Fig. 2.

    Fig. 2. Depth coverage of high-quality single nucleotide polymorphisms (SNPs) across libraries. A, Depth coverage of high-quality SNPs (log10) across libraries of Pseudoperonospora cubensis and P. humuli sequenced using target enrichment with 24 (TES2) and 48 h (TES3) of hybridization and whole-genome sequencing (WGS). B, Depth coverage of high-quality SNPs (log10) across libraries sequenced using target enrichment with different amounts of input DNA for library preparation. Only samples that were retained after quality filtering were included in the analysis. All the sample genotyped using WGS were retrieved from Bioproject PRJNA360426 published by Thomas et al. (2017).

    Download as PowerPoint

    The coverage depth (CD) across the 2,514 HQ SNPs was variable among samples and significant differences in the CD were observed between the libraries genotyped using TE and the libraries genotyped using WGS (Kruskal-Wallis P = 2.2e-16) (Fig. 2A). The enriched libraries with a hybridization time of 48 h (TES3) exhibited a significantly higher CD than the enriched libraries generated with 24 h of hybridization time (TES2) or the libraries genotyped using WGS (Kruskal-Wallis P = 2.2e-16) (Fig. 2A). The CD across all HQ SNPs averaged 262 ± 546, and the CD was 358 ± 546 for the libraries with hybridization times of 24 and 48 h, respectively (Table 1; Fig. 2A). Among the samples sequenced using WGS, the CD across HQ SNPs averaged 180 ± 116 (Table 1). No correlation was observed between the amount of input DNA used for library preparation and the CD across HQ SNPs (P > 0.05) (Fig. 2B).

    Characterization of technical error.

    Technical replicates were used to characterize the number of errors introduced into a sample during TE. Of the five samples submitted for enrichment, only one sample and its technical replicate were excluded after the quality filtering of the SNPs. Differences in the average coverage between technical replicates of the same sample were observed (Fig. 3A); however, only a small proportion of loci were different between replicates (Fig. 3B). On average, 18 SNPs (out of 2,514 total SNPs) differed between technical replicates of P. cubensis, suggesting an error rate of roughly 0.7% within the HQ variants. A total of 30 SNPs differed between the two replicates of P. humuli (one sample), suggesting a slightly higher error rate of 1.2% within the HQ variants.

    Fig. 3.

    Fig. 3. Characterization of sequencing error among technical replicates. A, Coverage distribution of the technical replicates sequenced. Box plots are filled with 2,514 high-quality single nucleotide polymorphisms (SNPs). Each sample is colored according to the enrichment set (TES). TES1 and TES2 were hybridized for 24 and 48 h, respectively. B, Genetic differentiation among technical replicates. An unweighted pair group method with arithmetic means tree was reconstructed using 2,514 SNPs. The genetic distance represents the number of SNPs that are different among samples. Pairwise distances among technical replicates are shown in the top left corner.

    Download as PowerPoint

    Genetic differentiation and population structure.

    In the genetic distance tree of HQ SNPs (2,514) all 128 samples genotyped fell into one of two clades (100% bootstrap support); either P. cubensis (94 samples) or P. humuli (34 samples) (Fig. 4). Within the P. cubensis clade, samples were either clade I or II of P. cubensis (Thomas et al. 2017). The majority of samples from Cucumis spp. aligned with clade II of P. cubensis, while the remaining samples collected from Cucurbita moschata and Citrullus lanatus aligned with clade I (Fig. 4).

    Fig. 4.

    Fig. 4. Genetic differentiation of Pseudoperonospora cubensis and P. humuli samples. An unweighted pair group method with arithmetic means tree was reconstructed using 2,514 single nucleotide polymorphism (SNP) variants (n = 128). Bootstrap support values are indicated above the branches. The genetic distance represents the proportion of loci that are different between samples. One asterisk (*) indicates samples previously classified as clade I members and two (**) indicate samples previously classified as clade II members. Technical replicates are enclosed in red squares.

    Download as PowerPoint

    Samples in the ordination plot formed three clusters corresponding to P. humuli, P. cubensis clade I, and P. cubensis clade II; these clusters were consistent with the stratification by species and within P. cubensis by host in the genetic distance tree (Fig. 5A). F-statistics (FST > 0.25) and analysis of molecular variance (AMOVA) detected significant genetic differentiation (P = 0.01) between pathogen species, supporting the clustering of the ordination plot (Table 3). Stratification by host was supported by a high level of genetic differentiation among samples collected from Cucumis spp. and samples collected from Cucurbita moschata (FST = 0.08) (Table 3). AMOVA revealed that 78% of the total genetic variance was significantly associated with differences between host within P. cubensis (P = 0.01) (Table 4).

    Fig. 5.

    Fig. 5. Ordination plots of Pseudoperonospora spp. based on 2,514 single nucleotide polymorphisms. A, Ordination plot of P. cubensis and P. humuli samples according to host species. All points represent samples collected in Michigan unless indicated otherwise. B, Ordination plot of P. cubensis samples from Cucumis sativus from 2007 to 2017 in the United States (n = 57). C, Ordination plot of P. humuli samples from hop yards of Michigan in 2017 and 2018 (n = 34).

    Download as PowerPoint

    Table 3. Pairwise F-statistics (FST) comparisons among Pseudoperonospora cubensis (clade I and II) and Pseudoperonospora humulia

    Table 4. Analysis of molecular variance for Pseudoperonospora spp. grouped by hosta

    All P. cubensis samples collected from C. sativus in the Midwest, including Michigan, and the reference sample from Florida clustered loosely together across all four quadrants of the ordination plot (Fig. 5B). However, the reference samples from North Carolina and California clustered separately from each other and were not contained within the 90% confidence ellipses of the other subpopulations (Fig. 5B). More than 70% of the samples from the subpopulations of central and east Michigan were contained within the 90% confidence ellipse of the Midwest-w/o-MI subpopulation (Fig. 5B); however, only 40% of the samples from west Michigan were contained within the ellipse of the of the Midwest-w/o-MI subpopulation. AMOVA did not support significant genetic differentiation among the subpopulations of P. cubensis from C. sativus in Michigan and the Midwest-w/o-MI subpopulation (P = 0.39). Less than 1% of the total genetic variance was associated with difference among subpopulations (or regions) (Tables 6 and 7).

    The clone correction of all P. cubensis samples collected from C. sativus and C. melo (n = 78) resulted in 34 multilocus genotypes (MLGs), 35% of which were detected in multiple subpopulations (Fig. 6A and B). MLGs 103 and 114 were detected in all the subpopulations of Michigan (east, west, and central) and the Midwest-w/o-MI subpopulation. MLGs 44, 55, 68, 82, 115, and 125 were detected in more than two subpopulations within Michigan. The Midwest-w/o-MI subpopulation also shared MLG 51 and MLG 54 with the subpopulation of east Michigan (Fig. 6A). Additionally, MLG 33 from Florida was also detected in the east Michigan subpopulation (Fig. 6A). Supporting these findings, the Mantel test did not detect a significant relationship between the genetic and physical distances of the P. cubensis samples collected from C. sativus in the Midwest (P = 0.454) (Fig. 7A).

    Fig. 6.

    Fig. 6. Frequency and geographic distribution of Pseudoperonospora cubensis and P. humuli multilocus genotypes (MLGs). A, Frequency of P. cubensis genotypes collected from Cucumis sativus in the Midwest. Unique genotypes are colored in gray. B, Frequency of P. humuli genotypes collected from Humulus lupulus in Michigan. Unique genotypes are colored in gray.

    Download as PowerPoint
    Fig. 7.

    Fig. 7. Relationship between genetic differences and geographic distances among samples. A, Pseudoperonospora cubensis samples from Cucumis sativus and B, P. humuli samples from Humulus lupulus in Michigan.

    Download as PowerPoint

    The level of genetic differentiation among P. humuli samples from Michigan and Oregon (Fst = 0.014) was higher when compared with the genetic differentiation detected among P. cubensis samples from C. sativus originating in the Midwest, California, Florida, and North Carolina (Fst = 0.0018). In the ordination plot of all the P. humuli samples, most samples from the subpopulation of west Michigan were loosely dispersed in the left quadrants, while most samples from the subpopulation of north Michigan (75%) were clustered tightly in the right quadrants (Fig. 5C). Only two samples from the north subpopulation were contained within the 90% confidence ellipse of the west subpopulation (Fig. 5C). The samples from central and east Michigan were either contained in the ellipses from the north or west subpopulations. The reference sample from Oregon was not contained within the 90% confidence ellipse of Michigan subpopulations (Fig. 5C). These patterns were supported by AMOVA, in which significant genetic differentiation between the subpopulations of north and west Michigan was detected (P = 0.01) (Table 9). A total of 7.1% of the genetic variance was significantly associated with differences among subpopulations and 9.3% of the variance was associated with differences among hop yards within a subpopulation (Table 9). This suggests that geographic region may have a significant effect on the structure of the P. humuli population in Michigan.

    Clone correction of the 34 P. humuli samples resulted in 23 MLGs, none of which were detected in two different regions (Fig. 6C and D). However, some MLGs were detected in at least two hop yards within the same region. MLGs 21 and 16 were detected in two hop yards sampled in north Michigan (Fig. 6C). The Mantel test supported the hypothesis that the genetic distance among P. humuli samples collected in Michigan was significantly associated (P = 0.0042) with the geographic distance among hop yards (Fig. 7B).

    Genotypic diversity.

    The highest level of genotypic diversity was observed among the samples of P. humuli followed by P. cubensis clade I and P. cubensis clade II (Table 5). Each species or clade consisted of different MLGs spread relatively evenly. This was reflected in the estimates of genotypic richness (eMLG = 9 to 9.25) and evenness (E5 > 0.8) (Table 5). The diversity indexes of Shannon-Wiener, Stoddart and Taylor, and Simpson were highest among P. cubensis clade II samples, followed by P. humuli samples, and then P. cubensis clade I samples (Table 5). The greatest expected heterozygosity or proportion of heterozygous genotypes expected was observed among P. humuli samples, followed by P. cubensis clade I samples and, then, P. cubensis clade II samples (Table 5).

    Table 5. Genotypic diversity estimates for Pseudoperonospora spp. samples grouped by host (clade)a

    Table 6. Analysis of molecular variance of Midwest subpopulations of Pseudoperonospora cubensis collected from Cucumis sativusa

    Table 7. Pairwise F-statistics comparisons among subpopulations of Pseudoperonospora cubensis collected from Cucumis sativus in the Midwesta

    The Michigan subpopulations and the Midwest-w/o-MI subpopulation of P. cubensis collected from C. sativus exhibited similarly low levels of expected heterozygosity and genotypic diversity (Table 8). Expected heterozygosity ranged between 0.178 and 0.197, while genotypic richness (eMLG) ranged between 8.91 to 9.7. The east Michigan subpopulation had the greatest expected heterozygosity (Hexp = 0.197), followed by Midwest-w/o-MI (Hexp = 0.190), west Michigan (Hexp = 0.189), and central Michigan subpopulations (Hexp = 0.178) (Table 8). The east Michigan subpopulation had the highest genotypic richness (eMLG), while west Michigan and the Midwest-w/o-MI subpopulations had larger sample sizes and lower numbers of MLGs (Table 8). The diversity indexes of Shannon-Wiener, Stoddart and Taylor, and Simpson were all highest for the east Michigan subpopulation, followed by the subpopulations of west Michigan, Midwest-w/o-MI, and then, central Michigan (lowest indices) (Table 8). The highest E5 was detected for the central Michigan subpopulation followed by the subpopulations of east Michigan, west Michigan, and Midwest-w/o-MI (lowest E5) (Table 8).

    Table 8. Genotypic diversity estimates and index of association of Pseudoperonospora cubensis subpopulations collected from Cucumis sativus in the Midwesta

    Table 9. Analysis of molecular variance of Michigan subpopulations of Pseudoperonospora humulia

    The genotypic diversity varied more widely across the P. humuli subpopulations of Michigan (eMLG = 5 to 12) compared with the variation detected among P. cubensis subpopulations (eMLG = 8.91 to 9.7) (Table 10). The expected heterozygosity among the subpopulations of P. humuli was higher and ranged between 0.206 to 0.220. The subpopulations of north (Hexp = 0.224) and east-central Michigan (Hexp = 0.220) had the greatest expected heterozygosity, followed by the subpopulation of west Michigan (Hexp = 0.206) (Table 10). The subpopulation of west Michigan had the highest genotypic richness, while the north Michigan subpopulation had the largest sample size and the most MLGs. The diversity indexes of Shannon-Wiener, Stoddart and Taylor, and Simpson were all highest for the north Michigan subpopulation, followed by the subpopulations of west and east-central Michigan (lowest indices) (Table 10). The highest E5 was detected in the subpopulations of west and east-central Michigan, followed by the north Michigan subpopulation (lowest E5) (Table 10).

    Table 10. Genotypic diversity estimates and index of association of Michigan subpopulations of Pseudoperonospora humulia

    Reproductive mode.

    To determine whether allelic variants were randomly associated as expected in populations with a sexual mode of reproduction (low linkage), we calculated the mean index of association (IA) across loci for each species in Michigan and compared with the estimated IA of simulated populations under strong linkage (100% linked loci), moderate linkage (75 and 50% linked loci), and low linkage (25% and 0 loci under linkage) (Fig. 8). Upon comparison, significant differences were observed between the mean IA of each species and the mean values estimated for the simulated populations (P. cubensis, P < 2.2e-16 and P. humuli, P < 2.2e-16). The IA mean value of the P. cubensis samples was situated between the IA mean values of the simulated data with 50 and 75% linkage (Fig. 8A). This indicated that populations of P. cubensis have a mixed mode of reproduction with a predominantly clonal phase. The IA mean value of the P. humuli samples was not significantly different from the simulated data with 25% linkage (Fig. 8B), suggesting that the populations of P. humuli may have a mixed mode of reproduction that could be predominantly sexual.

    Fig. 8.

    Fig. 8. Estimation of the degree of linkage disequilibrium within Pseudoperonospora cubensis and P. humuli of Michigan. A, Observed index of association (IA) distribution of P. cubensis compared with the IA distributions of 0, 25, 50, 75, and 100% linkage. B, Observed IA distribution of P. humuli compared with the IA distributions of 0, 25, 50, 75, and 100% linkage. Groupings based on the Kruskal-Wallis rank-sum test are noted by the letters over the boxplots, in which the P. cubensis and P. humuli population datasets are grouped with the simulated 50 and 25% linkage data, respectively.

    Download as PowerPoint

    DISCUSSION

    Technological advances in genome sequencing have accelerated the cataloging of genomic variation of plant pathogens; however, extracting large amounts of HQ DNA from obligate pathogens responsible for DM is challenging and hampers their genotyping using next-generation sequencing. We adapted a TE method that facilitated the sequencing of 712 genes annotated as virulence factors in the genome of P. cubensis. We used this method to sequence P. cubensis and P. humuli DNA extracted from sporangia collected from plant tissue with signs of the pathogen. This approach facilitated the genotyping of samples that contained very low amounts of pathogen DNA mixed with other environmental contaminants (i.e., plant DNA, bacteria DNA). After aligning the sequenced DNA, we identified 2,514 SNPs and resolved the population structure of P. cubensis and P. humuli in Michigan. An effect of location on the genetic variation of P. humuli was detected and the genetic distance among samples was associated with the physical distance of the hop yards. Evidence of location-based differentiation within Michigan was not detected for the P. cubensis population.

    By using affinity probes and several amplification cycles, our TE protocol was designed to provide high coverage sequencing of specific loci in the P. cubensis and P. humuli genome. This facilitated the sequencing of a low amount of target (pathogen) DNA from environmental samples containing a significant amount of contaminant DNA from plant tissues and other microorganisms. Sequencing DNA directly from environmental samples reduces time, labor, and DNA input that are required for other sequencing approaches, such as genotyping by sequencing (GBS) or WGS (Gent et al. 2019; Summers et al. 2015; Thomas et al. 2017). This is particularly advantageous for DM, as other sequencing approaches (GBS, WGS, Rad-Seq) rely on large amounts of HQ DNA that can only be obtained using a laborious propagation procedure (Gent et al. 2019; Thomas et al. 2017). Using TE, several samples with less than 50 ng of DNA were successfully genotyped. Most of the samples that failed were single leaf lesions containing few or no sporangia (P. cubensis had been previously confirmed via quantitative PCR [qPCR]). The TE protocol also enabled the genotyping of samples that were no longer viable for propagation, expanding the number of samples that could be genotyped, including those collected more than 10 years ago.

    Using a reduced amount of DNA (2 to 100× less), the samples sequenced after TE reached similar sequencing coverage when compared with the samples sequenced using a WGS approach, at the region’s target (250×). However, the enriched samples required six to seven times less space for sequencing (60 to 70 samples per lane of HiSeq) compared with the space used for the sequencing of libraries using WGS (10 samples per lane of HiSeq). This was possible because TE reduced the complexity of the P. cubensis genome from 88.22 MB to <1 MB (i.e., 1.13% of the genome). This reduction in the genome complexity and the enrichment of up to four samples in a single enrichment experiment (MyBaits) make TE a cost-effective alternative to WGS.

    However, we found that our TE protocol can result in a higher number of sequencing errors per sample, as compared with WGS. The high number of amplification cycles used for the enrichment of samples with low amounts of DNA may have led to the introduction of PCR errors and a subsequent reduction in genotyping accuracy. Typically, high-throughput sequencing (e.g., WGS and GBS) without enrichment of samples with low amounts of DNA (<250 ng) requires an average of 14 PCR cycles, however, six to 14 extra cycles were used after library construction in our TE protocol (for a total of 20 to 28 cycles). Thus, possibly due to the high number of extra amplification cycles, the samples genotyped using TE were subject to an error rate between 0.7 and 1.2%, an error rate 10 times higher than the estimated error rate of >0.1% for high-throughput sequencing data (Grünwald et al. 2017; Ma et al. 2019). We calculated this error rate using only three and one technical replicate from two different species; this is far from ideal as sequencing error rates can show sample-specific effects and sequence context (genome) dependency (Ma et al. 2019). Therefore, a higher number of technical replicates (approximately three technical replicates of 10% of the total number of samples) is recommended for future studies to more precisely assess the genotyping reproducibility of TE. In this study, despite an error rate of about 1%, the genetic distance between samples in 95% of cases was greater than the distance generated due to PCR errors.

    Using TE, we detected genetic patterns linked to pathogen biology. For example, a significant effect of host in the population structure of P. cubensis was detected, and the samples collected from C. sativus were genetically different from the samples collected from Cucurbita moschata. These results support previous studies that have shown that the structure of the P. cubensis population is driven by host preference, with samples from Cucumis spp. and Cucurbita moschata belonging to two distinct evolutionary clades (Thomas et al. 2017; Wallace et al. 2020). Thomas et al. (2017) identified two P. cubensis clades in the United States; clade I occurs on Cucurbita pepo, Cucurbita moschata, Cucurbita maxima, and Citrullus lanatus, and clade II occurs on Cucumis spp. In our analyses, all P. cubensis samples clustered by clades according to that distribution.

    Annual cucurbit downy mildew (CDM) infections in the northern United States are driven by an influx of airborne P. cubensis sporangia from overwintering sites (Bello et al. 2020; Goldenhar and Hausbeck 2019; Naegele et al. 2016). CDM limits cucumber yield in Michigan, where more than 37,000 acres of cucumbers are planted every year (United States Department of Agriculture 2020). Small, yet significant genetic differences among cucumber production regions in Michigan were previously reported (Naegele et al. 2016). However, a significant effect of location in the subpopulation structure of P. cubensis was not detected in the current study, suggesting no differentiation within and among regions in Michigan. An exchange of migrants has a homogenizing effect on subpopulations (Milgroom 2015) and may occur in Michigan due to the availability of susceptible crops across the state. The detection of the same MLG in multiple locations supports this hypothesis and also suggests that P. cubensis sporangia may be disseminated unrestricted in the state, as there is no geographical barrier (e.g., mountain range or water body) that may limit the homogenizing of geographically distant populations. The wide host availability in Michigan may also facilitate the establishment of incoming MLG from other subpopulations in the Midwest. The exchange of migrants may occur among Midwestern states outside of Michigan, but the absence of susceptible crops between geographically distant populations may result in lower rates of exchange and, subsequently, more genetically differentiated populations. Additional sampling is needed to test this hypothesis.

    Significant differences were detected between the P. humuli populations from the north and west regions of Michigan, and we did not find any evidence of genetic differentiation among hop yards within the same region. In Michigan in 2018, approximately 990 acres of hops were planted across more than 50 commercial hop yards (Michigan Department of Agriculture & Rural Development, Lansing, MI, U.S.A.) and, despite the potential for airborne dispersal of hop DM (Bello et al. 2020; Gent et al. 2009), none of the 12 MLGs detected in the north region (n = 20) were detected in west (n = 8) or central (n = 5) Michigan. This is consistent with the restricted pattern of dispersal suggested for this pathogen; new infections of hop plants by P. humuli are less likely to occur far from their inoculum source (Johnson et al. 1991). However, given the limited sample size per subpopulation, these results may be an artifact, and additional samples are needed to corroborate the patterns of genetic distribution observed in this study. Similarly, additional samples are also needed to corroborate the significant correlation detected between the genetic and geographic distance of P. humuli samples that suggest a higher probability for new P. humuli infections to occur close to their inoculum source.

    In Oregon, Gent et al. (2019) found significant genetic differences between hop yards planted within 10 km of each other and a low amount of population differentiation between hop yards established from the same planting material. This suggests that infected hop plant material may be a more important source of primary inoculum than airborne migration for DM. Most of the hop yards sampled for this study were among the first yards establish from rhizomes in the mid-2000s, when commercial hop production returned to Michigan, and very few sources of propagation material were available (Sirrine et al. 2014). As the hop industry matured in the state, new sources of propagation material became available, but still in limited numbers. The lack of genetic differentiation among hop yards within regions in Michigan may be a result of the introduction of a reduced number of genotypes via planting material from the same origin source into multiple hop yards. The detection of the same MLGs in multiple hop yards and relatively low genetic diversity of north Michigan partially support this hypothesis, but further sampling and more information on sources of plant material are needed to verify this hypothesis in Michigan.

    The restricted airborne dispersal of P. humuli sporangia compared with P. cubensis could be attributed to the cultivation practices and geographic distribution of hops. Generally, diseased basal shoots are close to the ground and during most of the season are sheltered within a canopy of healthy basal shoots that hamper dispersal (Johnson et al. 1991). Additionally, the area planted with hops in the state is equivalent to only 2% of the area planted with cucumbers. This lower host availability may result in reduced P. humuli sporangia production compared with P. cubensis and lower aerial exchange of MLG among subpopulations. The restricted exchange of MLG can result in genetically differentiated subpopulations (Milgroom 2015).

    The relatively low level of genetic differentiation and genetic diversity detected within the Pseudoperonospora spp. populations of Michigan is consistent with the clonal reproductive mode of P. cubensis and inbreeding reproductive mode of P. humuli (Gent et al. 2019; Naegele et al. 2016). Although we estimated relatively low IAs (65%) for both species, suggesting they experience a sexual phase, especially P. humuli, we believe these values are an artifact of our sampling strategy. Our samples were collected mainly from individual leaf lesions (cucurbits) or bulked inoculum of diseased shoots (hops), so it is likely that most of our P. humuli sample units may contain multiple genotypes creating an effect of random mating (no linkage disequilibrium). An earlier study of P. humuli populations in the Pacific Northwest that used a sampling strategy similar to that in our study suggested that the P. humuli population of Oregon was reproducing sexually (Chee et al. 2006). However, in a genetic study of P. humuli using more precise sample units in the same region, GBS revealed strong evidence of linkage disequilibrium (Gent et al. 2019). This is consistent with the nonrandom mating expected from species with an asexually reproducing or highly inbred mode (Gent et al. 2019). Similarly, Wallace et al. (2020) provided evidence of nonrandom mating or recombination consistent with selfing or asexual reproduction for P. cubensis clade II. A strong signal of nonrandom mating for P. cubensis clade II was also detected in this study.

    The use of more precise sample units (e.g., individual leaf lesions or single cells) and the sequencing of selectively neutral regions should be considered for studies focusing on capturing fine population structure (Grünwald et al. 2016). We included 94 neutral regions (genes) containing small SSR markers among the regions targeted for sequencing. However, the sequencing of these regions was of poor quality and we could not detect SSRs within them. Instead, we found 115 SNPs within these regions that were not significantly differentiated among or within subpopulations (data not shown). Most of the signal of population differentiation was found within 66% of the HQ SNPs used; these SNPs were contained within secreted proteins or effector genes that corresponded to 10% of the genes targeted. The SNPs contained within these regions are likely to be under strong selective pressure and may be acceptable to describe genetic differentiation among populations; however, these markers may limit the detection of ancient divergence among populations. We encourage the use of neutral polymorphic regions more evenly distributed across the genome for future genetic studies using TE, as they may provide more accurate estimates of population differentiation at multiple temporal and spatial scales (Milgroom 2015).

    We also recommend the use of a higher number of baits for enrichment when working with low concentrations of target DNA and the quantification of plant and pathogen DNA, using qPCR before library preparation. More baits should increase the sequencing coverage across samples, facilitating the calling of variants. Similarly, the utilization of samples with a higher amount of pathogen DNA relative to the amount of contaminating DNA could increase the chances for the successful preparation of libraries using TE. Future studies should also include more technical replicates to assess error due to batch effects, as the high number of amplification cycles resulted in lower genotyping accuracy. The introduction of technical replicates from multiple generations of clones is also advised. This is critical when working with high-throughput sequencing data, due to the introduction of false mutations in the data that could create additional MLGs (Gent et al. 2019; Potapov and Ong 2017).

    In summary, our results reveal the key strengths of TE for the genotyping of DM. This approach provides a solution for the genotyping of obligate biotrophic pathogens, for which high-throughput sequencing is typically constrained by the low amounts of target DNA and high amounts of nontarget (contaminating) DNA. TE provides a cost-effective approach for the genotyping of unpurified field samples and the assessment of sequence polymorphisms across a large number of individuals. This method could be adapted to a diverse group of pathogens, even without a reference genome. Future population studies using TE should carefully consider the sampling strategy and the size and complexity of the genomic regions targeted. Including technical replicates will also be important to ensure the accurate genotyping of the samples after enrichment and the reproducibility of the experiments.

    MATERIALS AND METHODS

    Sample collection and DNA extraction.

    Over a total of 12 years (2007 to 2009, 2012 to 2013, 2015, and 2018 to 2019), P. cubensis sporangia were obtained from symptomatic cucurbit tissue collected in Michigan, other states (Indiana, Iowa, Ohio, Florida, and Wisconsin), and one Canadian province, Ontario (Fig. 1A; Supplementary Table S1). Approximately 85% of P. cubensis samples were processed by excising single lesions using a sterile scalpel. Sporangia from each lesion were harvested by vigorously shaking a 2-ml microcentrifuge tube containing the tissue and 1 ml of distilled H2O for 30 s. Dislodged sporangia were pelleted (14,000 rpm for 5 min; 5424 Centrifuge, Eppendorf) and were subjected to DNA extraction. The remaining samples were processed by gently rinsing sporangia from infected leaves with multiple lesions, using a Preval spray power unit (Preval, Chicago) filled with distilled water as described by Mitchell et al. (2011). Dislodged sporangia were transferred into a 2-ml centrifuge tube, were pelleted (14,000 rpm for 5 min; 5424 Centrifuge, Eppendorf), and were subjected to DNA extraction.

    P. humuli samples were processed using methods similar to those described by Chee et al. (2006). Sporangial suspensions were obtained from diseased basal shoots collected from six commercial hop yards and a hop research plot located in Michigan. The three hop yards in the north region were located within a radius of approximately 25 km (hop yards A, B, and C) (Fig. 1B). The two hop yards in the west region of the state were separated by approximately 100 km and were operated by the same producer (hop yards D and E) (Fig. 1B). The sixth commercial hop yard was located in the east region (hop yard F) and was located approximately 80 km from the research plot (hop yard G) located in the central region (Fig. 1B). Diseased basal shoots were brought to the laboratory, stems were placed into beakers of water, and shoots were individually covered with a plastic bag overnight, to induce sporulation. The following day, sporangia were washed from the abaxial leaf surface using a Preval spray power unit filled with distilled water. Sporangial suspensions from single shoots were transferred into 2-ml centrifuge tubes and were pelleted by centrifugation (14,000 rpm for 5 min). All DNA extractions were performed on the pelleted sporangial suspensions using the NucleoSpin Plant II isolation kit (Macherey-Nagel), following manufacturer instructions.

    DNA library construction.

    TE libraries.

    Whole DNA was extracted from 305 samples as described above and was quantified using a Qubit 2.0 fluorimeter (Thermo Fisher Scientific). Before fragmentation of DNA, using a M220 Focused-ultrasonicator (Covaris), the presence of P. cubensis and P. humuli in cucurbit or hop samples was confirmed by Sanger sequencing of the intergenic transcribed spacer ribosomal DNA region, as described by Quesada-Ocampo et al. (2012). A subset of five samples was divided into two individual tubes and was submitted independently for library preparation and enrichment. These samples were used to estimate the number of errors introduced into a sample during the TE workflow. Sequencing libraries were prepared using the KAPA HyperPrep library sequencing preparation kit (Kapa/Roche), following manufacturer recommendations, and an additional KAPA Pure Bead (Kapa/Roche) cleanup to ensure the libraries were free from any adapters. Insert size and library quality was verified using the Agilent 2100 Bioanalyzer (Agilent Technologies). To optimize the use of the MYbaits custom kit reactions (80-bp baits; Arbor Biosciences, formerly MYcroarray Inc.), good quality libraries were pooled into groups of four libraries for enrichment. Probes (baits) were designed according to manufacturer protocols (probe compatibility, repeat masking, and melting temperature filters) to cover 118 population-informative DNA regions and 712 DNA regions annotated as protein genes at a 2× coverage (GitHub database). Protein gene sequences were obtained using bedtools 2.26.0 (Quinlan and Hall 2010), with coordinate information obtained from the genome annotation of P. cubensis available online from Oregon State University Libraries (Burkhardt et al. 2015). All genes annotated as extracellular toxins, hydrolytic enzymes, enzyme inhibitors, cell-entering RxLR effectors, cell-entering crinkler effectors, secreted proteins, and transcription factors were targeted for sequencing. These regions were selected to identify annotated genes containing SNPs that may play a role in controlling host-specificity and selectively neutral regions for population level analyses (Kitner et al. 2015; Quesada-Ocampo et al. 2012; Wallace and Quesada-Ocampo 2017).

    Hybridization of DNA libraries and probes were performed at 65°C for 24 or 48 h (Supplementary Table S1). After hybridization, library pools were bound to Dynabeads MyOne Streptavidin C1 magnetic beads (Life Technologies) for enrichment. Amplification of the enriched libraries was performed using the 2× KAPA HiFi HotStart ReadyMix (Kapa/Roche) and 5 μM of each TruSeq forward and reverse primer. Amplification conditions were 98°C for 2 min, followed by six to 14 cycles of 98°C for 20 s, 60°C for 30 s, and 72°C for 60 s, and, then, a final extension of 72°C for 5 min. To retain libraries with at least 0.5 ng of DNA with fragments narrowly distributed around 300 bp, the libraries were quantified using a Qubit 2.0 fluorimeter (Invitrogen) and were analyzed using a Caliper LabChipGX HS DNA assay (PerkinElmer) after PCR. Based on these quantifications, the libraries were combined in equimolar amounts into three groups for sequencing. The first set of samples (TES1) included 62 enriched libraries with a hybridization time of 24 h. The second set (TES2) of samples included 72 enriched libraries with a hybridization time of 48 h (Supplementary Table S1). The third set (TES3) of samples included 24 enriched libraries with a hybridization time of 48 h (Supplementary Table S1). All sets were quantified using the Qubit 2.0 and the Kapa Biosystems Illumina library quantification qPCR kit. Each set was loaded onto one lane of an Illumina HiSeq 4000 flow cell and was sequenced in a 2 × 150 bp paired-end format.

    Lc-WGS libraries.

    In order to assess the performance of TE in comparison with Lc-WGS and to estimate the amount of exogenous material in the DNA samples, 10 samples of genomic DNA directly extracted from sporangial suspensions derived from individual lesions were submitted to the Michigan State University Research Technology Support Facility for DNA library preparation and sequencing. Due to the low DNA concentration of some of the samples, the Rubicon ThruPLEX DNA-Seq library preparation kit (Takara Bio) was used to prepare the sequencing libraries. Library preparation was performed following manufacturer recommendations with an additional AMPureXP bead cleanup (Beckman Coulter). To retain libraries with at least 0.5 ng of DNA with fragments narrowly distributed around 300 bp, the libraries were quantified using a Qubit 2.0 fluorimeter (Invitrogen) and were analyzed using a Caliper LabChipGX HS DNA assay (PerkinElmer) after PCR. Based on these quantifications, the libraries were grouped in equimolar amounts into two pools that were quantified using the Kapa Biosystems Illumina library quantification qPCR kit (Kapa/Roche). Each pool was loaded into one lane of an Illumina MiSeq standard flow cell (v2) (Illumina) and sequencing was performed in a 2 × 150 bp paired-end format, using a v2 500 cycle MiSeq reagent cartridge (Illumina).

    Genotyping and variant calling.

    In order to assess the performance of TE for the genotyping of P. cubensis and P. humuli, we analyzed the data in combination with previously published WGS data generated for a comparative genomic analyses of P. cubensis. This data set included one P. humuli sample collected from hop in Oregon and nine P. cubensis samples collected from different cucurbits in Alabama, South Carolina (n = 2), Florida, North Carolina (n = 2), Georgia, California, and New York. The datasets can be found in Bioprojects PRJNA360426 (Thomas et al. 2017) and PRJNA675756 (this project), available online through the National Center for Biotechnology Information. Sequencing raw reads of all libraries were trimmed and quality was filtered, using Trimmomatic 0.32 (Bolger et al. 2014). The publicly available P. cubensis (Savory et al. 2012b) and C. sativus genome sequences were used as references for alignment of the reads. To estimate plant contamination, HQ reads were initially aligned to the C. sativus genome, using the Burrows-Wheeler aligner (BWA-0.7.17) (Li and Durbin 2009). All reads including those that aligned to the reference genome of C. sativus were then aligned to a reference genome of P. cubensis containing only the contigs in which the target genes are located, hereafter referred to as the target reference genome of P. cubensis. Clean reads of Bioproject PRJNA360426 were also aligned to this subset of the genome. Samtools-0.1.19 (Li et al. 2009) was used to sort the alignments and PCR duplicates were removed with Picard. Coverage depth was calculated using BEDTools-2.27.1.

    The next generation sequencing plugin (NGSEP) for analysis of high-throughput sequencing data was used for calling SNPs and producing the variant call format file (VCF) (Perea et al. 2016). The VCF file generated was filtered, using the NGSEP plugin, in order to retain the most informative SNPs of sufficient quality. Only biallelic SNPs, present in 70% of samples, with a minimum quality of 40 were retained. Indels and copy number variants were excluded to facilitate the analysis. The resulting VCF file was imported into R using the package vcfR (version 1.5.0) (Knaus and Grünwald 2017). Missing genotype data were deleted before further analysis (Tabima et al. 2018). Samples with more than 40% of missing variants were removed from the analysis to retain the maximum number of SNPs. Variants with more than 75% of missing information across samples were also removed. To increase the stringency for the genotyping, we excluded all the samples with an average coverage below 50×. A nonparametric Kruskal-Wallis rank-sum test was used to assess the differences among the mean coverage depth across libraries.

    Population assignment.

    P. cubensis samples were assigned to several different subpopulations depending on the analyses. To study the distribution of the genetic variation by host, P. cubensis samples were assigned to subpopulations according to host species. To study the distribution of the genetic variation by geographic location, only the P. cubensis samples collected from Cucumis spp. were used. The samples collected in the states of Michigan, Wisconsin, Indiana, and Ohio and the Canadian province of Ontario were assigned to the Midwest subpopulation (Fig. 1C). Midwest samples excluding those from Michigan were assigned to the Midwest-w/o-MI subpopulation (Fig. 1C). Samples from Michigan were further divided into three subpopulations according to the region of collection (east, west, and central) (Fig. 1C). Samples collected from Iowa, Alabama, Oregon, California, South Carolina, Florida, New York, North Carolina, and Georgia were assigned to their respective state of origin.

    P. humuli samples were assigned to either the north or west subpopulation based on their location of collection in Michigan (Fig. 1B and C; Table 2; Supplementary Table S1). Three hop yards (A, B, and C) were located within a radius of approximately 25 km and comprised the north subpopulation (Fig. 1C). Hop yards D and E in southwest Michigan are operated by one producer and make up the west subpopulation (Fig. 1C). Representative samples from central and east Michigan and Oregon were also included in our analysis.

    Genetic differentiation and population structure.

    The filtered SNPs were used to construct a genetic tree based on Nei’s genetic distance (Nei 1972) and, to visualize the genetic relationship among all samples; 1,000 bootstrap replicates were performed to obtain branch support (Kamvar et al. 2015). Clone-correction was performed by collapsing samples to the average genetic distance detected between technical replicates of sequencing. For this, the “mlg.filter” function of the R package Poppr (version 2.4.1) was used with a bitwise distance equal to 0.02 and the farthest neighbor algorithm. Ordination plots based on principal component analysis (PCA) were also constructed to visualize the differentiation of samples within and among subpopulations from different host or regions across the Midwest (Fig. 1C). Three distinct PCAs were constructed to visualize the relationship among i) all P. cubensis and P. humuli samples by host, ii) all P. cubensis samples collected from C. sativus, and iii) all P. humuli samples collected in Michigan and Oregon. The PCAs were performed using the “glPca” function of R package Adegenet (Jombart and Ahmed 2011), and the ordination plots were constructed using the R package Ggplot2. Samples were colored according to host or subpopulations, and ellipses of 90% confidence intervals were drawn on the ordination plots.

    Genetic differentiation between and within subpopulations was quantified using hierarchical FST (hierfstat version 0.04.22) (Goudet 2005) and AMOVA (poppr version 2.4.1) (Kamvar et al. 2014) in R version 3.6.1. To test the hypothesis that the Pseudoperonospora spp. populations are differentiated by host, we performed an AMOVA and calculated FST statistics using all the samples collected by P. cubensis and P. humuli grouped by host. Similarly, to test the hypothesis that the P. cubensis subpopulation from C. sativus in Michigan is differentiated by geographic location, we performed an AMOVA and calculated FST statistics using the samples from the Michigan subpopulations (east, west, and central) and the Midwest-w/o-Mi subpopulation. Population differentiation among the P. humuli subpopulations of Michigan (north and west) was analyzed in the same way. A Mantel test (Mantel 1967) was performed to evaluate the relationship between the genetic distance of the samples and the physical distance of the location where the samples originated. This analysis was performed for each species independently (P. cubensis and P. humuli), using only the samples collected in Michigan. The Mantel test was performed as described by Gent et al. (2019), using the “mantel.rtest “function of the ade4 package (Dray and Dufour 2007). The genetic and Euclidean-geographic distance matrices were created with the function “bitwise.dist” (Kamvar et al. 2014) and “dist”, respectively. Physical distances were calculated using the coordinates of each sample location. All scripts and functions for SNP filtering and population analyses are deposited in the GitHub database. These scripts are annotated to provide more in-depth information for how quality control was performed.

    Genotypic diversity.

    Genotypic diversity was determined using the filtered SNP data (not clone-corrected) using the R package Poppr (Kamvar et al. 2014). The diversity estimates calculated included genotypic richness (MLG), eMLG based on rarefaction, E5, and the Shannon-Wiener (Shannon 1948), Stoddart and Taylor (Stoddart and Taylor 1988), and Simpson (Simpson 1949) indices. The within-population gene diversity (Hexp) (Nei 1972) that describes the proportion of heterozygous genotypes was calculated using the function “basic.stats” of the R package hierfstat. The diversity estimates were calculated using i) all P. cubensis and P. humuli samples grouped by host, ii) all P. cubensis samples collected from C. sativus in each Michigan subpopulation and the Midwest-w/o-Michigan subpopulation, and iii) the P. humuli samples from the north and west subpopulations of Michigan. Only categories or subpopulations with at least five individuals were included for this analysis.

    Reproductive mode.

    To infer the predominant reproductive mode of P. cubensis and P. humuli in Michigan (e.g., sexual, clonal, or mixed), we calculated the IA (Brown et al. 1980; Milgroom 1996) according to Tabima et al. (2018), using only the P. cubensis and P. humuli samples collected in the state from C. sativus and H. lupulus, respectively. For this, we calculated the IA value, using a subset of 1,000 random HQ SNPs and compared it with the observed values of simulated populations with 0, 25, 50, 75, and 100% linkage. The mean IA values of each species were compared independently against the mean IA values of the simulated populations. Simulations were conducted with a dataset consisting of 2,514 loci analogous to the observed P. cubensis and P. humuli data. A nonparametric Kruskal-Wallis rank-sum test was used to assess the differences among means for all IA distributions. Multiple comparisons between IA distribution were performed using the nonparametric Kruskal- Wallis test and Tukey’s honest significant difference test.

    ACKNOWLEDGMENTS

    We thank the following undergraduate research assistants: J. Chan and A. Job. We also thank D. Higgins for helpful discussions.

    AUTHOR-RECOMMENDED INTERNET RESOURCES

    GitHub database: https://github.com/jcbello131/TE_Pseudoperonospora

    Oregon State University Libraries dataset: https://dx.doi.org/10.7267/N9TD9V7M

    Picard: https://broadinstitute.github.io/picard

    The author(s) declare no conflict of interest.

    LITERATURE CITED

    Funding: This project was supported, in part, by funds from the National Institute of Food and Agriculture, U.S. Department of Agriculture, under award number 2015-51181-24285, the Agricultural Research Fund of Pickle Packers International, Inc., and the Michigan State University Project GREEEN GR15-020.

    The author(s) declare no conflict of interest.