RESOURCE ANNOUNCEMENT (Open Access)

A Chromosome Scale Assembly of an Australian Puccinia striiformis f. sp. tritici Isolate of the PstS1 Lineage

    Authors and Affiliations
    • Benjamin Schwessinger (1)
    • Ashley Jones (1)
    • Mustafa Albekaa (1)
    • Yiheng Hu (1, 2)
    • Amy Mackenzie (1, 3)
    • Rita Tam (1)
    • Ramawatar Nagar (1, 4)
    • Andrew Milgate (5)
    • John P. Rathjen (1)
    • Sambasivam Periyannan (1, 3)
    1. Research School of Biology, The Australian National University, Canberra, Australia
    2. Department of Microbial Interactions, IMIT/ZMBP, University of Tübingen, Tübingen, Germany
    3. CSIRO Agriculture and Food, Canberra, Australia
    4. National Institute for Plant Biotechnology, Indian Council of Agricultural Research, New Delhi, India
    5. New South Wales Department of Primary Industries, Wagga Wagga Agricultural Institute, Wagga Wagga, NSW, Australia

    Published Online: https://doi.org/10.1094/MPMI-09-21-0236-A

    Genome Announcement

    Puccinia striiformis f. sp. tritici is a globally important pathogen of wheat. P. striiformis f. sp. tritici causes wheat stripe rust (or yellow rust) disease, leading to global losses estimated at $1 billion annually. Stripe rust disease threatens global wheat cultivation and food security (Schwessinger 2017; Wellings 2011). In Australia, pathotypes belonging to 134 E16 A+ (PstS1) lineages have dominated wheat fields for nearly two decades, since their first detection in 2002 (Ding et al. 2021). Introduced accidentally from North America through international travel, the lineage spread rapidly from the port of entry in Western Australia to eastern states within a year, completely replacing the predecessor lineage 104 E137 A−. The P. striiformis f. sp. tritici 134 E16 A+ lineage evolved stepwise through the generation of derivative pathotypes with additional virulence to the Yr10, Yr17, Yr24, and Yr27 resistance genes deployed widely in commercial wheat varieties. Subsequently, some of these mutant pathotypes adapted to commercial triticale varieties through virulence to the YrJ and YrT genes present in these wheat × rye hybrids (Ding et al. 2021). Despite the dominance of this lineage over 20 years, we lack knowledge of its genome content and architecture. Here, we announce the genome of a recent derivative of 134 E16 A+ named 134 E16 A+ 17+ 33+, collected from Wagga Wagga, New South Wales, Australia, during 2017. The new isolate has virulence to the Yr2, Yr6, Yr7, Yr8, Yr9, Yr17, Yr25, Yr33, and YrA genes but is avirulent to Yr1, Yr3, Yr4, Yr5, Yr10, Yr15, Yr24, Yr26, Yr27, Yr32, YrJ, YrT, and YrSp.

    We extracted high-molecular weight DNA from freshly harvested uredinospores using a modified cetrimonium bromide (cetyltrimethylammonium bromide) extraction protocol based on Arseneau et al. (2017), which is described in detail on protocols.io (Jones 2021). The extracted DNA underwent size selection for high-molecular weight fragments ≥20 kb using a PippinHT (Sage Science), prior to sequencing with the Oxford Nanopore Technologies MinION platform. A long-read native DNA sequencing library was prepared according to the manufacturer’s protocol 1D genomic DNA by ligation (SQK-LSK109), and sequencing was performed on a MinION Mk1B using a FLO-MIN106D R9.4.1 flow cell, according to the manufacturer’s instructions. The fast5 reads were base called to fastq with Guppy version 4.0.11 (high accuracy). The DNA was also used to generate a short-read library, for highly accurate sequencing on the Illumina NovaSeq 6000 platform (150-bp paired-end reads) (Jones 2018). These datasets were complemented with HiC 3D conformation capture sequencing of in vivo crosslinked chromatin of ungerminated uredinospores using the PhaseGenomics Microbial HiC kit and 150-bp paired-end reads on the Illumina platform. In addition, we extracted RNA for Illumina-based RNA sequencing (RNA-seq) experiments for genome annotation at four distinct spore and infection stages, as follows: ungerminated spores, germinated spores, and 6 and 9 days postinfection of wheat variety Morocco. RNA was extracted with the Qiagen RNeasy kit according to the manufacturer’s instructions, followed by Illumina sequencing performed by GENEWIZ. All sequencing data are available under the BioProjects PRJNA749614 and PRJNA757545 of NCBI.
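
    As an illustration of how base-called reads can be checked against the ≥20 kb size-selection target, the following minimal Python sketch reads a FASTQ stream and reports the fraction of sequenced bases that sit in reads at or above a length threshold. It assumes plain four-line FASTQ records without line wrapping; the function names and the toy data are hypothetical and are not part of the published workflow.

```python
import io


def fastq_read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string, not needed for lengths
        if not header.startswith("@"):
            raise ValueError("malformed FASTQ header: " + header.strip())
        yield len(sequence)


def fraction_of_bases_in_long_reads(lengths, min_length=20_000):
    """Fraction of all bases contained in reads of at least min_length."""
    lengths = list(lengths)
    total = sum(lengths)
    long_bases = sum(n for n in lengths if n >= min_length)
    return long_bases / total if total else 0.0


# Toy data (assumption): two very short reads, with a threshold of 5 bases
# standing in for the 20 kb cut-off used during size selection.
toy_fastq = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\nIII\n")
toy_lengths = list(fastq_read_lengths(toy_fastq))
toy_fraction = fraction_of_bases_in_long_reads(toy_lengths, min_length=5)
```

    On real data, the same functions could be applied to an opened FASTQ file with the default 20,000-base threshold.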

    We obtained, in total, 5.7 Gb of Nanopore long reads (L50 of reads = 35.02 kb), which represents approximately 30× coverage per haplotype. In addition, we obtained 8 Gb of 150-bp paired-end Illumina short reads for polishing and 60 Gb of 150-bp paired-end Illumina reads for the HiC dataset. We assembled, curated, preliminarily phased, and scaffolded the genome as described recently using a hybrid approach (Duan et al. 2021). Briefly, the initial assembly was generated with Nanopore long reads followed by long-read base-error correction applying the same dataset using a combination of Canu 2.0 (Koren et al. 2017), Racon 1.4.13 (Vaser et al. 2017), and medaka 1.0.3. We polished this initial long-read assembly with Illumina short reads using Pilon 1.22 (Walker et al. 2014), as previously described (Duan et al. 2021). We curated assembly errors based on HiC contact maps generated with HiC-Pro 2.11.1 (Servant et al. 2015). We removed contigs identified as bacterial contaminants or mitochondrial sequences based on BLAST homology searches, GC content, and unusual genome coverage of remapped long reads (Altschul et al. 1990; Li 2018; Quinlan and Hall 2010). We tentatively phased the curated contigs into nuclear genome haplotypes with NuclearPhaser using previous P. striiformis f. sp. tritici 104E gene models as seeds (Duan et al. 2021; Schwessinger et al. 2018). However, we anticipate that the assembly might contain residual phase switches which currently cannot be addressed with Guppy v4 Nanopore sequencing data (Duan et al. 2021). Each resulting nuclear genome complement was scaffolded independently with Salsa 2.0 (Ghurye et al. 2019). Resulting chromosomes were further curated manually (Duan et al. 2021). We defined centromeric regions based on bowtie signatures in the HiC contact maps. Our final assembly resulted in two tentatively assigned nuclear chromosome sets of 18 chromosomes each (approximately 80 Mb per set) and 166 unplaced contigs of approximately 9 Mb, with 1,736 genes located on these contigs (Table 1). The karyotype and centromere locations are shown in Figure 1. These assembly statistics are a substantial improvement over previous haploid or partially phased assemblies that contained the (primary) assembly in more than 100 contigs representing a single chimeric haplotype (Cantu et al. 2011, 2013; Cuomo et al. 2017; Li et al. 2019; Schwessinger et al. 2018, 2020; Vasquez-Gross et al. 2020; Xia et al. 2018; Zheng et al. 2013). We expect that our genome assembly has residual phase switching errors between haplotypes A and B, as described previously for a P. triticina hybrid assembly using equivalent sequencing and assembly approaches (Duan et al. 2021). In the future, P. striiformis f. sp. tritici assemblies will be further improved by using long reads with per-base accuracies >99%, such as PacBio HiFi or Nanopore Q20+.
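
    The read-length and coverage figures above can be reproduced in outline with the short Python sketch below. It computes the length-weighted midpoint of a read set (the statistic the text reports as the read L50, more commonly called the read N50) and a rough sequencing depth. The input numbers are taken from the text; dividing the total read bases by the combined size of both chromosome sets plus the unplaced contigs is an assumption made here about how the per-haplotype depth could be estimated, not a description of the authors' calculation.

```python
def read_n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def sequencing_depth(total_read_bases, assembly_size):
    """Average number of read bases per assembled base."""
    return total_read_bases / assembly_size


# Toy read set (assumption) to show the N50 calculation.
toy_n50 = read_n50([10, 8, 5, 3, 2])

# Figures from the text: 5.7 Gb of long reads, two chromosome sets of
# roughly 80 Mb each, and roughly 9 Mb of unplaced contigs.
total_read_bases = 5.7e9
assembly_size = 2 * 80e6 + 9e6
depth = sequencing_depth(total_read_bases, assembly_size)
```

    With these rounded inputs the depth comes out at roughly 34×, in the same range as the approximately 30× per haplotype stated above.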

    Table 1. Genome assembly and quality statistics

    Fig. 1. Karyotype plot of Puccinia striiformis f. sp. tritici chromosomes. The figure depicts the two sets of 18 chromosomes, with haplotype A colored in red and haplotype B colored in blue. Bowties indicate the approximate locations of centromeres as defined by HiC interaction patterns. The left scale bar indicates the size of each chromosome in megabase pairs.

    We performed preliminary repeat prediction and soft masking with RepeatModeler v2 and RepeatMasker v4.1.2 to avoid the negative impact of repeat regions on downstream gene prediction processes (Flynn et al. 2020). In total, we predicted that approximately 40% of the genome comprises repeats which, overall, is in line with repeat annotations in previous P. striiformis f. sp. tritici genomes (Cantu et al. 2011, 2013; Schwessinger et al. 2018, 2020; Zheng et al. 2013). We annotated our soft-masked chromosome-scale assembly with the fungal-specific gene prediction and annotation pipeline funannotate v1.8.1 (Palmer and Stajich 2020) using previous high-quality P. striiformis f. sp. tritici proteome datasets (Schwessinger et al. 2018, 2020) and our strain-specific RNA-seq data as hints. The overall total and the haplotype-specific numbers of predicted genes (Table 1) were consistent with previous long-read P. striiformis f. sp. tritici genome assemblies. We used benchmarking universal single-copy orthologs (BUSCO, v.3.0.2) (Simão et al. 2015) to assess the completeness of our gene predictions using the BUSCO basidiomycete v9 dataset as reference (Table 1). In total, we predicted 97.6% of all BUSCOs, which is the highest completeness rate of any P. striiformis f. sp. tritici assembly to date. Finally, we predicted candidate effector proteins among the 4,904 secreted proteins using EffectorP3 and obtained 2,876 predicted effectors. Of these, 1,817 and 1,059 were predicted to be located in the cytoplasm and apoplast, respectively.
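
    Two of the quantities in this paragraph lend themselves to a quick check, sketched below in Python. The first function estimates the repeat fraction of a soft-masked assembly by counting lowercase bases, which is the convention soft masking uses to mark repeats. The second part confirms that the cytoplasmic and apoplastic effector counts add up to the reported total and expresses the effectors as a share of the secreted proteins. The function name and the toy FASTA records are hypothetical; only the effector and secretome counts come from the text.

```python
def soft_masked_fraction(fasta_lines):
    """Fraction of bases written in lowercase (soft-masked) in FASTA text lines."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


# Toy soft-masked records (assumption): 10 of 30 bases are lowercase.
toy_fasta = [">chr1A", "ACGTacgtAC", "ggTTAAccgg", ">chr1B", "ACGTACGTAC"]
toy_masked = soft_masked_fraction(toy_fasta)

# Counts reported in the text.
secreted_proteins = 4904
cytoplasmic_effectors = 1817
apoplastic_effectors = 1059
predicted_effectors = cytoplasmic_effectors + apoplastic_effectors
effector_share = predicted_effectors / secreted_proteins
```

    The two effector classes sum to the 2,876 predicted effectors, which corresponds to roughly 59% of the 4,904 secreted proteins.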

    We anticipate that this first chromosome-scale assembly for P. striiformis f. sp. tritici will be a useful resource for the community studying P. striiformis f. sp. tritici effector function, host adaptation, and the evolution of wheat rust fungi. Future assemblies will address the residual issue of phase switching and incorporate a long-read RNA-seq dataset to further improve gene prediction.

    Data Availability

    The draft chromosome-scale assembly and gene annotation are available at the NCBI GenBank database under the BioProjects PRJNA749614 and PRJNA757545. All raw read datasets are associated with BioProject PRJNA749614.

    Acknowledgments

    We thank J. Sperschneider, J. Stajich, and J. Palmer for their assistance in genome assembly, curation, and distribution.

    The author(s) declare no conflict of interest.

    Funding: Support was provided to S. Periyannan by the Australian Research Council (ARC) with funding through the Discovery Early Career Researcher Award (DE17010015) and to B. Schwessinger by the ARC Future Fellowship (FT180100024).

    Copyright © 2022 The Author(s). This is an open access article distributed under the CC BY 4.0 International license.