Genome Sequence of Phoma sorghina var. saccharum That Causes Sugarcane Twisted Leaf Disease in China
- Yixue Bao 1,2
- Wei Yao 1,2
- Zhenzhen Duan 2
- Charles A. Powell 3
- Baoshan Chen 1
- Muqing Zhang 1,3 †
- 1 State Key Lab for Conservation and Utilization of Sub-tropical Agri-Biological Resources, Guangxi University, 530005, China
- 2 Guangxi Key Lab for Sugarcane Biology, Guangxi University, 530005, China
- 3 Indian River Research and Education Center, IFAS, University of Florida, 2199 South Rock Rd., FL 34934, U.S.A.
Phoma sorghina var. saccharum is a fungal pathogen that causes sugarcane twisted leaf disease in China. Here, we report a complete genome assembly of the Phoma sorghina var. saccharum isolate BS2-1, generated using single-molecule real-time sequencing. This high-quality genome sequence was assembled into 22 contigs with an N50 length of 1.92 Mb, a total length of 33.12 Mb, and a GC content of 52.12%. A total of 7,870 protein-coding genes were annotated using a combination of gene prediction tools, including 515 genes encoding carbohydrate-active enzymes, 2,440 genes associated with pathogen-host interactions, and 583 genes encoding secreted proteins; 281 noncoding RNAs were also identified. The complete genome sequence will be useful for understanding host-pathogen interaction and for improving disease management strategies.
The genus Phoma is one of the most abundant fungal genera, and its members are ubiquitous in the environment and occupy numerous ecological niches. Many Phoma species and related coelomycetes are plant pathogens that can cause leaf and stem spots. The diverse hosts associated with the genus Phoma include corn, grain, citrus, sorghum, and sugarcane (de Gruyter et al. 2009). Sugarcane twisted leaf disease caused by Phoma sorghina var. saccharum results in a 5 to 10% loss in yield and poses a potential threat to cane production in China. The field symptoms appear as yellowing on midribs and leaf margins, which spreads to the entire leaf, along with curling and twisting of crown leaves (Lin et al. 2014).
Four Phoma genome sequences are available. Three are draft genomes produced from Illumina short reads: Phoma herbarum (BCGR00000000.1), Phoma koolunga (RWYX00000000.1), and Phoma sp. RAV-16-625 (SAUA00000000.1). The fourth, the complete genome of Phoma sp. XZ068 (RBKR00000000.1), was sequenced using PacBio long reads and was assembled with the Celera Assembler (v.8.3) to predict complete gene models and clusters for secondary metabolites (Zhai et al. 2019).
Here, we report the genome sequence of Phoma sorghina var. saccharum BS2-1, which was isolated from symptomatic sugarcane twisted leaves collected in Guangxi, China. The BS2-1 strain was purified by single-spore isolation and was identified using morphological observations and phylogenetic analysis of 28S rDNA (large subunit, LR0R/LR7), 18S rDNA (small subunit, NS1/NS4), internal transcribed spacer regions 1 and 2, 5.8S nuclear ribosomal DNA (ITS1/ITS4), β-tubulin (TUB2Fd/TUB4Rd), and the alpha translation elongation factor (Efdf/EF1-2218R) (accession numbers KF171356 to KF171360, KJ669181 to KJ669188). High-quality genomic DNA was extracted from 7-day-old mycelia grown in vitro on potato dextrose water medium, using a modified Fungal DNA Midi Kit (Omega Bio-Tek, Inc.). The genome of the BS2-1 isolate was sequenced using both the Illumina HiSeq4000 and the PacBio RSII Sequel platforms according to the manufacturers' protocols. The paired-end 300-bp library produced 22,730,042 short reads, yielding 3.4 Gb of data and 103× coverage of the genome. To improve the assembly, a long-read library of approximately 20-kb fragments was constructed using the SMRTbell template prep kit. Sequencing was performed in three PacBio RSII cells with P6-C4 chemistry. Self-correction of subreads was performed using an error correction model in the Falcon package v1.8.7 (Beckett et al. 2014). The EC-tool was used to correct the raw data, and the resulting high-accuracy subreads, totaling 3,200,762,563 bp (approximately 97× coverage), were assembled de novo using Canu v1.5 (Koren et al. 2017). The entire genome consisted of 22 contigs totaling 33.12 Mb of sequence, with 52.12% GC content. The N50 of the genome was 1.92 Mb. In comparison, the genome of Phoma sp. XZ068 (RBKR00000000.1) is 40.20 Mb with 50.4% GC content, consisting of 33 contigs with an N50 value of 1.4 Mb and L50 of 11 (Table 1).
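The coverage and contiguity figures above follow from simple arithmetic. The Python sketch below recomputes depth of coverage from the reported base counts and shows how N50 and L50 are derived from a list of contig lengths. The function names and the toy contig lengths are illustrative assumptions and are not taken from the BS2-1 assembly.

```python
def coverage(total_bases, genome_size):
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size; L50 is the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("empty list of contig lengths")


GENOME_SIZE = 33.12e6  # assembled size reported in the text (bp)

# Base counts reported in the text
illumina_depth = coverage(3.4e9, GENOME_SIZE)        # Illumina short reads
pacbio_depth = coverage(3_200_762_563, GENOME_SIZE)  # corrected PacBio subreads

# Toy contig lengths (hypothetical, not the BS2-1 contigs)
toy_n50, toy_l50 = n50_l50([40, 30, 20, 10])
```

Rounded to whole numbers, the two depths agree with the 103× and 97× values given in the text.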
The 22 contigs ranged in length from 28.7 Kb to 4.02 Mb. The presence of telomeric repeats (TTAGGG) at the ends of a contig was taken as indicative of a complete chromosome (Seidl et al. 2015). Eleven contigs had telomeric repeat sequences on both ends, and nine contigs had telomeric repeats at the 3′ end. Therefore, the genome contained at least 11 complete chromosomes. The genome also had one integrated circular mitochondrial DNA that was approximately 66.7 Kb. Ten contigs without telomeric repeats on both ends might represent dispensable chromosomes and require further characterization (Supplementary Table S1). The quality of the assembled genome was assessed using benchmarking universal single-copy orthologs (BUSCO v3.0.2) with the lineage dataset fungi_odb9 (Simão et al. 2015). A total of 286 of 290 BUSCOs were identified as complete and single-copy in the BS2-1 genome; of the remaining four, one was fragmented and three were absent.
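The telomere-based reasoning above can be sketched in Python. The function below scans both ends of a contig for tandem copies of TTAGGG, or of its reverse complement CCCTAA, which is how the repeat reads at the opposite end of the same strand, and labels the contig accordingly. The window size, the minimum number of tandem copies, and all names are illustrative assumptions; the text does not state the thresholds that were used.

```python
import re

# Hypothetical thresholds; the source does not report the ones used.
WINDOW = 200     # bases inspected at each contig end
MIN_COPIES = 3   # tandem copies required to call a telomere

TELOMERE = re.compile(
    r"(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (MIN_COPIES, MIN_COPIES)
)


def has_telomere(end_sequence):
    """True if the end window holds enough tandem telomeric repeats."""
    return TELOMERE.search(end_sequence.upper()) is not None


def classify_contig(sequence):
    """Label a contig by which of its ends carry telomeric repeats."""
    five_prime = has_telomere(sequence[:WINDOW])
    three_prime = has_telomere(sequence[-WINDOW:])
    if five_prime and three_prime:
        return "both ends (candidate complete chromosome)"
    if five_prime:
        return "5' end only"
    if three_prime:
        return "3' end only"
    return "no telomeric repeats"
```

Applied to each contig of an assembly, a classifier of this kind yields the tallies of contigs with repeats at both ends, at one end, or at neither.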
Protein-coding genes were identified in the assembled genome, using both de novo and homology methods available with Augustus v2.4 (Stanke and Waack 2003), GlimmerHMM v3.0.4 (Majoros et al. 2004), SNAP (Korf 2004), and GeMoMa (Birney et al. 2004). Consensus gene structures were integrated using Evidence Modeler v1.1.1 (Haas et al. 2008). All gene models were annotated according to the best alignment match obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, and nonredundant protein (NR) databases using BLASTP in BLAST+ package v2.2.6 (Boeckmann et al. 2003; Deng et al. 2006; Kanehisa et al. 2004). A total of 7,870 protein-coding genes were annotated in the assembled genome. Among the annotated genes that had homologs in other species, 2,767 (35.16%) genes were mapped to the KEGG database, 5,239 (66.57%) had similarity to proteins in the Swiss-Prot database, and 7,851 (99.76%) were mapped to the NR database. Homeoboxes and homeodomains, which are associated with leaf curling (Johannesson et al. 2001), were identified in the annotated genes (Supplementary Table S2). A total of 515 annotated genes were related to carbohydrate-active enzymes, as indicated by a CAZyme database search (Cantarel et al. 2009); 115 were associated with transport, as indicated by a Transporter Classification Database search; 2,440 genes were predicted to be involved in interactions between the pathogen and host, as evidenced by a Pathogen-Host Interactions search; and 583 genes were predicted to produce a secreted protein (Saier et al. 2006; Winnenburg et al. 2006). BLASTn and tRNAscan-SE were used to identify 281 noncoding RNAs, including 168 transfer RNAs and 113 ribosomal RNAs (Altschul et al. 1990; Lowe and Eddy 1997). Several gene clusters were identified by searching with the antiSMASH fungal v4.0.0 server, including two nonribosomal peptide synthetases (NRPS), three terpenes, eight NRPS-like, and 11 polyketide synthases.
Notably, BS2-1 had a specific and complete gene cluster for the production of phomopsins, toxic metabolites that are being pursued as potential leads for the development of antitumor drugs.
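The annotation percentages reported above can be rechecked from the raw counts. The short Python sketch below does so; the variable and function names are illustrative assumptions, and the counts are those given in the text.

```python
TOTAL_GENES = 7_870  # protein-coding genes annotated in BS2-1

# Genes with a best hit in each database, as reported in the text
database_hits = {"KEGG": 2_767, "Swiss-Prot": 5_239, "NR": 7_851}


def percent(part, whole):
    """Share of `whole` represented by `part`, to two decimal places."""
    return round(100 * part / whole, 2)


hit_rates = {
    name: percent(hits, TOTAL_GENES) for name, hits in database_hits.items()
}

# Noncoding RNAs: transfer RNAs plus ribosomal RNAs
ncrna_total = 168 + 113

# BUSCO categories: complete single-copy, fragmented, absent
busco_total = 286 + 1 + 3
busco_complete_rate = percent(286, busco_total)
```

The resulting rates match the percentages quoted for KEGG, Swiss-Prot, and NR, and the noncoding RNA and BUSCO category counts sum to the reported totals of 281 and 290.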
In summary, we report a highly contiguous genome assembly for Phoma sorghina var. saccharum, the causative agent of sugarcane twisted leaf disease. Eleven chromosomes could be assembled from telomere to telomere. These whole-genome sequencing data will facilitate exploration of the host-pathogen interaction of this fungus and provide a resource for genes that could be targeted for therapeutic development. The genome sequence described in this article has been deposited in GenBank under the accession number VXJJ00000000 (BioProject: PRJNA565496; BioSample: SAMN12748852).
Author-Recommended Internet Resource
antiSMASH fungal v4.0.0 server: https://fungismash.secondarymetabolites.org
The author(s) declare no conflict of interest.
- Altschul et al. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410. https://doi.org/10.1016/S0022-2836(05)80360-2
- Beckett et al. 2014. FALCON: A software package for analysis of nestedness in bipartite networks. F1000 Res. 3:185. https://doi.org/10.12688/f1000research.4831.1
- Birney et al. 2004. GeneWise and Genomewise. Genome Res. 14:988-995. https://doi.org/10.1101/gr.1865504
- Boeckmann et al. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365-370. https://doi.org/10.1093/nar/gkg095
- Cantarel et al. 2009. The carbohydrate-active enzymes database (CAZy): An expert resource for glycogenomics. Nucleic Acids Res. 37 (Database issue):D233-D238. https://doi.org/10.1093/nar/gkn663
- de Gruyter et al. 2009. Molecular phylogeny of Phoma and allied anamorph genera: Towards a reclassification of the Phoma complex. Mycol. Res. 113:508-519. https://doi.org/10.1016/j.mycres.2009.01.002
- Deng et al. 2006. Integrated nr database in protein annotation system and its localization. Comput. Eng. 32:71-72.
- Haas et al. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9:R7. https://doi.org/10.1186/gb-2008-9-1-r7
- Johannesson et al. 2001. DNA-binding and dimerization preferences of Arabidopsis homeodomain-leucine zipper transcription factors in vitro. Plant Mol. Biol. 45:63-73. https://doi.org/10.1023/A:1006423324025
- Kanehisa et al. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277-D280. https://doi.org/10.1093/nar/gkh063
- Koren et al. 2017. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27:722-736. https://doi.org/10.1101/gr.215087.116
- Korf 2004. Gene finding in novel genomes. BMC Bioinformatics 5:59. https://doi.org/10.1186/1471-2105-5-59
- Lin et al. 2014. First report of Phoma sp. causing twisting and curling of crown leaves of sugarcane in the mainland of China. Plant Dis. 98:850. https://doi.org/10.1094/PDIS-10-13-1061-PDN
- Lowe and Eddy 1997. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955-964. https://doi.org/10.1093/nar/25.5.955
- Majoros et al. 2004. TigrScan and GlimmerHMM: Two open source ab initio eukaryotic gene-finders. Bioinformatics 20:2878-2879. https://doi.org/10.1093/bioinformatics/bth315
- Saier et al. 2006. TCDB: The transporter classification database for membrane transport protein analyses and information. Nucleic Acids Res. 34:D181-D186. https://doi.org/10.1093/nar/gkj001
- Seidl et al. 2015. The genome of the saprophytic fungus Verticillium tricorpus reveals a complex effector repertoire resembling that of its pathogenic relatives. Mol. Plant-Microbe Interact. 28:362-373. https://doi.org/10.1094/MPMI-06-14-0173-R
- Simão et al. 2015. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210-3212. https://doi.org/10.1093/bioinformatics/btv351
- Stanke and Waack 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 (Suppl. 2):ii215-ii225. https://doi.org/10.1093/bioinformatics/btg1080
- Winnenburg et al. 2006. PHI-base: A new database for pathogen host interactions. Nucleic Acids Res. 34:D459-D464. https://doi.org/10.1093/nar/gkj047
- Zhai et al. 2019. Identification of the gene cluster for bistropolone-humulene meroterpenoid biosynthesis in Phoma sp. Fungal Genet. Biol. 129:7-15. https://doi.org/10.1016/j.fgb.2019.04.004
Yixue Bao, Wei Yao, and Zhenzhen Duan contributed equally to this manuscript.
Funding: This work was supported by China Agricultural Research System grant 170109, Scientific Research and Technology Development Program of Guangxi grants Guike AA17202042-7 and Guike AD17129002, and the Innovation Project of Guangxi Graduate Education grant YCBZ 2018014.