Review

Genomics and Informatics, Conjoined Tools Vital for Understanding and Protecting Plant Health

    Authors and Affiliations
    • Seogchan Kang1
    • Ki-Tae Kim2
    • Jaeyoung Choi3
    • Hyun Kim4
    • Kyeongchae Cheong5
    • Ananda Bandara1
    • Yong-Hwan Lee4 5
    1. Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, PA 16802, U.S.A.
    2. Department of Agricultural Life Science, Sunchon National University, Suncheon 57922, Korea
    3. Korea Institute of Science and Technology Gangneung Institute of Natural Products, Gangneung 25451, Korea
    4. Department of Agricultural Biotechnology, Seoul National University, Seoul 08826, Korea
    5. Plant Immunity Research Center, Seoul National University, Seoul 08826, Korea


    Genomics’ impact on crop production continues to expand. The number of sequenced plant and microbial species, and of strains representing diverse populations within individual species, is rapidly increasing thanks to the advent of next-generation sequencing technologies. Their genomic blueprints have revealed candidate genes involved in various functions and processes crucial for crop health and helped in understanding how the sequenced organisms have evolved at the genome level. Functional genomics quickly translates these blueprints into a detailed mechanistic understanding of how such functions and processes work and are regulated; this understanding guides and empowers efforts to protect crops from diverse biotic and abiotic threats. Metagenome analyses help identify candidate microbes crucial for crop health and uncover how microbial communities associated with crop production respond to environmental conditions and cultural practices, presenting opportunities to enhance crop health by judiciously configuring microbial communities. Efficient conversion of disparate types of massive genomics data into actionable knowledge requires a robust informatics infrastructure supporting data preservation, analysis, and sharing. This review starts with an overview of how genomics came about and has quickly transformed life science. We then illuminate how genomics and informatics can be applied to investigate various crop health–related problems using selected studies. We end the review by noting why community empowerment via crowdsourcing is crucial to harnessing genomics to protect global food and nutrition security without continuously expanding the environmental footprint of crop production.

    Several complementary approaches support efforts to meet the steadily increasing global need for quality plant products without heavily relying on synthetic pesticides. One approach is harnessing the vast pool of plant genes via breeding to produce disease-resistant crop varieties (Hickey et al. 2019). Rapid advances in understanding the molecular mechanisms of disease resistance and susceptibility present many candidate genes and processes that can be tweaked or transferred using gene manipulation tools to enhance plant immunity (Bailey-Serres et al. 2019; Frailie and Innes 2021; Garcia-Ruiz et al. 2021). An improved understanding of how pathogens cause diseases, break down disease resistance, and move around locally and globally also helps to develop and deploy new protection strategies (Garcia-Ruiz et al. 2021; Stam et al. 2021). Microbial allies have been deployed to suppress pathogens directly or fortify plant immunity (French et al. 2021; Kang et al. 2021; Rodriguez et al. 2019; Vlot et al. 2021). Cultural practices (e.g., nutrient management, soil amendment, crop rotation) guided by an improved understanding of how such practices affect plants and soil ecology can help suppress various diseases (French et al. 2021; Park and Ryu 2021; Tarr 1972). These approaches are complementary and synergistic, with new insights and products from one often helping improve others. Continuous improvement of these approaches is crucial to reducing pesticide-driven environmental pollution, degradation of ecosystem services, and the incidence of pesticide resistance.

    Genomics provides extensive data and versatile tools that support these approaches. Another vital resource is informatics because genomics-driven inquiries require informatics tools for data analysis, integration, preservation, and sharing. The vast volume and complexity of genomics data made informatics an indispensable companion because raw genomics data offer limited value until adequately processed and analyzed. Informatics tools quickly uncover noteworthy features and patterns within and between individual genomes and functional genomics data sets, leading to new hypotheses and questions. Informatics also assists in investigating how individual organisms have evolved by allowing the comparison of genome sequences and organizations at various taxon levels (Gupta 2016; Tello-Ruiz et al. 2021) and supports wide sharing of accumulated data and knowledge via crowdsourcing (Editorial 2018; Kang 2014). Because sustainable crop production is a global challenge, we must engage and empower global stakeholders to meet the challenge. Without a robust informatics infrastructure supporting these stakeholders, innovative applications of genomics to ensure global food and nutrition security without overtaxing the environment would be curtailed. Informatics tools will continuously evolve to offer more user-friendly services, allowing more researchers to deploy genomics to solve complex problems.

    Rapidly increasing genomics data and tools have transformed how we generate new knowledge and solve problems. This review starts with a brief history of genomics because history is a great teacher for preparing us to meet future challenges and opportunities. Although the history of genomics is short, it illustrates how “disruptive” new technologies transform science and offers many lessons that can guide us in meeting future challenges. Because many insightful treatises on this history exist, some of which are cited below, only a few pivotal events and transformative innovations are highlighted. Genomics’ history is followed by selected examples showing how genomics and informatics have advanced plant health–related research and problem-solving. The review ends with a brief note on why crowdsourcing is critical for harnessing genomics to protect global food and nutrition security and environmental sustainability.


    Like many other disruptive innovations (Thiel 2014), genomics has gone through some growing pains before becoming a proverbial game-changer.

    Its beginning was highly contentious.

    In the mid-1980s, a small group of researchers proposed the Human Genome Project (HGP) and promoted it as a grand endeavor equivalent to the Apollo Moon Landing and the Manhattan Project (Hood and Rowen 2013; Institute of Medicine 1991). However, this proposal faced widespread skepticism and deep concerns, fomenting fierce opposition (Rechsteiner 1991; Walsh and Marks 1986). As noted by James Watson, a leading figure for the HGP and then director of the Cold Spring Harbor Laboratory, most biologists were against the HGP: “I am for the project, although everyone I talk to at Cold Spring Harbor is against it” (Institute of Medicine 1991). Many communities sent petitions to the U.S. Congress to block HGP funding because they believed that the HGP was flawed science and would offer little value but severely harm most established research communities (Rechsteiner 1991). The following colorful words used against those who pushed for the HGP exemplify how emotionally charged the opposition was: “Any biologists who proposed such projects would doubtless be obliged to carry them out in a padded laboratory—such a major endeavor would occupy every living molecular biologist, regardless of competency, for several years, thus solving the growing unemployment crisis. The social consequences of keeping molecular biologists off the streets, where they might develop into thugs, ruffians, land-fraud swindlers, or worse, are obvious” (Walsh and Marks 1986).

    The fierce opposition to the HGP may look enigmatic to those who started a research career during the past 20 years, but it was based on several concerns, one of which was whether DNA sequencing technology and other required methods were robust and efficient enough to accomplish this feat effectively. Although DNA cloning and sequencing techniques developed in the 1970s made it possible to sequence DNA, sequencing had been neither easy nor fast until automated DNA sequencers and improved sequencing methods came along. The Sanger sequencing method (Sanger et al. 1977) required four separate reactions to sequence one template. For example, a reaction for determining the position of C in the DNA template consisted of a radioisotope (35S or 33P)–labeled deoxynucleotide, 2′,3′-dideoxycytidine-5′-triphosphate (ddCTP), and a mix of four deoxynucleotides, including a small amount of dCTP. When a ddCTP gets incorporated into a growing chain of DNA, DNA synthesis terminates. After gel electrophoresis of the individual reactions and exposure to X-ray film (Fig. 1), the resulting banding patterns were read by eye. Applied Biosystems released the AB370A, the first sequencing instrument based on a modified Sanger method employing fluorescent dye-labeled ddNTPs, in 1986. Thanks to this modification, only one reaction was necessary to sequence one template. A mobile fluorescence scanner linked to a computer could read sequences as sequencing products labeled with one of the four fluorescent dyes passed through the gel. The AB370A could analyze as many as 96 templates per run and read ∼600 bp per template (under the best conditions), maximally reading ∼500 kb per day (with two runs per day). However, the AB370A was not fully automated and required slab gel preparation and manual sample loading. In 1995, sequencing became fully automated with the release of the ABI310, a sequencer that uses capillaries for DNA separation, eliminating gel preparation and manual sample loading.
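The chain-termination logic described above can be sketched as a toy simulation (purely illustrative; the function names are ours, and the chemistry is simplified by terminating where the template shows the lane's base rather than modeling synthesis of the complementary strand):

```python
import random

def sanger_lane(template, base, ddntp_ratio=0.05, n_molecules=10000):
    """Simulate one chain-termination reaction (one gel lane).

    Each template molecule is copied until a dideoxy nucleotide is
    randomly incorporated at a position matching `base`, which stops
    synthesis. The set of fragment lengths corresponds to the band
    positions in that lane.
    """
    lengths = set()
    for _ in range(n_molecules):
        for pos, tmpl_base in enumerate(template, start=1):
            if tmpl_base == base and random.random() < ddntp_ratio:
                lengths.add(pos)  # chain terminated at this position
                break
    return lengths

def read_gel(template):
    """Run the four lanes (G, A, T, C) and read bands from the shortest
    fragment (bottom of the gel) to the longest (top)."""
    lanes = {b: sanger_lane(template, b) for b in "GATC"}
    return "".join(base
                   for pos in range(1, len(template) + 1)
                   for base, lengths in lanes.items()
                   if pos in lengths)
```

Reading the four lanes bottom-up recovers the template, which is what a researcher did by eye on the X-ray film in Figure 1.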

    FIGURE 1

    FIGURE 1 Sanger sequencing results. Part of an X-ray film shows sequencing results from three DNA templates (denoted by pencil-drawn lines). The sequencing of one template required four reactions. The four lanes for each sample correspond to G, A, T, and C (left to right), respectively.


    The HGP initially used the strategy of cloning pieces of human chromosomes in several libraries of different insert sizes, ranging from a few kilobases (plasmid libraries) to megabases (yeast artificial chromosome libraries), followed by physical mapping/tiling of these clones before sequencing. This strategy was labor- and time-intensive. Given the status of the required technologies for genome sequencing in the 1980s, many feared that the HGP would require factory-scale laboratories, with many researchers repeating laborious tasks day after day for many years (Fig. 2). A related concern was that the HGP would subject many students and junior scientists to this mind-numbing process, which was criticized as a terrible way to prepare new researchers (Rechsteiner 1991). Some questioned the value of sequencing the entire human genome. Some asserted that billions of dollars would be wasted to read so-called “junk DNA” because <10% of the human genome encodes proteins and proposed a large-scale expressed sequence tag analysis as an alternative and cost-effective strategy for human gene discovery.

    FIGURE 2

    FIGURE 2 Concerns that compelled the strong opposition against the Human Genome Project (HGP). This cartoon was included in a position paper arguing against the HGP (Rechsteiner 1991). Permission to reproduce this drawing was obtained from the Copyright Clearance Center.


    In addition to these concerns, the HGP was perceived as an existential threat to many researchers, especially new investigators. Because of its enormous budget (estimated at US$3 billion), the HGP raised the fear that it would siphon off most of the federal research budget supporting individual investigator-driven projects (Rechsteiner 1991), prompting pushback and project adjustments (Hood and Rowen 2013). Sequencing the genomes of a few model organisms became an objective based on the expectation that new tools and experiences from such projects would help analyze the human genome more efficiently. The HGP also set aside 5% of its budget to support studies on the social, ethical, and legal issues anticipated from deciphering the human genome so that adequate policies and guidelines could be developed (Knoppers et al. 2013). The Secret of Life: Redesigning the Living World (Levine and Suzuki 1993), a companion book to a Public Broadcasting Service TV series, elegantly presented various social and ethical challenges and dilemmas anticipated from the rapid advances in understanding human biology.

    The HGP was officially launched in 1990 with several international partners under the Human Genome Organization umbrella and was publicly declared complete in 2003. Sequences of the human genome’s euchromatic regions (∼92% of the genome), their gene content and organization, and notable features were published. The published assemblies had only a few hundred gaps, indicating very high data quality. However, they did not cover the centromeric and telomeric regions as a result of technical difficulties in cloning, sequencing, and assembly. An international effort to generate telomere-to-telomere sequences for all human chromosomes, led by the Telomere-to-Telomere Consortium, is nearly complete, with only the Y chromosome remaining to be fully deciphered (Reardon 2021). The HGP achieved its primary goal approximately 2 years earlier and with less money than initially anticipated, in part because many researchers contributed new methods and tools. Competition also played a significant role (Shreeve 2004). Celera Genomics, a company jointly formed by Perkin-Elmer (owner of Applied Biosystems) and The Institute for Genomic Research (TIGR), proposed assembling the human genome by whole-genome shotgun sequencing rather than using the HGP’s strategy, an approach TIGR had already applied successfully to assemble the genome of Haemophilus influenzae (1.8 Mb), a human pathogenic bacterium (Fleischmann et al. 1995). This milestone, enabled by improved informatics tools for genome assembly without requiring detailed physical mapping of individual clones used for sequencing, was quickly followed by publications of bacterial and archaeal genomes as well as the genomes of a few model organisms, including Saccharomyces cerevisiae in 1996, Caenorhabditis elegans in 1998, Arabidopsis thaliana in 2000, Drosophila melanogaster in 2000, and mouse in 2002.
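To convey why whole-genome shotgun assembly hinged on informatics, the toy sketch below (our own illustration, not Celera's algorithm) greedily merges reads by their longest overlaps; real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors and genomic repeats:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`
    (at least `min_len`); 0 if none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap
    until no pair overlaps; returns the remaining contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left; contigs stay separate
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads)
                 if k not in (best_i, best_j)] + [merged]
    return reads
```

Three overlapping reads of a short hypothetical sequence collapse into one contig, which is the essence of assembling millions of random reads without clone-by-clone physical maps.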

    Although genome sequencing of crop plants and plant-associated microbes lagged, talks about how to support crop production via genome sequencing started years before the official launch of the HGP (Briggs 1998; Phillips and Freeling 1998). The genome of rice, the first crop plant sequenced, was published in 2005 (International Rice Genome Sequencing Project 2005). Genome sequences of more crop species followed despite multiple challenges, such as huge genome sizes, polyploidy, and limited resources (Bolger et al. 2014). Xylella fastidiosa, a fastidious, xylem-limited bacterium that causes citrus variegated chlorosis and other diseases, was the first plant pathogen sequenced (Simpson et al. 2000). The genomes of Magnaporthe oryzae, a hemibiotrophic ascomycete that causes rice blast, and Ustilago maydis, a biotrophic basidiomycete that infects maize, were published in 2005 (Dean et al. 2005) and 2006 (Kämper et al. 2006), respectively. The advent of next-generation sequencing (NGS) technologies accelerated the pace of sequencing plants and microbes (Bolger et al. 2014; Stajich 2017; Xu and Wang 2019). Many plant pathology laboratories routinely sequence pathogen isolates to support their research (Stam et al. 2021). Informatics tools for genome assembly, annotation, and analysis made genome sequencing a routine approach for diverse inquiries. As summarized below, the HGP and other sequencing projects changed how we approach various questions and problems.


    “We tend to overestimate the short-term impact of a technology and underestimate its long-term impacts.”–Roy Amara

    Genomics is no exception to this assertion. It did not take long to realize that deciphered genome sequences would not quickly solve medical and agricultural problems and yield new commercial products. However, many transformative changes and new opportunities catalyzed by genomics underscore that its impact is much more profound and broader than most initially envisioned. One notable change is the rise of “big science.” Unlike in physics, most biology projects had been driven by individual investigators before the HGP. After the HGP, it has become common to encounter research articles listing hundreds of authors in many countries, mainly because annotating assembled genome sequences and extracting new knowledge from the massive amount of data required many people with different expertise. Big science is not necessarily better than small science, and they should complement each other. However, coordinated multidisciplinary efforts are often needed to tackle big problems. As the complexity of problems and required approaches increases, the need for big science is unlikely to diminish. The HGP helped guide subsequent big projects by offering lessons from its trials and errors and codified policies concerning data collection, use, and sharing (Marshall 2001). Generating enormous resources without accompanying policies guiding their use would be unlikely to serve user communities effectively.

    As noted earlier, sequencing became a versatile and powerful tool for generating new questions and hypotheses thanks to NGS technologies (Suzuki 2020). Exponential increases in the length of sequence per read (e.g., >2 Mb via nanopore sequencers) and the number of reads per run (e.g., ∼20 billion reads in <2 days by the Illumina NovaSeq 6000) drastically increased the throughput of sequencing while reducing the cost (Amarasinghe et al. 2020). Sequencing and assembling the first human genome required almost $3 billion and a decade. The same task can now be performed within a few days for <$1,000. The approach of “sequencing first and asking questions later” has become common because genome sequencing does not cost much but quickly generates extensive data that will guide and facilitate diverse inquiries. The sequencers based on massively parallel sequencing (also called second-generation sequencing), such as Illumina’s NovaSeq and HiSeq and Thermo Fisher’s Ion Torrent, generate vast numbers of reads per run. However, the short length of individual reads (a few hundred base pairs) complicates some downstream analyses, such as genome assembly, accurately counting the number of individual molecules (e.g., in metagenome analysis), and identifying transcript isoforms (i.e., alternatively spliced transcripts) and large DNA structural variants (Amarasinghe et al. 2020).

    Third-generation sequencing methods, such as PacBio single-molecule real-time (SMRT) and Oxford nanopore technologies, generate much longer reads, avoid amplification bias, and help detect base modifications (e.g., methylation) and structural variants. Although the accuracy of base calling by SMRT and nanopore technologies is lower than with short-read sequencers, it is expected to improve (Amarasinghe et al. 2020; Suzuki 2020). In addition, several methods (e.g., using both long- and short-read data) are available for error correction (Amarasinghe et al. 2020). MinION, a portable nanopore sequencer that can read 30 Gb of DNA or 7 to 12 million nucleotides of RNA per flow cell in real time, has been used for on-site pathogen diagnosis and discovery (Villamor et al. 2019) and rapid biodiversity assessments (Pomerantz et al. 2018).

    Rapidly evolving sequencing technologies, in combination with other tools, enabled investigation of the mechanism of many processes in single cells (Linnarsson and Teichmann 2016) and populations (Stam et al. 2021). For example, sequencing coupled with methods for isolating individual cells or a small group of cells (e.g., cell sorting using flow cytometry, laser-assisted microdissection, microfluidics) has been applied to compare genomes and gene expression patterns between different cells and tissues (Linnarsson and Teichmann 2016; Tang et al. 2019). Sequencing guided by high-resolution imaging and sampling of subnuclear regions helped characterize the chromatin structure in situ (Payne et al. 2021). Analyses of microbial communities associated with various environments (e.g., soils, water, air, different tissues of plants) by massively sequencing phylogenetically informative loci (i.e., metabarcoding) revealed how such communities are structured and change in response to various treatments and conditions (Bonito et al. 2014; French et al. 2021; Trivedi et al. 2020). Efforts to characterize the microbial communities associated with humans (Cullen et al. 2020), animals (Bahrndorff et al. 2016), and plants (Agoussar and Yergeau 2021; Song et al. 2021) are rapidly increasing, helping understand how they impact host health and guiding targeted microbiome manipulations to benefit hosts.
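As a minimal illustration of how metabarcoding read counts are summarized into community structure, the sketch below computes the Shannon diversity index per sample from a hypothetical taxon-count table (real workflows add read denoising, chimera removal, rarefaction, and many other steps):

```python
import math

def shannon_diversity(taxon_counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with reads,
    where p_i is a taxon's share of all reads in the sample."""
    total = sum(taxon_counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in taxon_counts.values() if c > 0)

def compare_samples(samples):
    """Per-sample diversity for {sample_name: {taxon: read_count}}."""
    return {name: round(shannon_diversity(counts), 3)
            for name, counts in samples.items()}
```

Comparing such per-sample indices across treatments or environments is one simple way to ask whether a practice shifts the structure of a plant-associated community.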

    Accumulating genome sequences stimulated efforts to develop new tools for analyzing how diverse molecular processes and interactions operate at levels ranging from single cells (Linnarsson and Teichmann 2016; Tang et al. 2019) to organismal communities (Zancarini et al. 2021). Toolboxes for analyzing the transcriptome (Todd et al. 2016), proteome (Ahmad and Lamond 2014), and metabolome (Kellogg and Kang 2020) of individual organisms or communities are rapidly expanding. Advances in informatics tools enabled diverse in silico analyses and comparisons of genomics data, allowing us to infer the biology and evolution of a newly sequenced organism using previously characterized organisms as references. Phenomics has facilitated high-throughput phenotyping of many organisms under tightly controlled conditions (Houle et al. 2010; Tardieu et al. 2017), helping identify which genes and genotypes likely affect specific traits and how genotypes and environmental conditions interact in expressing specific phenotypes. Collectively, these tools have helped investigate many questions too complex to approach not long ago.

    A growing collection of deciphered genomes also allows us to place many inquiries within clearly defined genetic boundaries. In the pregenomics era, all inquiries, even for model organisms like S. cerevisiae and Escherichia coli, were carried out without knowing how many genes they carry, which products they encode, and the genomic similarities/differences between strains. Metaphorically speaking, we now have most pieces of many jigsaw puzzles in front of us. Assembling these puzzles is still not a trivial task. However, the task has undoubtedly become much easier and faster than solving them without knowing the number of puzzle pieces and their shape and color. Many tools that help translate genome sequences into biological insights contribute to solving these puzzles: functional genomics helps systematically identify which pieces are likely connected, and comparative genomics helps solve new puzzles by allowing us to use previously solved puzzles as guides and models.

    Genomics significantly diminished the need to rely on a few model organisms as surrogates to understand the biology of many lesser-endowed relatives. The “security council of genetic organisms” (Fink 1998) will continuously lead efforts to uncover how fundamental life processes operate and help understand the biology and evolution of less-characterized organisms by serving as references and offering new tools. However, genomics data and tools enabled us to investigate most organisms directly without heavily relying on surrogate models.


    Genomics and informatics are inseparable because converting extensive genomics data into new knowledge requires a wide array of computer-assisted analyses. Amassing genomics data without adequate informatics support would quickly cause a data-rich but knowledge-poor state. The need for user-friendly informatics tools will continuously grow as more researchers want to apply genomics. Rapidly increasing computing power and evolving machine-learning (ML) algorithms will help meet this need by lowering barriers to large-scale data analyses (Shastry and Sanjay 2020). Below, we illustrate how genomics and informatics can be applied to research and problem-solving related to crop health using selected examples.

    Genome sequencing and in silico analyses enlighten the biology and evolution of plants and microbes.

    Rapidly accumulating genome sequences have accelerated research and problem-solving in many ways. Genome sequences of cultivated plant species and their wild relatives enlightened the genetic/genomic basis of plant evolution and crop domestication (Schreiber et al. 2018; Ye and Fan 2021). Their genomes helped identify candidate genes and processes that determine critical traits, expediting crop improvement via breeding, gene transfer, or targeted modification (Frailie and Innes 2021). De novo domestication of wild allotetraploid rice was achieved by simultaneously modifying several genes critical for agronomic traits using a CRISPR/Cas9 method (Yu et al. 2021). The targeted genes were identified via a comparative analysis of wild and domesticated rice genomes. Rapid advances in understanding the genetic basis of crucial crop traits, driven by genome sequencing, functional genomics, and comparative genomics, will help convert some wild plants to new crops without going through lengthy traditional domestication processes (Fernie and Yan 2019; Gasparini et al. 2021; Van Tassel et al. 2020). The focus of genome sequencing has been on model plants and globally traded crops, but efforts to decipher the genomes of orphan crops and their wild relatives are increasing (Ye and Fan 2021). The latter efforts are crucial to ensure global food and nutrition security (Fears et al. 2019).

    Genome sequences of diverse plant-associated microbes similarly helped identify candidate genes involved in pathogenesis and symbiosis, support pathogen discovery and identification, and assist in developing new strategies for crop protection (Kellogg and Kang 2020; Stam et al. 2021). Because microbial genome sequencing, especially resequencing, can be carried out quickly and inexpensively, genome sequencing has become a common approach. Although cataloging the genomes of all microbial species associated with the Earth’s biosphere will require time, it is no longer a pipe dream, as evidenced by the Earth BioGenome Project (Lewin et al. 2018). Sequence-based metabarcoding revealed the existence of diverse microbes that have not yet been cultured. Massive sequencing of DNA samples extracted from diverse environments reveals the composition of the resident microbiome and partial genome sequences of its members. The resulting metagenome sequences help deduce the biology and ecology of these microbes from their sequence similarity to previously characterized ones (see below for specific examples of metagenomic applications).

    Parallel to sequencing the genomes of known species, deciphering the genomes of diverse strains within individual plant (Bayer et al. 2020; Della Coletta et al. 2021) and microbial (Domingo-Sananes and McInerney 2021; McCarthy and Fitzpatrick 2019; van Dam et al. 2018; Zhong et al. 2021) species has been increasing. Pan-genome data, encompassing the genes shared by all or most strains (core genes) and those that are strain-/population-specific (accessory genes), offer insights that genome sequences of a few individuals within a species cannot provide. Genomes of diverse strains within many bacterial species revealed that the number of core genes is smaller than that of accessory genes (Domingo-Sananes and McInerney 2021; Vernikos et al. 2015). This finding indicates a very high degree of genetic diversity and variation within individual species, underscoring the need to analyze pan-genomes to adequately assess their phenotypic diversity and capacity to change. The proportion of accessory genes in fungal pan-genomes (10 to 20%) is lower than that observed in bacteria (McCarthy and Fitzpatrick 2019). Gene duplication appears to have mainly driven the generation of accessory genes in some fungi (McCarthy and Fitzpatrick 2019), but horizontal gene transfer seems to have played a more prominent role in others (van Dam et al. 2017, 2018). Because many accessory genes of bacterial and fungal pathogens participate in processes critical for disease development, such as antibiotic resistance, production of specialized metabolites, virulence, and host specificity, analyzing their pan-genomes is vital to understanding and controlling pathogens (Domingo-Sananes and McInerney 2021; McCarthy and Fitzpatrick 2019; Stam et al. 2021; van Dam et al. 2018; Vernikos et al. 2015). For example, sequencing diverse M. oryzae strains isolated from multiple hosts revealed candidate effector genes present in most strains and others unique to certain pathotypes, which is crucial for understanding what factors determine virulence and how virulence varies (Kim et al. 2019). An improved understanding of this mechanism will support disease control by helping recognize new pathogen variants, revealing how M. oryzae generates new races, and guiding the deployment of appropriate disease-resistance genes via breeding. A recent Phytopathology Focus Issue (Stam et al. 2021) highlights how genome sequences at the population level have contributed to investigating pathogen biology and epidemiology.
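Once genes have been clustered across strains, the core/accessory partition itself is a simple set computation, as the minimal sketch below shows (gene and strain names are hypothetical):

```python
def pan_genome(strain_genes):
    """Split a pan-genome into core genes (shared by every strain) and
    accessory genes (present in some but not all strains).

    `strain_genes` maps strain name -> set of gene (cluster) identifiers.
    """
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes present in all strains
    return core, pan - core
```

In practice, the hard work lies upstream, in assembling each strain and clustering orthologous genes; the partition then directly highlights accessory genes worth examining for virulence or host-specificity roles.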

    Analyses of plants and microbes using various functional genomics tools have rapidly expanded our understanding of how they carry out various critical functions related to crop health and respond to different conditions at levels ranging from individual organisms (Kellogg and Kang 2020; Xu and Wang 2019) to populations (Stam et al. 2021; Zancarini et al. 2021). External stimuli cause a cascade of molecular changes that ensure short-term cellular homeostasis and guide long-term developmental changes. Because the transcriptome, proteome, and metabolome associated with such changes are connected and influence each other via regulatory feedback loops, each data set should be analyzed in conjunction with the others to understand the mechanisms underlying such molecular changes. Informatics tools are indispensable for analyzing and integrating disparate functional genomics data to support systems biology.

    Besides standalone informatics tools designed to help analyze specific gene families or functional groups, various platforms support in silico biology of plants, microbes, and their interactions by providing genomics data and data analysis and visualization tools (Table 1). Some platforms, like the Arabidopsis Information Resource (Berardini et al. 2015) and Saccharomyces Genome Database (Cherry et al. 2012), archive diverse data and resources derived from one species, whereas platforms like EuPathDB: The Eukaryotic Pathogen Genomics Database Resource (Warrenfeltz et al. 2017) and Comparative Fungal Genomics Platform (Choi et al. 2013b) support genomics of diverse species. These platforms offer precomputed data associated with specific gene families or functional groups (Choi et al. 2010, 2013a, 2014a, b, 2015; Park et al. 2008a, b) to assist users in performing studies on gene function and evolution. The following section highlights how some such precomputed data have been used to identify and characterize fungal effector genes.

    TABLE 1 Bioinformatics pipelines/tools and databases supporting genomics-enabled research on plants and eukaryotic microbes

    These platforms will continuously evolve to archive more than genomics data to serve user communities better. One critical resource that should be tightly linked to genomics data is culture/germplasm collections. A recent proposal calls for more efficient utilization of ex situ collections of plant genetic resources to support crop improvement by judiciously using genomics data (Mascher et al. 2019). Collections of voucher microbial specimens representing diverse species and populations have played pivotal roles in advancing basic and translational research by connecting discoveries of the present to previously published knowledge, offering cultures for prospecting a wide range of commercial products based on new data and tools, and guiding the discovery and description of novel microbes (Boundy-Mills et al. 2020; Díaz-Rodríguez et al. 2021). However, many collections, especially those housed in individual laboratories, are threatened by dwindling support (Kang et al. 2006). Genomics data derived from well-curated cultures add new value, which should increase their use. Because most users of community platforms will likely want to perform disparate analyses using the archived data, their own data, or a combination of both, ideally, the platforms should also provide commonly used tools for data analysis, visualization, and downloading. Adequately meeting such needs will require standardized data formats, centralized databases, and interoperable task management systems. As outlined in a recent position paper from the A. thaliana community (International Arabidopsis Informatics 2019), long-term planning in building and improving community platforms is vital to ensure that genomics data effectively support research and problem-solving.

    Genome-wide mining of candidate effectors has helped reveal how microbes manipulate plants and how plant defense operates.

    Both symbiotic and pathogenic microbes secrete diverse proteins and metabolites to manipulate or communicate with host plants, modify surrounding environments, and antagonize or cooperate with neighboring microbes (Kang et al. 2021; Rodriguez et al. 2019; Vincent et al. 2020). Some of the secreted molecules, called effectors, contribute to disease development by blocking pathogen recognition, inhibiting or misguiding specific plant defense responses, and facilitating pathogen proliferation in planta (Dalio et al. 2018; Uhse and Djamei 2018). Identification of the entire effector repertoire of individual species and their functional characterization have greatly improved our understanding of how pathogens cause diseases and which cellular components and processes individual effectors target to circumvent plant immunity (Dalio et al. 2018). The latter information also advanced our understanding of how plant defense machinery operates. Advances in effector biology support efforts to assess the virulence potential of pathogen populations and guide the deployment of appropriate plant genes and genotypes for disease control.

    Although not all effectors are proteins (Collemare et al. 2019), the focus of effector biology has been on protein effectors mainly because they can be more easily predicted using informatics analyses than other types of effectors. Characteristics frequently associated with experimentally validated effectors include the presence/absence of specific sequence motifs and functional domains, specialized mode of secretion, differential expression in infected plants, highly variable distribution within species and among closely related species, genomic location/context, and signs of positive selection (Dalio et al. 2018; Jones et al. 2018; Uhse and Djamei 2018). These characteristics have been used to build and improve machine learning (ML) algorithms so that newly sequenced genomes can be systematically mined. Sequence similarity to known effectors and pathogenicity factors has also been used to identify candidate effectors. PHI-base (Urban et al. 2020) archives many pathogen genes experimentally validated to affect plant–pathogen interactions and helps assess whether specific candidate effectors encoded by newly sequenced pathogens likely participate in pathogenesis and how they likely function.

    The presence of one or more of the characteristics noted above does not prove that a newly identified candidate is an effector. However, the accuracy of ML-based effector prediction will continuously increase as the available ML algorithms are frequently retrained using functionally validated effectors. Different versions of EffectorP, a tool for mining fungal and oomycete effectors based on two ML algorithms, illustrate how effector prediction algorithms have evolved (Sperschneider and Dodds 2022; Sperschneider et al. 2015, 2018b). Here, we only briefly highlight notable data-mining pipelines supporting effector discovery and how some of them have been applied to identify and characterize effectors encoded by M. oryzae (Kim et al. 2019, 2020). Several recent reviews extensively covered how such effector-associated characteristics had been discovered and used for mining candidate effectors (Dalio et al. 2018; Jones et al. 2018; Uhse and Djamei 2018).

    Bacterial pathogens employ one or more of three nanomachines, called the type III, IV, and VI secretion systems, to release effectors. This finding helped discover bacterial effectors (Dalio et al. 2018). Systematic discovery of oomycete effectors has taken off following the identification of specific sequence motifs, such as RxLR, EER, and LxLFLAK, and structural features, such as the WY domain, associated with many effectors (Wood et al. 2020). Relative to bacterial and oomycete effectors, however, identifying fungal effectors has been more challenging. Because fungal effectors do not seem to share conserved sequence motifs among diverse species, multiple characteristics/criteria have been used to mine candidates.

    Because fungal and oomycete effectors are secreted via the classical secretory pathway, identification of the proteins containing a signal peptide (SP), a 16- to 30-aa-long, N-terminal tag required for cellular sorting of secreted proteins, serves as the initial filter for identifying candidate effectors. However, because fungi also secrete many other proteins to perform functions crucial for their growth, the next step is filtering out those unlikely to be effectors. Secreted proteins containing transmembrane (TM) domains or glycosylphosphatidylinositol (GPI) anchors, which will likely link them to the fungal cell surface, are examples. Multiple tools are available for predicting the presence of SP, TM, and GPI anchors, with SignalP (Almagro Armenteros et al. 2019), TMHMM (Krogh et al. 2001), and PredGPI (Pierleoni et al. 2008), respectively, having been most frequently used. The recently released NetGPI (Gíslason et al. 2021) more accurately identifies GPI-anchored proteins. Because of positive selection presumably caused by the evolutionary “arms race” with hosts, effector genes appear to have undergone faster sequence changes than housekeeping genes and are often absent or amplified in some populations and species (Kang et al. 1995; Khang et al. 2008; Kim et al. 2019). Such genes can be identified by comparing the genomes of closely related species and diverse strains within species (Kim et al. 2019). Because effector gene expression is likely to be induced during host infection (Selin et al. 2016), RNA-sequencing data from plants inoculated with a pathogen of interest can help identify its effectors.
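
The multistep filtering described above can be sketched in a few lines of code. The data structures, cutoffs, and the small-size criterion below are illustrative assumptions, not outputs or APIs of SignalP, TMHMM, or PredGPI; a real pipeline would parse those tools' result files.

```python
# Hypothetical sketch: combining SignalP-, TMHMM-, and PredGPI-style
# predictions to shortlist candidate secreted effectors. All inputs and
# the max_length cutoff are illustrative assumptions.

def shortlist_effector_candidates(proteins, has_sp, tm_counts, gpi_anchored,
                                  max_length=300):
    """Keep proteins with a signal peptide, no TM domains, no GPI anchor,
    and a small size, a trait common among known effectors."""
    candidates = []
    for pid, length in proteins.items():
        if not has_sp.get(pid, False):
            continue  # no SP -> unlikely to be classically secreted
        if tm_counts.get(pid, 0) > 0:
            continue  # membrane-spanning proteins are filtered out
        if gpi_anchored.get(pid, False):
            continue  # GPI-anchored proteins stay at the cell surface
        if length > max_length:
            continue  # many validated effectors are small
        candidates.append(pid)
    return candidates

# Toy secretome: protein ID -> length (aa)
proteins = {"P1": 120, "P2": 450, "P3": 210, "P4": 150}
has_sp = {"P1": True, "P2": True, "P3": True, "P4": False}
tm_counts = {"P3": 2}
gpi_anchored = {}
print(shortlist_effector_candidates(proteins, has_sp, tm_counts, gpi_anchored))
# P1 passes all filters; P2 is too long, P3 has TM domains, P4 lacks an SP
```

Additional evidence layers noted in the text (presence/absence polymorphism across strains, induced in planta expression) would be applied as further filters on the same candidate list.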

    Besides EffectorP, the following tools have supported effector discovery: effectR, an R package for predicting candidate oomycete RxLR and CRN (Crinkling and Necrosis) effectors (Tabima and Grünwald 2019); EffHunter, a tool for predicting fungal effectors (Carreón-Anguiano et al. 2020); FunEffector-Pred, a fungal effector predictor based on multiple ML algorithms (Wang et al. 2020a); and Predector, a tool for predicting and ranking candidate effectors (Jones et al. 2021). As the number and diversity of experimentally validated effectors increase, the accuracy of these tools will improve, allowing accurate mining of candidate effector genes from rapidly increasing pathogen genome sequences. Secreted effectors have been shown to function in the apoplast and almost all subcellular compartments of infected plant cells, including the plasma membrane, cytoplasm, nucleus, chloroplast, endoplasmic reticulum, Golgi complexes, and mitochondria. Some tools predict the subcellular localization of effectors, which can help in the investigation of their mode of action (Dalio et al. 2018). ApoplastP helps predict effectors and plant proteins working in the apoplast (Sperschneider et al. 2018a), whereas LOCALIZER predicts plant and effector protein localization to chloroplasts, mitochondria, and nuclei (Sperschneider et al. 2017). DeepLoc, which uses an algorithm based on deep neural networks to predict protein subcellular localization (Almagro Armenteros et al. 2017), can complement ApoplastP and LOCALIZER.

    A recent study illustrates how informatics and experimental approaches can be combined to identify and characterize fungal effectors (Kim et al. 2020). Candidate nuclear effectors encoded by M. oryzae were initially identified based on the copresence of an SP and DNA-binding domain(s). Among 1,895 proteins that contain an SP but lack TM domains and GPI anchors, 20 proteins carry a DNA-binding domain, with 50% of them possessing the C2H2 zinc finger (IPR007087). Three programs, WoLF-PSORT (Horton et al. 2007), NLStradamus (Nguyen Ba et al. 2009), and cNLS mapper (Kosugi et al. 2009), were used to detect potential nuclear localization signals (NLSs) in the 20 proteins, which predicted one or more NLSs in 16 proteins. A multipronged characterization of two candidate genes, including gene disruption, analysis of in planta expression patterns, measurement of DNA-binding activity, identification of likely target sequences of DNA binding, and transgenic expression in rice, revealed that both effectors function as transcription repressors that disrupt the expression of many immunity-associated genes.

    Metagenomics helps develop crop protection strategies based on microbiome function.

    Plant phenotypes are not determined just by their genome. Besides varied environmental conditions, diverse microbes associated with the plant rhizosphere, phyllosphere, and endosphere affect host growth, development, and fitness (Bonito et al. 2014; Clouse and Wagner 2021; Trivedi et al. 2020; Vandenkoornhuyse et al. 2015). Some of these microbes are vital for plant fitness in marginal ecosystems (Rodriguez and Redman 2008; Tedersoo et al. 2020), suggesting that the genomes of plant-associated microbes function as part of the plant pan-genome (Gopal and Gupta 2016). Metabarcoding based on sequences of one or more phylogenetically informative loci, typically the internal transcribed spacer (ITS) regions of the rRNA-encoding genes for fungi and the 16S rRNA gene for bacteria, has been used to analyze the structure of microbiomes associated with diverse plants and surrounding environments (Fadiji and Babalola 2020; Liu et al. 2021). Sequences of such loci, amplified with DNA extracted from individual samples as templates (Fig. 3), reveal the diversity and relative abundance of microbes, show how they change under different conditions, help determine which microbes are keystone species holding the community together, and assist in identifying microbial taxa potentially crucial for specific plant phenotypes (e.g., disease resistance, improved growth). However, because metabarcoding data are insufficient for confirming candidate microbes’ significance in a specific phenotype and do not reveal the mechanism underpinning their effect on plants or other microbes, other tools and analyses are needed.


    FIGURE 3 Flowchart depicting data generation/processing steps and commonly used informatics tools for metabarcoding analyses. The green boxes denote the steps required to prepare amplicons (e.g., internal transcribed spacer [ITS] and 16S rDNA) for sequencing. The resulting raw sequence data are processed (blue and purple boxes) before performing specific analyses (yellow boxes). An informatics pipeline such as Mothur, QIIME2, or DADA2 is used to clean and organize raw sequence data to generate a table of operational taxonomic units (OTUs)/exact sequence variants (ESVs) identified (blue box). Taxonomic information is assigned to individual OTUs/ESVs using appropriate sequence-based taxonomy databases. The OTU/ESV table, taxonomy table, and metadata are combined to create a phyloseq (an R package for microbiome analysis) object (purple box). The resulting phyloseq object is used to perform various analyses (yellow boxes). Several frequently performed analyses are noted, with each box denoting the type of analysis (bolded) and commonly used tools (data analysis + visualization in parentheses).


    Data from experimentally characterized microbes serve as references in deducing the biology of those identified via metabarcoding. Although a close phylogenetic relationship does not necessarily indicate a high degree of phenotypic similarity, two microbes with identical or highly similar sequences at one or more loci likely share many similarly working genes and processes. Informatics tools like PICRUSt (Langille et al. 2013) and Tax4Fun (Aßhauer et al. 2015; Wemheuer et al. 2020) have been used to predict functional properties of bacterial communities by matching identified 16S rRNA sequences to functionally annotated bacterial genomes. Although there are no equivalent tools available for predicting functions of fungal communities, ecological guilds of fungi (i.e., groups of species that exploit/depend on the same resources/ecological niches) can be predicted using FunGuild (Nguyen et al. 2016). Shotgun metagenome sequencing (deeply sequencing whole DNA extracted from collected samples) helps assemble partial genomes of many microbes; the resulting data can help predict their likely functions and roles by revealing their gene products and biochemical/regulatory pathways (Liu et al. 2021; Quince et al. 2017). Analyses of metatranscriptomes, metaproteomes, and metametabolomes in complex samples (Noecker et al. 2016; Pétriacq et al. 2017) can also provide many clues to how individual microbiomes and their constituents perform specific functions under different conditions (Kim et al. 2021).
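
The core idea behind tools like PICRUSt and Tax4Fun, predicting a community's functional profile by weighting the gene content of matched reference genomes by each taxon's abundance, can be illustrated with a toy calculation. The taxa, gene families, and copy numbers below are invented for illustration:

```python
# Toy illustration of PICRUSt/Tax4Fun-style functional prediction: sum the
# gene-family copy numbers of reference genomes matched to identified taxa,
# weighted by each taxon's relative abundance. All data are invented.

def predict_community_functions(abundances, genome_annotations):
    """Return an abundance-weighted gene-family profile for the community."""
    profile = {}
    for taxon, abundance in abundances.items():
        for family, copies in genome_annotations.get(taxon, {}).items():
            profile[family] = profile.get(family, 0.0) + abundance * copies
    return profile

# Relative abundances inferred from 16S metabarcoding (hypothetical)
abundances = {"TaxonA": 0.6, "TaxonB": 0.4}
# Gene-family copy numbers from annotated reference genomes (hypothetical)
genome_annotations = {
    "TaxonA": {"nifH": 1, "amoA": 0},  # nitrogen-fixation gene present
    "TaxonB": {"nifH": 0, "amoA": 2},  # ammonia-oxidation genes present
}
print(predict_community_functions(abundances, genome_annotations))
```

Real tools additionally correct for 16S copy number and map predictions onto curated pathway databases, but the weighted-sum logic is the same.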

    Isolation of specific microbes helps validate their role and characterize how they interact with other microbes and plants. Culturing most microbial species is expected to be quite challenging (Stewart 2012) because tailored growth conditions are often needed. However, culture media and growth conditions formulated based on an improved understanding of microbial biology and ecology and instruments specialized for isolating individual microbial cells have helped isolate strains representing diverse taxa recognized only through their DNA sequences (Sarhan et al. 2019). A study aimed at identifying rhizosphere bacteria crucial for tomato resistance against Ralstonia solanacearum, the causal agent of bacterial wilt, demonstrates how some of these approaches can be combined to investigate the mechanism of resistance against this pathogen (Kwak et al. 2018). The authors assembled the genome of one of the candidate species identified via metabarcoding using shotgun metagenome sequencing data. The genome data helped culture one candidate by informing its nutritional requirement. The resulting culture allowed them to confirm the significance of this candidate in protecting tomato plants from R. solanacearum.

    The following examples illustrate how metabarcoding analysis helped understand how soil microbes contribute to disease suppression. The diversity of microbes associated with plant roots is immense and affects plant health in many ways (Bonito et al. 2014). Because plants actively build and modulate their root microbiomes by secreting a diverse array of proteins and metabolites (Pascale et al. 2020), the composition of surrounding soil microbiomes likely influences the resulting root microbiome and consequently plant growth and health. Soils suppressive to soilborne diseases, attributed to the presence of microbes antagonistic to pathogens, have been known for several decades (Cook 2014; Weller et al. 2002). However, systematic inquiries into the underlying mechanism have increased rapidly only after NGS technologies became widely available. A comparative soil microbiome analysis between the areas showing soybean sudden death syndrome (SDS) and those with healthy plants in the same field revealed significant differences in their bacterial and fungal communities (Srour et al. 2017), suggesting that the relative abundance of multiple microbial taxa is critical for SDS occurrence/severity. Similar conclusions were made after comparing the microbiomes in bacterial wilt–suppressive and –conducive soils (Li et al. 2021; Zheng et al. 2021) and analyzing other disease-suppressive soils (Cha et al. 2016; Mendes et al. 2011; Santhanam et al. 2015; Trivedi et al. 2017).

    Studies on how cultural practices (e.g., crop rotation, soil amendment, biocontrol, fertilizer application, fumigation) and environmental conditions (e.g., increased CO2, soil salinity, drought) affect the composition of soil/plant microbiomes have been increasing. Insights from such studies can guide targeted manipulations of plant-associated microbiomes to improve crop health (French et al. 2021; Taş et al. 2021). One approach is applying synthetic microbial communities (SynComs) formulated based on microbiome function in suppressing diseases. Modified soil microbiomes using SynComs can suppress diseases by directly inhibiting pathogens, enhancing plant immunity, or both (Hu et al. 2016, 2017; Zhang et al. 2020). Identification of keystone taxa, defined as the taxa that exert, individually or as a guild, a significant influence on microbiome structure and function irrespective of their abundance (Banerjee et al. 2018), is crucial in constructing SynComs (Agler et al. 2016). Microbial network analyses (Fig. 3) have been used to identify keystone species. Application of three keystone Pseudomonas species isolated from soil strongly suppressed R. solanacearum and reduced bacterial wilt incidence in tobacco (Zheng et al. 2021).

    Cultural practices can also be judiciously used to modify indigenous soil microbiomes in ways beneficial to crops. Residue retention, no-till management, crop rotation, biochar, and compost addition have been shown to induce general or specific disease suppression by affecting microbiome composition, presumably through the increased availability of carbon (Peralta et al. 2018; Wang et al. 2020b). No-till management increased the abundance of beneficial organisms such as Bradyrhizobium and Glomeromycotina (Longley et al. 2020). Microbes enriched in no-till soils can improve nutrient availability and protect the host from pathogens (Srour et al. 2020). Ecological guilds featuring arbuscular mycorrhizae, nematophagous fungi, and mycoparasites were favored in no-till soils, whereas fungal saprotrophs and plant pathogens dominated in tilled soils. Cover cropping and no-till increased the diversity of soil fungal communities and the symbiotroph/saprotroph ratio (Schmidt et al. 2019). Crop rotation influenced soil bacterial community composition in a corn/soybean system (Chamberlain et al. 2020). Organic amendments combined with fumigation helped control tomato diseases caused by multiple pathogens, including R. solanacearum and Fusarium oxysporum (Deng et al. 2021). Microbial community shift caused by biochar application has been shown to enhance plant health (Jenkins et al. 2017; Li et al. 2020; Wang et al. 2020b). Another application is using microbiome profiles as indicators for assessing soil health. A continental-scale analysis of Fusarium wilt–suppressive soils suggested bacteria in the phyla Actinobacteria and Firmicutes as potential indicators for Fusarium wilt suppressiveness (Trivedi et al. 2017). Attempts to manipulate the microbiomes associated with other parts of plants (e.g., endophytes, phyllosphere microbes) via breeding and genetic engineering are also increasing (Clouse and Wagner 2021; Gopal and Gupta 2016). 
One potential advantage of targeting such microbes to enhance crop health is that, because they can effectively and stably colonize target plants or tissues, they would be less prone to variation caused by environmental conditions, thus potentially offering more durable benefits.

    Microbiome function–based strategies have demonstrated their potential as pesticide alternatives. However, more studies, especially those that will help improve their effectiveness and reliability under diverse conditions, are required to convert this potential to field-deployable solutions. Like biocontrol, many biotic and abiotic factors likely affect the efficacy of microbiome-based crop protection strategies. Because many such factors are often idiosyncratic in different fields and cropping systems, it is unlikely that one-size-fits-all strategies can reliably work for diverse crops and production systems. Customized strategies founded on a comprehensive understanding of how various factors affect plant microbiomes and their function will be needed. Given the involvement of many microbes, investigating the mechanism of microbiome-based plant health protection is expected to be challenging. Besides, because some strategies (e.g., those based on cultural practices) will probably affect agronomic traits, we will also need to research the nature and mechanism of such effects. Despite these challenges, as efforts to understand how the microbiomes associated with diverse cropping systems affect crop production rapidly expand, our ability to develop effective and reliable strategies will improve (Clouse and Wagner 2021; Song et al. 2021). Informatics tools play essential roles in advancing this understanding by helping analyze the structure, dynamics, and likely function of soil/plant microbiomes.

    Many informatics pipelines and software are available to support metabarcoding data processing and analysis, result visualization, and data sharing (Bharti and Grimm 2021; Liu et al. 2021). New tools and improved versions of existing tools have been frequently released to support rapidly expanding microbiome analyses and extract more or better information from the resulting data, and the trend will continue. GitHub, a web-based code hosting/sharing service that supports collaborative software development, hosts many tools for microbiome analyses along with user tutorials. Because many reviews (e.g., Bharti and Grimm 2021; Liu et al. 2021) and online user guides detailing how to conduct microbiome analyses are available, we only briefly note commonly used resources without covering technical details.

    Raw sequence data must be processed to create a table of operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) needed to perform various statistical analyses (Fig. 3). The processing typically includes the following: inspecting read quality profiles, grouping paired-end reads based on index sequences (demultiplexing), removing chimeric sequences and sequences corresponding to the indexes and primers used, merging the reads in each group to obtain amplicon sequences, and clustering the processed sequences into OTUs or ASVs to select representative sequences as species proxies (Liu et al. 2021). Mothur (Schloss et al. 2009), Quantitative Insights into Microbial Ecology 2 (QIIME2) (Bolyen et al. 2019), and DADA2 (Callahan et al. 2016) are frequently used pipelines for data processing (Liu et al. 2021).
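
As a minimal illustration of the final dereplication/counting step, the sketch below collapses identical (already cleaned) reads into a per-sample feature table. Real pipelines such as Mothur, QIIME2, and DADA2 additionally perform quality filtering, denoising, and chimera removal; the sample names and sequences here are placeholders.

```python
# Minimal sketch of the dereplication/counting step that yields an
# OTU/ASV-style abundance table from cleaned amplicon reads.
from collections import Counter

def build_feature_table(samples):
    """Map each sample to counts of its unique amplicon sequences."""
    return {name: Counter(reads) for name, reads in samples.items()}

# Toy cleaned reads per sample (placeholder 4-bp "amplicons")
samples = {
    "soil_1": ["ACGT", "ACGT", "ACGA"],
    "soil_2": ["ACGA", "ACGA", "TTGC"],
}
table = build_feature_table(samples)
print(table["soil_1"]["ACGT"])  # two identical reads collapse to one feature
```

The resulting table of unique sequences and their counts is the input for the taxonomy assignment and diversity analyses described next.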

    The assignment of taxonomic information relies on multiple reference databases such as SILVA (Quast et al. 2013), Ribosomal Database Project (RDP) (Cole et al. 2005), UNITE (Kõljalg et al. 2005), Genome Taxonomy Database (GTDB) (Chaumeil et al. 2019), ARBitrator-format nifH database (Gaby and Buckley 2014), and PR2 (Guillou et al. 2013). The coverage of microbial taxa and the number of loci available in these databases vary. Some focus on a specific group of microbes (e.g., the nifH database for identifying nitrogen-fixing bacteria using aligned nitrogenase nifH sequences and PR2 for helping identify protists using 18S rRNA sequences). SILVA and RDP archive sequences of the rRNA-encoding genes covering all three domains of life (Bacteria, Archaea, and Eukarya), whereas UNITE archives eukaryote ITS sequences. The GTDB is unique because it archives ∼260,000 genomes as taxonomic references to support bacterial and archaeal classification. The archived genomes in GTDB were recently organized into clusters based on the average nucleotide identity to the genomes representing known species to provide an improved taxonomic framework (Parks et al. 2020). This reorganization was needed because ∼40% of the genomes in GTDB lack a species name (i.e., they correspond to taxa that have not yet been characterized and described). All these databases are frequently updated to increase the accuracy of taxonomic identification, which is crucial to support robust downstream analyses.
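
The principle behind sequence-based taxonomic assignment can be shown with a naive best-hit classifier. The reference entries below are placeholders standing in for database records, and production classifiers rely on alignment or k-mer models rather than this position-wise comparison:

```python
# Naive best-hit taxonomy assignment against a toy reference database.
# Reference sequences are invented placeholders, not SILVA/UNITE records.

def identity(a, b):
    """Fraction of matching positions between two sequences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

def assign_taxonomy(query, references, threshold=0.90):
    """Return the closest reference taxon if it exceeds the identity cutoff."""
    best_taxon, best_score = "unclassified", 0.0
    for taxon, ref_seq in references.items():
        score = identity(query, ref_seq)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= threshold else "unclassified"

references = {
    "Fusarium_oxysporum": "ACGTACGTAC",
    "Bacillus_subtilis": "TTTTACGGGC",
}
print(assign_taxonomy("ACGTACGAAC", references))  # 9/10 identity to Fusarium
```

Queries below the identity threshold stay "unclassified", mirroring how real classifiers flag sequences from taxa absent from the reference database.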

    When the raw sequence data have been processed, a suite of statistical analysis tools (mostly available in the form of R packages) such as Phyloseq (McMurdie and Holmes 2013), Vegan (Oksanen et al. 2007), and metagenomeSeq (Paulson et al. 2013) can be employed to make inferences on microbial abundance and α/β diversity in different samples. Several tools help test the ecological/functional significance of microbial associations within individual microbiomes via network inference based on pairwise correlations between OTUs or ASVs. They include SParse InversE Covariance Estimation for Ecological Association Inference (SpiecEasi) (Kurtz et al. 2015), Sparse Correlations for Compositional data (SparCC) (Friedman and Alm 2012; Watts et al. 2019), FlashWeave (Tackmann et al. 2019), CCREPE (Gevers et al. 2014), and Molecular Ecological Network Analysis Pipeline (MENAP) (Deng et al. 2012). The resulting networks can be visualized using a tool like Gephi (Bastian et al. 2009). Other packages such as ggplot2 (Wickham 2011) and Ggally (Schloerke et al. 2018) can also help visualize analysis results.
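
The pairwise-correlation step underlying such network inference can be sketched as follows. Dedicated tools like SparCC and SpiecEasi account for the compositional and sparse nature of microbiome data; plain Pearson correlation on raw counts, used here with invented numbers, is only illustrative:

```python
# Toy co-occurrence network inference: connect OTU pairs whose abundance
# profiles across samples correlate strongly. All abundances are invented.
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def infer_edges(abundance, threshold=0.8):
    """List OTU pairs whose |correlation| meets the threshold."""
    otus = sorted(abundance)
    return [(a, b) for i, a in enumerate(otus) for b in otus[i + 1:]
            if abs(pearson(abundance[a], abundance[b])) >= threshold]

# Abundances of three OTUs across four samples (invented numbers)
abundance = {
    "OTU1": [10, 20, 30, 40],
    "OTU2": [12, 19, 33, 41],  # tracks OTU1 -> strong positive correlation
    "OTU3": [40, 5, 25, 11],   # erratic -> weak correlation
}
print(infer_edges(abundance))
```

In the resulting network, highly connected nodes are candidates for the keystone taxa discussed above, although keystone status ultimately requires experimental validation.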


    Harnessing the vast resources and opportunities genomics presents to understand and protect crop health requires more than continuously generating new data and tools. We should also build community support infrastructures to empower diverse stakeholders. Examples include mechanisms/tools for preserving accumulated data and materials in a format that can effectively support future work and platforms assisting genomics-based inquiries and problem-solving by many who cannot program for large-scale data mining/analysis. A mechanism for archiving diverse experiences from previous studies is vital because the growth of science relies on not only accumulated data but also such experiences. However, compared with preserving data, systematically preserving experiences in a format that can be widely shared and updated has received inadequate attention. Experience-based knowledge, often not adequately captured in publications, cannot effectively help others without such a mechanism. Resources that can help meet the increasing need for (re)education and public outreach are also crucial.

    Crowdsourcing or open innovation, defined as mass collaboration via the web, can help meet these needs. Rapid advances in web technology enabled people to collaborate without physical interactions, catalyzing disruptive innovations in many sectors (Kang 2014). As noted above, open-source platforms like GitHub have played vital roles in collaborative software development via code sharing and archiving various informatics tools and tutorials. Crowdsourcing can accomplish more than software development. Thanks to the rapidly increasing data transmission capacity and the availability of assorted mobile platforms, crowdsourcing applications have been rapidly evolving. One notable example is the growing list of citizen science projects, which involve the public in research; more than 2,000 such projects are listed online. Below, we briefly discuss additional applications of crowdsourcing to advance genomics-driven research and problem-solving.

    A virtual library of editable protocols and data collection/analysis standards.

    Well-curated and widely shared protocols for laboratory and field experiments provide multiple benefits. Newcomers can quickly acquire basic technical competency through these protocols rather than toiling through trial and error. Widely shared protocols also make comparing and integrating results from different studies more manageable and accurate. Given the fast growth in large-scale gene functional analyses, we cannot underestimate the importance of systematically collecting phenotypic data in ways that will facilitate accurate comparison and integration of results from different experiments. Standard operating procedures (SOPs) and agreed-upon product standards are considered essential in most industrial sectors. Individual sectors also have a mechanism that guides procedure/standard development and update. However, in biology, multiple protocols exist for individual experiments, often employing different conditions and data collection schemes, because of preferences or prior experiences of individual researchers. Divergence in how we collect data likely confuses newcomers and can potentially fragment community research. Because the amount and complexity of data from plant health-related research will continuously increase, we should establish widely shared SOPs and data collection/analysis standards to better link research outputs. Of course, this does not mean that everyone should perform the same phenotypic assays under identical conditions and use the same instruments. We advocate for a shared standard for data collection to facilitate data/knowledge sharing and integration. The rationale and potential pitfalls underlying individual steps of individual protocols, detailed guides for data collection and analysis, and user experiences should be preserved to help others understand the science and evolution of individual protocols. A quality control mechanism with a question-and-answer function will be needed to ensure robust growth.
There have been attempts to host easily accessible and modifiable protocol libraries and other resources. Zenodo, a knowledge repository serving diverse communities, and platforms designed to help researchers develop and share reproducible methods illustrate how such portals operate.

    Resource for (re)education and global capacity building.

    For many people on the frontlines of crop protection, such as extension educators and diagnosticians, job objectives are unlikely to change much despite rapid advances in many areas of science and enabling technologies. Although rapidly increasing new data and tools can help them, such users often lack the time and resources to immerse themselves in rapidly evolving fields like genomics and informatics. An efficient mechanism is needed to deliver information and support whenever and wherever they require it. Besides enhancing educational support for existing plant health professionals, this mechanism will help more effectively respond to new global food security threats. Because the global connectedness of agricultural trade and production systems facilitates pathogen/pest movement, human capacity building should be global in scale. Otherwise, the data and tools that have been generated will not be used as effectively as they could be to recognize and control emerging problems. The following examples illustrate how online portals can coordinate and support efforts to ensure sustainable and environment-friendly crop production. PlantVillage assists African smallholder farmers in increasing crop yield and profitability and adapting to climate change by providing advice based on crop type, location, and planting date via multiple channels. Open Wheat Blast functions as a hub for knowledge sharing and collaboration to understand and control wheat blast, a recently emerged disease threatening global wheat production.


    Insightful suggestions from three reviewers greatly helped improve the review.

    The author(s) declare no conflict of interest.


    Funding: We acknowledge support from the USDA Specialty Crop Multi-State Program (AM170200XXXXG006), the USDA National Institute of Food & Agriculture and Federal Appropriations (project number 1016291), the Brain Pool Program of the National Research Foundation of the Republic of Korea (grant 2019H1D3A2A01054562), and a National Research Foundation grant (2021R1G1A1094780).