APS Online Publications
PERSPECTIVE (Open Access)

Community-Driven Metadata Standards for Agricultural Microbiome Research

    Authors and Affiliations
    • J. P. Dundore-Arias1
    • E. A. Eloe-Fadrosh2
    • L. M. Schriml3
    • G. A. Beattie4
    • F. P. Brennan5
    • P. E. Busby6
    • R. B. Calderon7
    • S. C. Castle8
    • J. B. Emerson9
    • S. E. Everhart10
    • K. Eversole11
    • K. E. Frost12
    • J. R. Herr13
    • A. I. Huerta14
    • A. S. Iyer-Pascuzzi15
    • A. K. Kalil16
    • J. E. Leach17
    • J. Leonard18
    • J. E. Maul19
    • B. Prithiviraj20
    • M. Potrykus21
    • N. R. Redekar22
    • J. A. Rojas23
    • K. A. T. Silverstein24
    • D. J. Tomso25
    • S. G. Tringe26
    • B. A. Vinatzer27
    • L. L. Kinkel28
    1. California State University Monterey Bay, Biology and Chemistry, 100 Campus Center, Seaside, CA 93955, U.S.A.
    2. Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, U.S.A.
    3. University of Maryland School of Medicine, Department of Epidemiology and Public Health, Institute for Genome Sciences, Baltimore, MD, U.S.A.
    4. Iowa State University, Department of Plant Pathology, 207 Science I, Ames, IA 50011-3211, U.S.A.
    5. Teagasc, Department of Environment Soils and Land-Use, Wexford, Johnstown Castle, Ireland
    6. Oregon State University, Department of Botany and Plant Pathology, Corvallis, OR, U.S.A.
    7. Louisiana State University, Department of Plant Pathology and Crop Physiology, Baton Rouge, LA, U.S.A.; and Benguet State University, Benguet, La Trinidad, Philippines
    8. U.S. Department of Agriculture-Agriculture Research Service, Plant Science Research Unit, Saint Paul, MN, U.S.A.
    9. University of California Davis, Department of Plant Pathology, Davis, CA, U.S.A.
    10. University of Nebraska, Plant Pathology, 406G PLSH 1875 N 38th St., Lincoln, NE 68510, U.S.A.
    11. International Alliance for Phytobiomes Research, Lee’s Summit, MO, U.S.A.
    12. Oregon State University, Botany and Plant Pathology, 2121 S 1st St., Hermiston, OR 97838, U.S.A.
    13. University of Nebraska-Lincoln, Department of Plant Pathology and Center for Plant Science Innovation, 422 Plant Sciences Hall, Lincoln, NE 68588, U.S.A.
    14. North Carolina State University College of Agriculture and Life Sciences, Department of Entomology and Plant Pathology, Raleigh, NC, U.S.A.
    15. Purdue University, Department of Botany and Plant Pathology and Center for Plant Biology, West Lafayette, IN, U.S.A.
    16. Williston Research Extension Center, Williston, ND, U.S.A.
    17. Colorado State University, Bioagricultural Sciences and Pest Management, 1177 Campus Delivery, Fort Collins, CO 80523-1177, U.S.A.
    18. Louisiana State University, AgCenter and Department of Plant Pathology and Crop Physiology, Baton Rouge, LA, U.S.A.
    19. U.S. Department of Agriculture-Agricultural Research Service Administrative and Financial Management, Sustainable Agricultural Systems Laboratory, Beltsville, MD, U.S.A.
    20. Brooklyn College of the City University of New York, Department of Biology, Brooklyn, NY, U.S.A.
    21. Medical University of Gdansk, Department of Environmental Toxicology, Gdansk, Poland
    22. Oregon State University, Crop and Soil Science, 109 Crop Science Building, 3050 SW Campus Way, Corvallis, OR 97331-4501, U.S.A.
    23. University of Arkansas Fayetteville, Plant Pathology, 495 N. Campus Drive PTSC 217, Fayetteville, AR 72703, U.S.A.
    24. Minnesota Supercomputing Institute for Advanced Computational Research, Minneapolis, MN, U.S.A.
    25. AgBiome, 104 TW Alexander Drive Building 1, RTP, NC 27709, U.S.A.
    26. Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, U.S.A.
    27. Virginia Polytechnic Institute and State University, PPWS, Latham Hall AgQuad Lane Room 551, Blacksburg, VA 24061-0131, U.S.A.
    28. University of Minnesota, Plant Pathology, 495 Borlaug Hall 1991 Upper Buford Circle, Saint Paul, MN 55108, U.S.A.

    Abstract

    Accelerating the pace of microbiome science to enhance crop productivity and agroecosystem health will require transdisciplinary studies, comparisons among datasets, and synthetic analyses of research from diverse crop management contexts. However, despite the widespread availability of crop-associated microbiome data, variation in field sampling and laboratory processing methodologies, as well as metadata collection and reporting, significantly constrains the potential for integrative and comparative analyses. Here we discuss the need for agriculture-specific metadata standards for microbiome research, and propose a list of “required” and “desirable” metadata categories and ontologies essential to be included in a future minimum information metadata standards checklist for describing agricultural microbiome studies. We begin by briefly reviewing existing metadata standards relevant to agricultural microbiome research, and describe ongoing efforts to enhance the potential for integration of data across research studies. Our goal is not to delineate a fixed list of metadata requirements. Instead, we hope to advance the field by providing a starting point for discussion, and inspire researchers to adopt standardized procedures for collecting and reporting consistent and well-annotated metadata for agricultural microbiome research.

    Advances in sequencing technologies coupled with declining sequencing costs have resulted in dramatic increases in the volume of sequence-based microbiome data generated for microbial communities across diverse environments. In agricultural and natural habitats, amplicon sequencing, metagenomics, and other omics datasets have revealed the extraordinary diversity of plant and soil microbiomes, and provided insight into variation in microbiome composition and function in relation to crop host, environment, and productivity across diverse settings. Collectively, these data have provided compelling illustrations of the critical roles that microbiomes play in plant and ecosystem health (Bodelier 2011; Finkel et al. 2017; Hooper et al. 2012; Jacoby et al. 2017; Sarkar et al. 2017; Vonaesch et al. 2018).

    There is tremendous enthusiasm among agricultural researchers for integrating data and research findings across diverse cropping systems and management settings as a means for identifying generalizable principles of microbiome assembly and dynamics (Busby et al. 2017; Erlich and Narayanan 2014). To develop robust predictions of functional agronomic potential, comparative and synthetic data analyses (i.e., analyses that synthesize or integrate data across experiments or research systems) are needed. However, despite the widespread and expanding availability of crop microbiome data, variation in sampling, processing, and analytical methodologies, as well as in metadata collection and reporting, constrains our capacity to integrate data across research laboratories and projects.

    Variation among researchers in methodological techniques and study details has always represented a constraint to straightforward integration of data among research studies. Given the diversity of approaches for sample processing (e.g., DNA and RNA extraction and storage, sequencing technologies, and data processing algorithms) and the significant impacts of these methods on the resulting data, seamless integration of sequencing data across studies without a priori agreement on sampling, processing, and data analysis strategies is both challenging and inefficient. Moreover, attempting to establish a single methodology may in some cases limit the potential for researchers to optimize their design to test their primary hypotheses. However, comparative (rather than integrative) analyses across studies that use variable processing methods remain possible if consensus terminology and metadata standards are established and used in agricultural microbiome research.

    Metadata provide the necessary context and description of the what, where, when, and how of the microbiome sample and play a critical role in enhancing the value of the sequencing data (Cole et al. 2010; Huttenhower et al. 2014; Jones et al. 2006). Among microbiome scientists, agricultural researchers are especially likely to benefit from standardized metadata reporting and consensus sampling, data processing, and analytical pipelines. In particular, many agricultural researchers focus on a small number of important food crops across a wide variety of geographic locations, habitats, and management conditions, resulting in many varying datasets for the same plant host. Such datasets create opportunities for comparative studies aimed at understanding the potential generalizability of findings, and the robustness of microbiome compositional and functional dynamics in relation to plant performance across diverse production systems. Capturing the knowledge of large-scale processes and underlying trends from our collective agricultural microbiome data will require such comparative analyses and, where possible, integration of existing datasets and research findings from across many laboratories. This will only be possible with standardized metadata. Finally, it is important to note that agricultural researchers are likely to require metadata types that are not commonly included in microbiome datasets, including metrics related to crop productivity, management practices, weather conditions, and cropping history.

    Here we advance the discussion of agriculture-specific metadata for microbiome research by proposing a first draft of metadata standards for agricultural plant and soil microbiomes. This contribution comes out of a summer 2018 workshop (Boston, MA) titled “Agricultural Microbiome Data: Data Platforms, Standards, and Analytical Tools Needed for Advancing our Science.” This workshop was co-organized by the National Science Foundation Agricultural Microbiomes Research Coordination Network (AgMicrobiomes RCN; https://agmicrobiomercn.umn.edu/) and leaders of both the Genomic Standards Consortium (GSC) (https://press3.mcs.anl.gov/gensc/; Yilmaz et al. 2011b) and the National Microbiome Data Collaborative (https://microbiomedata.org/). This workshop brought together researchers with expertise in plant pathology, soil science, agronomy, microbial ecology, bioinformatics, and data science to propose a preliminary list of metadata fields toward building metadata standards specific to the agricultural microbiomes research community, and to propose defined controlled vocabulary associated with specific ontologies. The workshop built upon initial efforts arising from the Phytobiomes Initiative of The American Phytopathological Society (http://www.phytobiomes.org) and a subsequent 2016 workshop organized by the International Alliance for Phytobiomes Research (http://www.phytobiomesalliance.org) and the U.S. National Institute of Standards and Technology to explore priorities, standards, protocols, reference materials, and reference data for phytobiomes research.

    BUILDING UPON EXISTING METADATA STANDARDS

    Agricultural microbiome researchers are not starting from scratch with regard to establishing standards for metadata. Multiple environmental metadata databases have already been established, including the Genomes OnLine Database (GOLD) (Mukherjee et al. 2017) and the National Center for Biotechnology Information’s BioSample Database (https://www.ncbi.nlm.nih.gov/biosample). The GSC pioneered and coordinated the generation of community-driven standards for collecting and managing relevant contextual information associated with genomic data (Field et al. 2008). Starting over a decade ago, the GSC community established minimum information (MIxS: minimum information about any [x] sequence) standards for describing sequencing data and the associated sample environment, including defining specific parameters for sample description, as well as the documentation of analytical approaches (Glass et al. 2014; Yilmaz et al. 2011a). The MIxS standards consist of checklists for describing minimum information about marker genes (MIMARKS), genomes (MIGS), and metagenomes (MIMS), and 15 environmental packages to enable the standardized description of environmental and host-associated contextual data specific to distinct environments (e.g., air, soil, water, and sediment), hosts (e.g., humans and plants), and tissues (e.g., human gut, oral cavity, and skin) (Field and Sansone 2006; Taylor et al. 2008). Similarly, the International Human Microbiome Standards project (IHMS) (http://www.microbiome-standards.org) was established to promote development and implementation of standard operating procedures, including sample identification, collection, and processing, needed to optimize data quality and comparability in human microbiome research. Of more particular relevance to agricultural researchers, the Terragenome International Soil Metagenome Sequencing Consortium (Vogel et al. 2009) established soil data types and standards for describing soil features relevant to soil biology research. The consortium established controlled vocabulary and simple definitions to help researchers to query and retrieve soil metagenome data using project contextual data (Cole et al. 2010). Through collaboration with the GSC, this vocabulary and these definitions were incorporated into the environmental metadata packages within MIxS, including the MIxS soil- and plant-associated checklists, which were subsequently endorsed and promoted by the Earth Microbiome Project (Thompson et al. 2017). More recently, a group of scientists from the U.S. Department of Agriculture-Natural Resources Conservation Service published a list of recommended soil health indicators and associated methods appropriate for high-throughput soil test laboratories (Technical Note No. 450-03; Stott 2019). This information was also adopted by the Soil Health Institute for developing methods for evaluating soil health indicators at a continental scale (https://soilhealthinstitute.org/north-american-project-to-evaluate-soil-health-measurements). While genomics is included in their Tier 2 checklist (described as “additional research is needed before users can have the same level of confidence in its measurement, use, and interpretation than Tier 1”), no other microbiome-related indicators have been incorporated to date.

    To make microbiome data across research studies and data repositories more tractable and easier to interpret, it is paramount to define a controlled vocabulary, or ontology, encompassing the semantics, properties, and relationships of a particular domain (Huss 2014). Standard terminology and clear definitions are of critical importance to reduce the siloed nature of microbiome research, and to prevent confusion regarding the use and interpretation of critical terms. For example, Huss (2014) noted the potential conflict in the dual usage of the term microbiome, which may refer either to the collective microbiota genome of a host organism or, in the ecological sense, to a biome of microbes. In other cases, the use of synonyms or imprecise terms for metadata annotation can obscure the intended description or hinder data retrieval from existing repositories (Huttenhower et al. 2014). In agricultural microbiome research, such challenges arise when terms that refer to the origin of the microbiome of interest are used interchangeably or without a specific working definition. Some common examples are as follows: rhizosphere, rhizoplane, or root-associated; root-endophytes versus root-compartmentalized; and soil, bulk soil, or potting soil.
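
    The terminology problem described above can be made concrete with a small sketch. The Python below assumes a hypothetical controlled vocabulary for sample origin; the term list, the placeholder working definitions, and the variant spellings are illustrative assumptions only and are not taken from MIxS, EnvO, or the workshop checklist. Labels that map cleanly to one term are normalized, and imprecise labels such as "root-associated" are rejected instead of being guessed.

```python
# Hypothetical controlled terms with placeholder working definitions.
# These are illustrative assumptions, not entries from any published ontology.
CONTROLLED_TERMS = {
    "rhizosphere": "soil closely associated with and influenced by roots",
    "rhizoplane": "the root surface",
    "bulk soil": "field soil outside the direct influence of roots",
    "potting soil": "manufactured growth substrate used in containers",
}

# Hypothetical spelling variants that map unambiguously to one controlled term.
VARIANTS = {
    "rhizosphere soil": "rhizosphere",
    "root surface": "rhizoplane",
    "bulk": "bulk soil",
    "potting mix": "potting soil",
}

# Labels too imprecise to map automatically; the submitter must choose a term.
AMBIGUOUS = {"soil", "root-associated", "root associated"}


def normalize_label(label: str) -> str:
    """Return the controlled term for a free-text sample-origin label."""
    # Lowercase, treat underscores as spaces, and collapse repeated whitespace.
    key = " ".join(label.lower().replace("_", " ").split())
    if key in CONTROLLED_TERMS:
        return key
    if key in VARIANTS:
        return VARIANTS[key]
    if key in AMBIGUOUS:
        options = ", ".join(sorted(CONTROLLED_TERMS))
        raise ValueError(f"'{label}' is ambiguous; choose one of: {options}")
    raise ValueError(f"'{label}' is not in the controlled vocabulary")
```

    Rejecting ambiguous labels, instead of mapping them to a best guess, keeps the working definition explicit at the point of annotation, which is the failure mode the paragraph above describes.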

    The Environmental Ontology (EnvO) (Buttigieg et al. 2013) and the EMP Ontology (EMPO) (Thompson et al. 2017) are ontologies used to describe microbiome data based on descriptions of the environments linked to biological specimens, samples, and observations. Similarly, the Open Biological and Biomedical Ontology (OBO) Foundry (Smith et al. 2007) archives various plant-related ontologies used to describe plant attributes that are relevant to plant science experiments, including plant traits and phenology, as well as agronomic practices. However, despite the existence of multiple sequencing platforms and database types, the agricultural microbiome research community lacks consensus metadata standards and defined ontologies integrating agricultural and microbiome-specific research principles.

    Following the example of the MIxS soil- and MIxS plant-associated checklists, participants in the AgMicrobiomes RCN workshop sought to identify the minimum information and categories to be included in a standard metadata checklist for agricultural microbiome studies (Table 1). Discussion focused on identifying metadata that should be required (essential for the dataset) versus desirable (important, but either more difficult or expensive to measure or irrelevant in some contexts). Participants of the workshop recognized the tension between the costs of collecting, reporting, and storing metadata versus the value of additional metadata for research synthesis, and were challenged to limit required metadata.

    TABLE 1 Proposed required and desirable metadata categories to be included in a future minimum information metadata standards checklist for describing agricultural microbiome studies (MIxS-Ag)

    Although our efforts are extensions of metadata standards developed for and by other microbiome research communities, our proposed checklist of required and desirable (but optional) metadata standards addresses the unique needs of the agricultural microbiome research community. Our proposed checklist deviates from currently available resources by integrating plant, soil, field, and climate metadata, and incorporating metrics specific to cropping systems, productivity, and management practices. Moreover, our checklist highlights the importance of including categories for sampling protocols, processing, and storage for both environmental samples and sequenced materials.
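
    As one way to picture the distinction between required and desirable metadata, the following Python sketch checks a sample record against two field lists. The field names are hypothetical stand-ins chosen for illustration; they are not the categories in Table 1 and not official MIxS terms. A missing required field makes the record non-compliant, whereas a missing desirable field is only reported.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

# Hypothetical field names for illustration; not the Table 1 categories.
REQUIRED_FIELDS = (
    "sample_id", "host_crop", "collection_date",
    "latitude", "longitude", "sample_origin",
)
DESIRABLE_FIELDS = ("crop_yield", "tillage_practice", "previous_crop", "soil_ph")


@dataclass(frozen=True)
class ChecklistReport:
    missing_required: Tuple[str, ...]
    missing_desirable: Tuple[str, ...]

    @property
    def compliant(self) -> bool:
        # Only required fields decide compliance.
        return not self.missing_required


def _is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_record(record: Mapping[str, Any]) -> ChecklistReport:
    """Report which required and desirable fields a record lacks."""
    return ChecklistReport(
        missing_required=tuple(
            f for f in REQUIRED_FIELDS if _is_blank(record.get(f))
        ),
        missing_desirable=tuple(
            f for f in DESIRABLE_FIELDS if _is_blank(record.get(f))
        ),
    )
```

    Keeping the two lists separate reflects the trade-off noted by workshop participants: the required list stays short so that compliance is affordable, while desirable fields are surfaced without blocking data deposition.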

    TOWARD THE ESTABLISHMENT OF AN AGRICULTURALLY FOCUSED METADATA STANDARD

    This manuscript is intended to stimulate discussion and move our community toward standardized reporting of metadata, sampling, processing, and analytical pipelines in agricultural microbiome research. The next step is to develop, along with members of the GSC, a MIxS-Ag metadata standard and ontology that will be incorporated into the GSC MIxS collection and released to other commonly used data management platforms and repositories. Further efforts should also focus on developing mechanisms and incentives that promote the use of these standards within our field. With the goal of stimulating discussion around the topic and its content, the MIxS-Ag will be presented at various national and international conferences, offering multiple open Q&A and annotation sessions. During the development of the MIxS-Ag, we will seek feedback and endorsement from agricultural microbiome researchers representing diverse public and private sectors (e.g., the International Alliance for Phytobiomes Research, the Sustainable Innovation of Microbiome Applications in the Food System [SIMBA] project, the Soil Health Partnership, among others). Moreover, the AgMicrobiomes RCN will work with core researchers in our community to produce and deposit MIxS-Ag biosample datasets. Journals publishing agricultural microbiomes research and scientific funding agencies can also support these efforts by requiring metadata compliance with community standards. We have already established connections with editorial boards from relevant scientific journals regarding the establishment of a policy that promotes the use of the MIxS-Ag metadata.

    Many factors will influence the further development and evolution of metadata standards for agricultural microbiome research. For example, the rapid rate at which the spectrum of omics technologies and computing and processing infrastructures are advancing suggests the potential need for more thorough documentation of analytical methods and procedures. Similarly, new technologies for collecting environmental and plant phenotypic data in the field are changing the scale, types, and magnitude of data collected, with significant repercussions for metadata collection. Although detailed, highly resolved metadata are desirable, their production and management can be laborious, especially in the absence of automated procedures for incorporation into standardized data formats. Increased granularity of metadata must be coupled with increased abilities to document and export information in machine-readable formats that allow automated data parsing and integration into analytical pipelines for downstream analyses (Cole et al. 2010; Huttenhower et al. 2014). Ultimately, metadata selection principles must focus not only on what we can measure, but also on the biologically relevant scales of space and time for the target microbiome.
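
    To illustrate what export in a machine-readable format might look like in practice, the sketch below writes sample metadata records to tab-separated text and parses them back using only the Python standard library. The column names and the choice of TSV are assumptions made for this example; the source does not prescribe a file format.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def records_to_tsv(
    records: Iterable[Mapping[str, Any]], columns: Sequence[str]
) -> str:
    """Serialize records to TSV with a fixed column order.

    Missing values become empty cells; keys not listed in `columns`
    are ignored so that every row has the same shape.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(columns),
        delimiter="\t",
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def tsv_to_records(text: str) -> List[Dict[str, str]]:
    """Parse TSV text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

    A fixed column order with explicit empty cells means downstream pipelines can parse every file the same way, at the cost of losing type information (numbers come back as strings and must be converted by the consumer).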

    An additional challenge to promoting data sharing, transparency, and comparative analyses is in assuring privacy and confidentiality of sensitive grower or landowner information, as well as proprietary industry data. Data protection measures similar to those used to prevent misuse of human subjects’ data are needed to guarantee secure and efficient sharing of sensitive data in agricultural research studies, particularly those incorporating data from private grower fields. In addition, documentation of data provenance and intellectual property issues are factors that are likely to be faced by our field in the coming years. Finally, we recognize that a flexible rather than a fixed approach to agricultural microbiomes metadata will be needed over the long term. While standardization of protocols and metadata collection and sharing are vital for ensuring that microbiome data are findable, accessible, interoperable, and reusable (FAIR) (Wilkinson et al. 2016), it is important to ensure that standardized procedures do not constrain the development of innovative or optimized approaches that can advance our field. As new sequencing, analytical, and environmental monitoring technologies evolve, our standards must evolve in parallel.

    CONCLUSION

    Here we propose metadata standards for the agricultural microbiome research community. Our goal is not to delineate a strict or fixed list of metadata requirements or defined format for data deposition, but to stimulate conversation to drive the field toward standardization of metadata capture and sharing. The proposed list of metadata standards should serve as a guide and citable tool to assist researchers in evaluating whether their own data, or that of others, contains key information needed to facilitate comparative and synthetic analyses. We anticipate that the proposed metadata standards will evolve in response to community feedback and advances in our field. Moreover, we expect that these standards will incentivize future efforts for consistent metadata collection, sharing, archiving, and retrieval within our community. Finally, it is our hope that by promoting the use of standard terminology and agricultural system metadata, we will accelerate the potential for synthesis and integration of data across research studies, encourage collaboration among agricultural microbiome researchers, facilitate the identification of generalizable principles of crop−microbiome interactions, and advance our capacities to exploit the functional potential of agricultural microbiomes to improve crop production, sustainability, and nutritional quality.

    ACKNOWLEDGMENTS

    We thank The American Phytopathological Society for the support provided in organizing and hosting our workshop during the 2018 International Congress of Plant Pathology (ICPP). Special thanks go to workshop participants and invited speakers. We also thank Leland Pierson, Ewa Lojkowska, Deborah Samac, Thais Egreja, Paul Schulze-Lefert, Tijana Glavina del Rio, and James Tiedje for their participation in the workshop “Agricultural microbiome data: Data platforms, standards, and analytical tools needed for advancing our science” at the ICPP meeting in Boston, MA. For more information about the AgMicrobiomes RCN, including a complete summary of workshop discussion and outcomes, visit our website https://agmicrobiomercn.umn.edu.

    The author(s) declare no conflict of interest.

    Funding: This material was supported in part by the National Science Foundation under grant number 1714276. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.