hagis, an R Package Resource for Pathotype Analysis of Phytophthora sojae Populations Causing Stem and Root Rot of Soybean
- Austin G. McCoy1 †
- Zachary Noel1 2
- Adam H. Sparks3
- Martin Chilvers1 2
- 1Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, U.S.A.
- 2Program in Ecology, Evolutionary Biology, and Behavior, Michigan State University
- 3University of Southern Queensland, Centre for Crop Health, Toowoomba, Qld 4350, Australia
Abstract
Phytophthora sojae is a significant pathogen of soybean worldwide. Pathotype surveys for P. sojae are conducted to monitor resistance gene efficacy and determine whether new resistance genes are needed. Valuable measurements for pathotype analysis include the distribution of susceptible reactions, pathotype complexity, pathotype frequency, and diversity indices for pathotype distributions. Previously, the Habgood-Gilmour Spreadsheet (HaGiS), written in Microsoft Excel, was used for data analysis. However, the growing popularity of the R programming language in plant pathology and the desire for reproducible research made HaGiS a prime candidate for conversion into an R package. Here we report on the development and use of an R package, hagis, that can be used to produce all outputs from the HaGiS Excel sheet for P. sojae or other gene-for-gene pathosystem studies.
Uniform and healthy stand establishment is essential to maximizing soybean (Glycine max) yield. Oomycetes such as Phytophthora sojae constitute a significant threat to stand establishment and yield. Phytophthora sojae has been managed primarily via deployment of single resistance genes in commercial soybean cultivars, which interact with P. sojae Avr gene products to confer resistance (Anderson et al. 2015). Genetic resistance is the most economical form of control for P. sojae, as it confers season-long protection against noncompatible pathotypes (Dorrance et al. 2016). However, P. sojae pathotype surveys need to be conducted regularly to detect shifts in pathotypes over time and to provide recommendations for effective resistance genes. Although state-wide pathotype surveys have been conducted for the past 60 years in the United States, there has been no significant advance in pathotype analysis since the development of the Habgood-Gilmour spreadsheet (HaGiS), written in Microsoft Excel, in 1999 (Herrmann et al. 1999; Kaufmann and Gerdemann 1958).
Phytophthora sojae pathotype surveys monitor the efficacy of soybean resistance genes in relation to one or more P. sojae populations. In doing so, large sets of virulence data are generated, potentially for hundreds of isolates (Dorrance et al. 2016). Transferring such large datasets into the Excel-based HaGiS program and analyzing them there can be cumbersome and time-intensive. The R statistical programming language (R Core Team 2019) offers the ability to work with large datasets easily and efficiently, without the additional data-entry steps that the HaGiS Excel program requires, while treating the virulence data as read-only, thereby further reducing the chance for errors.
R has become widely used in plant-pathology studies because of its open-source framework and its suitability for reproducible research (Bergna et al. 2018; Duku et al. 2016; Sparks et al. 2011; Wallace et al. 2018). An R package for analyzing pathotype survey data can replicate all analyses provided by Excel-based programs. It also lets users create reproducible research and more detailed visualizations, and lets the plant-pathology community actively contribute to and build upon the code for future studies. For instance, McCoy and Noel (2018) produced R scripts to conduct the analyses originally performed with HaGiS, and these scripts were used to create the hagis R package (McCoy et al. 2019).
For ease of use, the package uses a single argument format, which works in all hagis functions. Users provide their own data in the form of a spreadsheet, CSV, or text file, specifying the proper fields for analysis. Functions are provided to calculate pathotype complexity and summarize the distribution of reactions for each gene tested. Simple, Shannon, Simpson, Gleason, and evenness diversity indices are calculated for the pathotype dataset. Outputs from these analyses are given in publication-ready graphics or tables and can be further modified by the user (Table 1).
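The calculations named above can be illustrated with a minimal sketch. It is written in Python purely as a language-neutral illustration and is not the hagis code or its API. The isolate names, gene labels, and percentages are invented for the example, and all function names are hypothetical. The sketch assumes that a reaction counts as susceptible when it meets or exceeds the cut-off, that natural logarithms are used, and that the indices follow common textbook forms (Simple = pathotypes/isolates; Gleason = (pathotypes - 1)/ln(isolates); Simpson = 1 - sum of squared proportions; evenness = Shannon/ln(pathotypes)). The package's exact definitions may differ.

```python
from collections import Counter
from math import log

# Hypothetical survey records: (isolate, resistance gene, percent susceptible plants).
RECORDS = [
    ("iso1", "Rps 1a", 100.0), ("iso1", "Rps 1c", 0.0), ("iso1", "Rps 3a", 80.0),
    ("iso2", "Rps 1a", 75.0), ("iso2", "Rps 1c", 10.0), ("iso2", "Rps 3a", 90.0),
    ("iso3", "Rps 1a", 100.0), ("iso3", "Rps 1c", 65.0), ("iso3", "Rps 3a", 20.0),
    ("iso4", "Rps 1a", 0.0), ("iso4", "Rps 1c", 0.0), ("iso4", "Rps 3a", 0.0),
]
CUTOFF = 60.0  # assumption: a value at or above the cut-off is a susceptible reaction


def virulence_by_isolate(records, cutoff):
    """Map each isolate to its pathotype: the sorted tuple of genes it overcomes."""
    result = {}
    for isolate, gene, perc in records:
        genes = result.setdefault(isolate, [])
        if perc >= cutoff:
            genes.append(gene)
    return {iso: tuple(sorted(genes)) for iso, genes in result.items()}


def gene_summary(records, cutoff):
    """Per gene: (number of virulent isolates, percentage of all isolates)."""
    n_isolates = len({isolate for isolate, _, _ in records})
    counts = {gene: 0 for _, gene, _ in records}
    for _, gene, perc in records:
        if perc >= cutoff:
            counts[gene] += 1
    return {gene: (k, 100.0 * k / n_isolates) for gene, k in counts.items()}


def diversity_indices(pathotypes):
    """Diversity indices over pathotype frequencies.

    Needs at least two isolates and two distinct pathotypes, otherwise the
    Gleason and evenness denominators are zero.
    """
    freq = Counter(pathotypes.values())
    n_isolates = sum(freq.values())
    n_types = len(freq)
    props = [k / n_isolates for k in freq.values()]
    shannon = -sum(p * log(p) for p in props)
    return {
        "simple": n_types / n_isolates,
        "gleason": (n_types - 1) / log(n_isolates),
        "shannon": shannon,
        "simpson": 1.0 - sum(p * p for p in props),
        "evenness": shannon / log(n_types),
    }


pathotypes = virulence_by_isolate(RECORDS, CUTOFF)
# Pathotype complexity: the number of genes each isolate is virulent on.
complexity = {iso: len(genes) for iso, genes in pathotypes.items()}
summary = gene_summary(RECORDS, CUTOFF)
indices = diversity_indices(pathotypes)

print(summary)
print(complexity)
print(indices)
```

In this sketch an isolate that overcomes none of the tested genes is treated as its own pathotype with complexity 0, which is a modeling choice for the example rather than a statement about the package.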
Table 1. Example of tabular output from the hagis program, using the summarize_gene() function at a 60% susceptibility cut-off

The R language offers many advantages over Excel-based data analysis, such as reproducibility and user customization. Furthermore, hagis takes advantage of the data.table package (Dowle and Srinivasan 2019) to handle large datasets, such as those produced through P. sojae pathotype surveys, rapidly and efficiently. Significantly, hagis provides the first development in P. sojae pathotype analysis in 20 years. While hagis was developed to support P. sojae pathotype surveys, it was designed to work with pathotype analyses of any gene-for-gene pathosystem to determine which resistance genes are effective for management.
The package source code, including the Rmarkdown code for this paper, more information, and instructions on how to use hagis can be found in the Zenodo records database. The package can be downloaded and installed from the Comprehensive R Archive Network (CRAN) website and is released under the MIT license.
Author-Recommended Internet Resources
CRAN website: https://CRAN.R-project.org/package=hagis
data.table CRAN website: https://CRAN.R-project.org/package=data.table
GitHub hagis page: https://openplantpathology.github.io/hagis
Zenodo hagis v3.0.0: https://zenodo.org/record/3378007
The author(s) declare no conflict of interest.
Literature Cited
- Anderson et al. 2015. Recent progress in RXLR effector research. Mol. Plant-Microbe Interact. 28:1063-1072. https://doi.org/10.1094/MPMI-01-15-0022-CR
- Bergna et al. 2018. Tomato seeds preferably transmit plant beneficial endophytes. Phytobiomes J. 2:183-193. https://doi.org/10.1094/PBIOMES-06-18-0029-R
- Dorrance et al. 2016. Pathotype diversity of Phytophthora sojae in eleven states in the United States. Plant Dis. 100:1429-1437. https://doi.org/10.1094/PDIS-08-15-0879-RE
- Dowle and Srinivasan 2019. data.table: Extension of ‘data.frame’.
- Duku et al. 2016. Spatial modelling of rice yield losses in Tanzania due to bacterial leaf blight and leaf blast in a changing climate. Clim. Change 135:569-583. https://doi.org/10.1007/s10584-015-1580-2
- Herrmann et al. 1999. A new tool for entry and analysis of virulence data for plant pathogens. Plant Pathol. 48:154-158. https://doi.org/10.1046/j.1365-3059.1999.00325.x
- Kaufmann and Gerdemann 1958. Root and stem rot of soybean caused by Phytophthora sojae n. sp. Phytopathology 48:201-208.
- McCoy and Noel 2018. AGmccoy/Phytopthora-sojae-Pathotype-analysis: Beta-release of Phytophthora sojae pathotype analysis code (December 25, 2018). Zenodo. http://doi.org/10.5281/zenodo.2526326
- McCoy et al. 2019. openplantpathology/hagis v3.0.0. (August 26, 2019). Zenodo. http://doi.org/10.5281/zenodo.3378007
- R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Sparks et al. 2011. A metamodeling framework for extending the application domain of process-based ecological models. Ecosphere 2:art90. https://doi.org/10.1890/ES11-00128.1
- Wallace et al. 2018. Quantitative genetics of the maize leaf microbiome. Phytobiomes J. 2:208-224. https://doi.org/10.1094/PBIOMES-02-18-0008-R
Austin G. McCoy, Zachary Noel, and Adam H. Sparks contributed equally to this work.
Funding: This work was funded by the Michigan Soybean Promotional Committee, Project GREEEN, North Central Soybean Research Program, and GRDC Project DAQ00186, Improving Grower Surveillance, Management Epidemiology Knowledge And Tools To Manage Crop Disease.


