Ecology and Epidemiology (Open Access)

History, Epidemic Evolution, and Model Burn-In for a Network of Annual Invasion: Soybean Rust

    Authors and Affiliations
    • M. R. Sanatkar
    • C. Scoglio
    • B. Natarajan
    • S. A. Isard
    • K. A. Garrett
    1. First, second, and third authors: Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS 66506; first and fifth authors: Department of Plant Pathology, Kansas State University, Manhattan, KS 66506; fourth author: Department of Plant Pathology & Environmental Microbiology and Department of Meteorology, Pennsylvania State University, University Park, PA 16802; and fifth author: Institute for Sustainable Food Systems and Plant Pathology Department, University of Florida, Gainesville, FL 32611-0680.

    Abstract

    Ecological history may be an important driver of epidemics and disease emergence. We evaluated the role of history and two related concepts, the evolution of epidemics and the burn-in period required for fitting a model to epidemic observations, for the U.S. soybean rust epidemic (caused by Phakopsora pachyrhizi). This disease allows evaluation of replicate epidemics because the pathogen reinvades the United States each year. We used a new maximum likelihood estimation approach for fitting the network model based on observed U.S. epidemics. We evaluated the model burn-in period by comparing model fit based on each combination of other years of observation. When the miss error rates were weighted by 0.9 and false alarm error rates by 0.1, the mean error rate did decline, for most years, as more years were used to construct models. Models based on observations in years closer in time to the season being estimated gave lower miss error rates for later epidemic years. The weighted mean error rate was lower in backcasting than in forecasting, reflecting how the epidemic had evolved. Ongoing epidemic evolution, and potential model failure, can occur because of changes in climate, host resistance and spatial patterns, or pathogen evolution.

    Network models offer an important framework for understanding epidemics and other invasions (Albert and Barabasi 2002; Balcan et al. 2009; Newman 2002; Pastor-Satorras and Vespignani 2001; Urban et al. 2009). There are many cases where an invasive pathogen is introduced to a new region, and thereafter seasonally invades the region. This is true of many rust pathogens of annual crop plants, including soybean rust, caused by Phakopsora pachyrhizi (Li et al. 2010). Similarly, endemic pathogens often experience annual patterns of saturation at smaller spatial resolution as they increase in number from limited overwintering or oversummering populations. Here we explore three interrelated concepts that are relevant to repeated invasions and patterns of saturation (Table 1). First, we consider the concept of history as it applies in repeated invasions, particularly in the sense that there may be local effects (in our case, at the county level) on epidemics that may be detected by statistical models, but are not likely to be predicted by process-based models based on limited information. Second, epidemic evolution may be important, particularly in the early stages after a pathogen has been introduced, as a source of nonstationarity such that predicting epidemic outcomes is more difficult. Third, the concept of system burn-in, often applied to manufacturing processes (Kuo and Kuo 1983; Leemis and Beneke 1990), can be applied to the fitting of epidemiological models for emerging diseases. We illustrate these concepts using the soybean rust system as a model system, along with other relevant epidemics and invasions.

    TABLE 1. History, epidemic evolution, and model burn-in in the context of soybean rust

    History may be important in a number of ecological processes. For example, the timing of the arrival of different species to a community may have important effects on processes such as wood decomposition by fungi (Fukami et al. 2010). There is a potential role of historical factors in determining patterns of species richness (Jetz et al. 2004). Applied historical ecology may include information from paleoecological approaches, remote sensing, and meteorological records (Swetnam et al. 1999). In invasions and epidemics, stochastic events may change the population structure of the invasive species (Gilligan and van den Bosch 2008; Keller and Taylor 2008). Some diseases may be described to a greater extent by deterministic models, while others are more strongly influenced by stochastic processes (Rohani et al. 2002). When infection has an immunizing effect, historical infection has an important influence on current epidemic networks (Bansal and Meyers 2012). For a disease such as soybean rust, there can be many sources of variability that are not readily included in process-based deterministic models, or readily captured in statistical models fit using only a single season of data. The importance of overwintering location and the effects of overwintering conditions are one source of variability. For example, in the absence of soybean the pathogen overwinters in the common perennial weed kudzu (Pueraria lobata), and kudzu patches may differ in disease resistance.

    Repeated epidemics and epidemic evolution occur both within years and between years. There is the potential for epidemic spatial structure to exert selection pressure on pathogen traits such as the infectious period (Boerlijst and van Ballegooijen 2010). For example, cholera epidemics evolve as a function of seasonal weather patterns (Fernandez et al. 2009) and are influenced by the structure of river networks for disease spread (Bertuzzo et al. 2008). The pathogen population, itself, may evolve (Keller and Taylor 2008), as well as the host population. Seasonality can produce multiyear patterns in epidemics (Altizer et al. 2006). Distinct phases in the course of epidemics are often identified in sexually transmitted diseases (Garnett 2002). Climate change is one potentially important source of nonstationarity in plant disease time series (Garrett et al. 2013; Garrett et al. 2011; Rosenzweig et al. 2001; Scherm 2004; Shaw and Osborne 2011). Processes related to the El Niño Southern Oscillation (ENSO) can also drive the evolution of plant disease epidemics (Coakley 1979; Garrett et al. 2009; Scherm and Yang 1995). When soybean rust is introduced to a new area, there may be a period of consolidation toward saturation as potential overwintering sites are infected, but the same overwintering sites may lose infection as a function of winter conditions.

    Burn-in refers to the period of initial high risk for manufactured products (Kuo and Kuo 1983; Leemis and Beneke 1990) and the period at the beginning of computational simulations during which starting values may be highly influential. Here, we discuss the burn-in period for fitting a model to observations, during which the model may benefit greatly from additional information as more observations are made over time. In the context of manufacturing, a bathtub-shaped distribution of the risk of failure over time is often applied, where during a short initial period the risk of failure is very high, falls rapidly for a low-risk period, and then rises sharply as the risk of failure increases again and the product becomes obsolete (Kuo and Kuo 1983; Leemis and Beneke 1990). The risk of failure may be for a particular proportion of replicate products, or for components of, for example, semiconductor products that may need to be replaced (Alani et al. 1996; Cha 2006; Cha and Finkelstein 2012). In the context of stochastic models, the amount of burn-in needed for a process to be within a specified distance of the stationary distribution is known for some cases (Jones and Hobert 2004). In our case, fitting a model to observations, we can conceptualize an epidemic network as having a number of components, the link weights, each of which is estimated with a different probability of error. In a dynamic system, nonstationarity may lead to failure of some of the model components over time, if there are substantial changes driving epidemic evolution. Major changes may include a shift in availability of overwintering sites (Mundt et al. 2013), breakdown in vaccine, pesticide, or resistance gene efficacy, or a change in pathogen temperature tolerance.

    The roles of history, epidemic evolution, and estimation burn-in are all potentially important in the application of network models for use in prioritizing sampling and management of pathogens, and of invasive species more generally. Invasive species are an increasing problem for the conservation of natural systems and the management of agricultural systems, inspiring the use of network information for designing sampling and management strategies (Chadès et al. 2011; Hulme 2009; Sutrave et al. 2012). The design of sampling strategies can draw on ecological understanding of the effects of landscape structure on invasive species risk (Minor and Gardner 2011; With 2002). Concepts related to strategies for optimal vaccination programs are often relevant to management of invasive species in general (Chen et al. 2008; Cohen et al. 2003; Gallos et al. 2007; Madar et al. 2004; Schneider et al. 2011), as are examples of identifying the most important links in computer networks, although there are a number of additional challenges for the study of invasive species that must be sampled by trained personnel in complex social-ecological systems (Chadès et al. 2011; Demon et al. 2011; Ellis et al. 2010; Epanchin-Niell et al. 2010; Forster and Gilligan 2007; Gottwald 2010; Harwood et al. 2009; Mundt et al. 2009; Xu et al. 2009). The soybean rust sentinel plot network is an example of what can be accomplished in coordinated networks for sampling, such as the U.S. National Ecological Observatory Network (Crowl et al. 2008; Keller et al. 2008; Roberts et al. 2006).

    The importance of the U.S. soybean rust epidemic has motivated substantial modeling attention. Soybean rust losses were already high in other countries prior to the U.S. invasion (Yang et al. 1992; Yorinori et al. 2005) and over 95 other plant species can be infected by the same pathogen (Bonde et al. 2008). Modeling of soybean rust began over two decades ago (Yang et al. 1991). Soybean rust was first reported in the continental United States in 2004 (Schneider et al. 2005). An extensive sentinel plot network was put in place to sample the disease throughout the following growing season (Isard et al. 2006a; VanKirk et al. 2012). Predictions of movement of the pathogen into new areas are important to inform farmer decision-making about whether to use a fungicide (Dorrance et al. 2007). If a farmer thinks the disease will be present when it is not, that is a false alarm error (or false positive error), which can result in wasted fungicide applications. If a farmer thinks the disease will be absent when it actually is present, that is a miss error (or false negative error), which can result in substantial yield losses to the disease. An integrated aerobiological modeling system (IAMS) was put in place using the sentinel plot data, to predict movement of soybean rust (Isard et al. 2005; Isard et al. 2007; Isard et al. 2011). The sentinel plot network and the IAMS saved U.S. soybean growers substantial amounts of money by reducing false alarm and miss errors, where estimates of savings vary widely (Dorrance et al. 2007; Roberts et al. 2006) and include a conservative estimate of approximately $200 million per year (Giesler and Hershman 2007). The IAMS includes detailed information about packets of air movement that may contain pathogen spores, and other information about weather conditions. A network model developed for soybean rust was based on fitting a network to observed infection in county nodes, in order to identify nodes particularly important for sampling (Sutrave et al. 2012).

    Our objective in this study was to evaluate the importance of history, epidemic evolution, and burn-in for a seasonal epidemic network, the annual migration of the soybean rust pathogen in the United States. We evaluated these factors in terms of the following three hypotheses. (i) The more years of data used in estimating epidemic parameters, the lower the errors will be when the model is applied to a new season. This hypothesis is based on the idea of a burn-in period for model fitting. (ii) Models based on observations in years closer in time to the new season being estimated will have lower errors when applied to that season. This hypothesis is based on the idea of epidemic evolution such that years closer in time are more similar, and there may be a limited period of model survival. (iii) Models based on observations from years prior to the new season being estimated will have lower error when applied to that season than models based on observations from years after the season being estimated, i.e., forecasting versus backcasting. This hypothesis is based on the idea that epidemic evolution may be directional. We evaluated whether these hypotheses were supported for the soybean rust epidemic on a yearly average basis, early in the season, and later in the season. Models applied later in the year are based on more information in terms of what had happened in the current epidemic season. We also evaluated the importance of nodes in the soybean rust network based on eigenvector centrality, a measure of how well connected a node is to the most connected parts of the network, and outgoing and ingoing weighted node degree, measures of the likelihood of pathogen movement to and from neighbors of a node (Newman 2010).

    MATERIALS AND METHODS

    Data.

    The soybean rust data from the U.S. network of soybean sentinel plots is an exceptionally extensive and coordinated data set, illustrating the potential for continental synoptic data analysis. Data are available from the ipmPIPE website (www.ipmpipe.org). Data collection began after soybean rust was detected in the continental United States in early November 2004. The focus has been on eastern counties of the United States. Infection has been most common in the Southeastern United States, in part because of the presence of the common perennial weed kudzu, where the pathogen has the potential to overwinter. We evaluated the data from 2005 through 2009, representing a range of different epidemic progressions. The rust data included the presence or absence of infection at each sentinel plot for each observation date, where sampling was usually weekly or biweekly. At the sentinel plots, both an early maturing cultivar and a cultivar from the maturity group typically grown in the area were planted, generally 1 to 2 weeks earlier than in the nearby commercial fields. Evaluation of the sentinel plots was less frequent prior to flowering, and then generally occurred at least monthly at the major sentinel plots when the plants began to flower and continued until they began to senesce. The sentinel plot network did not include all relevant counties for all time intervals, so methods for estimating missing values were implemented, as described below. Soybean planting density data were accessed from the U.S. National Agricultural Statistics Service (http://www.nass.usda.gov/Data_and_Statistics/index.asp). Kudzu acreage by county was evaluated based on a data set assembled by Darryl Jewett in 2000, including reports of approximately 2 million acres of kudzu.

    Modeling soybean rust epidemics.

    We propose a new model-fitting approach based on maximum likelihood estimation (MLE) to build a statistical model for prediction of disease dispersal in dynamic networks. We model the process of disease spread using a discrete time Markov chain. The link weights represent the conditional probability of infection of the destination node given that the source node is infected. The nodes of the network are U.S. counties (soybean density is reported at the county level). We estimate how likely each county is to become infected over time, based on observations of soybean rust up to the current time, on parameter estimates from the corresponding observations in other years, or on both. In a previous network model for soybean rust, parameters related to the probability of movement between a pair of nodes were estimated based on the observations for the network as a whole (Sutrave et al. 2012). In the current study, the MLE method was applied individually for each pair of nodes at each time step, and thus the estimate is strongly influenced by the history of that particular pair at that time of year.

    Supporting Material describes the rationale for modeling these epidemics as discrete time Markov chains, where at each time step a node in the network is either infected or not. We chose biweekly time steps for modeling the soybean rust epidemics, based on the fact that sampling at major sentinel plots generally occurred at least once a month and because there is a lag of approximately 7 days between the time that spores reach a field and the time they can cause detectable infection, depending on weather conditions (Pivonia and Yang 2006). Factors influencing the spread of soybean rust include the direction and speed of wind from each county toward the other counties, the distance between counties, and the soybean and kudzu area in each county. These factors were used to determine the link weight between county node i and county node j, βji(n), for each pair of counties (not restricted to counties sharing a border). The probability that node i “receives infection” from node j in time step n + 1, if node j is infected in time step n, was modeled as

    $$\beta_{ji}(n) = \theta_{ji}\,\frac{d_i\, d_j\, w_{ji}(n)}{L_{ji}^{2}}$$

    where di (km²) is the total area of soybean and kudzu in county i; dj (km²) is the total area of soybean and kudzu in county j; wji(n) (km/h) is the maximum wind speed projection from county j toward county i during the last 2 weeks (based on the daily surface data collected by the U.S. National Climatic Data Center); Lji (km) is the distance between the centroids of counties i and j; and θji is the parameter that we need to estimate using observations of soybean rust, which can vary for each link from county j toward county i. The quantities di, dj, and Lji are constant, whereas wji(n) changes over time, which makes βji(n) a function of time. There is the possibility that the wind at county j does not have any projection toward county i, or that the projected wind speed is negative. In these cases, we consider wji to be zero for those times. The network is dynamic because wji is a function of time, and the weighted adjacency matrix changes with time.
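    The link-weight definition above can be sketched in a few lines of code. This is a minimal illustration rather than the authors' implementation: the function name `link_weight` and its argument names are hypothetical, and the sketch assumes the wind projection has already been computed for the 2-week window. It applies the two rules stated in the text (a missing or negative projection is treated as zero, and a weight above one is replaced with one).

```python
def link_weight(theta_ji, d_i, d_j, w_ji, L_ji):
    """Hypothetical helper: beta_ji(n) = theta_ji * d_i * d_j * w_ji(n) / L_ji**2.

    d_i, d_j : soybean + kudzu area in counties i and j (km^2)
    w_ji     : maximum wind speed projection from j toward i (km/h)
    L_ji     : distance between county centroids (km)
    """
    # No projection toward county i, or a negative projection, counts as zero wind.
    w = max(w_ji, 0.0)
    beta = theta_ji * d_i * d_j * w / L_ji ** 2
    # beta is a probability, so values above one are replaced with one.
    return min(beta, 1.0)


# Illustrative numbers only (not taken from the data set).
print(link_weight(theta_ji=1e-3, d_i=100.0, d_j=200.0, w_ji=10.0, L_ji=100.0))
```

    Because the distance enters as an inverse square, doubling the separation between two counties reduces the link weight by a factor of four when all other terms are held fixed.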

    Supporting Material also describes the rationale for modeling the probability that any given node becomes infected in a particular time step as a function of the infection status of its neighbors, where Siind(n) is the probability that node i does not receive infection from any of its neighbors at time step n. Thus, πi1(n + 1), the probability that node i is infected at time n + 1, is

    $$\pi_i^{1}(n+1) = 1 - \pi_i^{0}(n)\, S_i^{\mathrm{ind}}(n) = 1 - \pi_i^{0}(n) \prod_{j \in N_t(n)} \left(1 - \beta_{ji}(n)\, \pi_j^{1}(n)\right)$$

    where we assume that once a node is infected, it stays infected for the rest of that year. The likelihood function for the state of node i at time n + 1, given the θji can be written as

    where Xi(n + 1) is the infection status of node i at time n + 1.
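
    For a binary infection status Xi(n + 1) taking the value 1 (infected) or 0 (healthy), a likelihood of this kind is conventionally written in Bernoulli form. The expression below is a sketch under that assumption, not a confirmed transcription of the authors' equation; the dependence on θji enters through βji(n) in the infection probability.

    $$\mathcal{L}\left(\theta_{ji}\right) = \left[\pi_i^{1}(n+1)\right]^{X_i(n+1)} \left[1 - \pi_i^{1}(n+1)\right]^{1 - X_i(n+1)}$$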

    Estimation of θji.

    To build the network, we needed to estimate θ in order to calculate πi1 at each time step. In our problem, the distribution of the θji is unknown, so we employed non-Bayesian methods to estimate them. We used maximum likelihood estimation to estimate the θji.
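
    The estimation idea can be illustrated with a deliberately simplified sketch. The study applies MLE separately for each pair of nodes at each time step; the code below instead fits one shared scalar θ by grid search, purely to show how the infection-probability update and a Bernoulli log-likelihood fit together. The names `step`, `log_likelihood`, `grid_mle`, and `unit_beta` (the link weight with θ set to one) are hypothetical, and the Bernoulli form of the likelihood is an assumption.

```python
import numpy as np


def step(pi1, beta):
    """One Markov-chain step of the infection probabilities.

    pi1  : shape (N,), probability that each node is infected at step n
    beta : shape (N, N), beta[j, i] = P(i receives infection from j | j infected)
    """
    # Probability that node i escapes infection from every source node j.
    escape = np.prod(1.0 - beta * pi1[:, None], axis=0)
    # Infected nodes stay infected, so pi1(n+1) = 1 - pi0(n) * escape.
    return 1.0 - (1.0 - pi1) * escape


def log_likelihood(theta, unit_beta, pi1, x_next, eps=1e-12):
    """Assumed Bernoulli log-likelihood of statuses x_next for a scalar theta."""
    beta = np.clip(theta * unit_beta, 0.0, 1.0)
    p = np.clip(step(pi1, beta), eps, 1.0 - eps)  # avoid log(0)
    return float(np.sum(x_next * np.log(p) + (1 - x_next) * np.log(1 - p)))


def grid_mle(unit_beta, pi1, x_next, grid):
    """Return the grid value of theta with the highest log-likelihood."""
    scores = [log_likelihood(t, unit_beta, pi1, x_next) for t in grid]
    return float(grid[int(np.argmax(scores))])


# Toy network: node 0 is infected and is the only source for node 1.
pi1 = np.array([1.0, 0.0])
unit_beta = np.array([[0.0, 1.0], [0.0, 0.0]])
grid = np.array([0.1, 0.5, 0.9])
print(grid_mle(unit_beta, pi1, np.array([1, 1]), grid))  # node 1 observed infected
print(grid_mle(unit_beta, pi1, np.array([1, 0]), grid))  # node 1 observed healthy
```

    In the toy network, an observed infection at node 1 favors the largest θ on the grid and an observed healthy status favors the smallest, which is the behavior expected of a single-observation likelihood.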

    To predict soybean rust dispersal in one specific year, observations from other years were used to build the corresponding network, that is, to estimate the network parameters θji. To address our hypotheses, we considered a set of different years of data for predicting the epidemic in any given single year. For example, suppose that we want to predict disease spread in 2007. There are several choices for how to build the network. One is to use only the 2005 observations to estimate the network parameters, thus ignoring data from other years that could have made the model more general. If we assume that our model is a nonanticipative (or “causal”) system, we must limit ourselves to estimation based on the years observed before 2007. In this case, we have three options: (i) only 2005 data, (ii) only 2006 data, and (iii) data from both 2005 and 2006. Using 2005 data and observations, we obtain one version of the estimated network parameters, and using 2006 data and observations, we obtain a different version.

    We evaluated each potential combination of years that could be used to predict dispersal in a given year, including both forecasting and backcasting scenarios. When combining estimates of parameters from multiple years, we averaged estimates to obtain one set of parameters. For example, there are 15 nonempty subsets of the years 2005, 2006, 2008, and 2009 that can be used to build the network for 2007. To predict disease dispersal for one specific year at a given date, we also use the observations for that specific year up until that date, which increases the accuracy of the predictions. In some cases the initial estimate of βji(n) may be greater than one, in which case the estimate was replaced with one. This can happen when a link’s wind projection at one time step is greater than the value of this variable for the same time step in the years used to build the model.
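
    The enumeration of year combinations and the averaging of parameter estimates can be sketched as follows. The function names `year_subsets` and `averaged_theta`, the `causal` flag, and the example parameter values are hypothetical; only the counting logic and the averaging rule come from the text.

```python
from itertools import combinations

import numpy as np


def year_subsets(years, target, causal=False):
    """Nonempty subsets of the available years that exclude the target year.

    With causal=True, only years before the target are allowed (forecasting).
    """
    others = [y for y in years if y != target and (not causal or y < target)]
    return [c for r in range(1, len(others) + 1) for c in combinations(others, r)]


def averaged_theta(theta_by_year, subset):
    """Average the per-year parameter estimates over the chosen years."""
    return np.mean([theta_by_year[y] for y in subset], axis=0)


years = range(2005, 2010)
print(len(year_subsets(years, 2007)))               # all combinations for 2007
print(year_subsets(years, 2007, causal=True))       # forecasting-only options

# Made-up parameter vectors for two links, for illustration only.
theta_by_year = {2005: np.array([0.2, 0.4]), 2006: np.array([0.4, 0.8])}
print(averaged_theta(theta_by_year, (2005, 2006)))
```

    For 2007 the first call yields the 15 combinations mentioned above, and the causal variant yields the three forecasting options (2005 alone, 2006 alone, and both).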

    Estimation of missing observations.

    We aimed to estimate θji at each time step for every node, so observations at each time step for every node were required as well. However, observations of soybean rust for every county were not available. Soybean rust observations exist for only a limited number of counties and sample days. Therefore, estimation of missing observations was necessary (Fig. 1). A Supplementary file includes details about our assumptions and about a novel estimator for replacing missing data.

    Fig. 1. A, Soybean rust status in eastern U.S. counties on 20 September 2005, where green dots indicate the counties with “healthy” observations, red dots indicate the counties with “infection” observations, and black dots indicate the counties with missing observations. B, Estimated soybean rust status in eastern U.S. counties on 20 September 2005, where missing observations were estimated using α = 0.9 gain.

    Prediction error.

    Two different types of prediction error are important in this context: false alarm error and miss error. A false alarm error (false positive) occurs when a node is observed to be healthy, but the prediction of the network model is that the node is infected. A miss error (false negative) occurs when the observation is “infected” and the model prediction is “healthy.” These concepts are analogous to type I and type II errors in statistical analyses, rejecting a null hypothesis when it is actually true or failing to reject it when it is actually false, respectively. The false alarm error rate at each time step was calculated as

    $$E_{FA}(n) = \frac{1}{N} \sum_{j \in H_n} \pi_j^{1}(n)$$

    where N denotes the total number of counties, Hn denotes the set of counties whose observations are healthy at time step n, and πj1(n) denotes the probability of infection for county j at time step n. The miss error rate at each time step can be written as

    $$E_M(n) = \frac{1}{N} \sum_{j \in I_n} \left(1 - \pi_j^{1}(n)\right)$$

    where In denotes the set of counties whose observations are infected at time step n. The average miss and false alarm errors can be computed by averaging the miss and false alarm errors over all the time steps within a year, starting with the first time step at which there is an infected observation.
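
    The two error rates for a single time step can be computed directly from the predicted infection probabilities and the observed statuses. The sketch below assumes observations are coded 1 for infected and 0 for healthy; the function name `error_rates` and the example values are hypothetical.

```python
import numpy as np


def error_rates(pi1, observed):
    """False alarm and miss error rates for one time step.

    pi1      : predicted infection probability for each county, shape (N,)
    observed : observed status for each county, 1 = infected, 0 = healthy
    """
    pi1 = np.asarray(pi1, dtype=float)
    observed = np.asarray(observed)
    n = pi1.size
    # False alarm: predicted infection probability summed over healthy counties.
    false_alarm = pi1[observed == 0].sum() / n
    # Miss: predicted probability of being healthy summed over infected counties.
    miss = (1.0 - pi1[observed == 1]).sum() / n
    return false_alarm, miss


# Four counties with made-up predictions and observations.
fa, miss = error_rates([0.2, 0.9, 0.5, 0.0], [0, 1, 1, 0])
print(fa, miss)
```

    Both rates are divided by the total number of counties N rather than by the size of the healthy or infected set, following the definitions above.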

    RESULTS AND DISCUSSION

    The results indicate an important trade-off between miss error rates and false alarm error rates. The main trend across all years is that increasing the number of years used to build the network increases the false alarm error and decreases the miss error. This trend is reasonable because, as more years are used to construct the model, the network becomes more general and encompasses more possible paths of disease dispersal, thus decreasing the chance of missing a potential infection event. On the other hand, it increases the chance of falsely predicting infection for counties that were not actually infected.

    Under our first hypothesis, based on the idea of a burn-in period for model fitting, errors would decrease as more years of observations were used in constructing the model. Averaged across years, the rate of false alarm errors went up as the model was constructed from observations from more years, while the rate of miss errors went down. This relationship was fairly consistent across years being predicted. The different results for false alarm error rates and miss error rates point out the need for determining the relative importance of both types of errors. In general, farmers making decisions about management of soybean rust may consider a miss error to be much more important than a false alarm error. If we weight the miss error by 0.9 and the false alarm error by 0.1, and evaluate the effects of increasing the number of years used in model construction late in the season (the second week of October, Figs. 2C and 3C), there is a decrease in the weighted error for all years except 2005. Changing the weighting of miss errors versus false alarm errors may change the number of years necessary for sufficient estimation.
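
    The weighting described above can be made concrete with a short sketch. The function name `weighted_error` is hypothetical, and the error rates in the example are invented solely to show how a falling miss rate can outweigh a rising false alarm rate under a 0.9 to 0.1 weighting; they are not results from the study.

```python
def weighted_error(miss, false_alarm, w_miss=0.9):
    """Weighted mean of the two error rates; the false alarm weight is 1 - w_miss."""
    return w_miss * miss + (1.0 - w_miss) * false_alarm


# Invented rates for models built from 1, 2, and 3 years of observations.
scenarios = [(0.30, 0.05), (0.22, 0.10), (0.15, 0.18)]
for n_years, (miss, fa) in enumerate(scenarios, start=1):
    print(n_years, round(weighted_error(miss, fa), 4))
```

    With equal weights (w_miss = 0.5), the same invented rates give a much flatter trend, which illustrates why the choice of weights can change the apparent number of years needed for sufficient estimation.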

    Fig. 2. False alarm error frequencies for the estimation of soybean rust infection status at county nodes in the eastern United States. A to C, False alarm error frequency as a function of the number of years of observations used in estimation, for the whole year (A), the second week of July (B), and the second week of October (C). D to F, False alarm error frequency as a function of the absolute difference between the estimator and estimated years, for the whole year (D), the second week of July (E), and the second week of October (F). G to I, False alarm error frequency as a function of the difference in time between the estimator and estimated years (a negative value indicates forecasting and a positive value indicates backcasting), where a single year was used in estimation, for the whole year (G), the second week of July (H), and the second week of October (I).

    Fig. 3. Miss error frequencies for the estimation of soybean rust infection status at county nodes in the eastern United States. A to C, Miss error frequency as a function of the number of years of observations used in estimation, for the whole year (A), the second week of July (B), and the second week of October (C). D to F, Miss error frequency as a function of the time between the estimator and estimated years, where a single year was used in estimation, for the whole year (D), the second week of July (E), and the second week of October (F). G to I, Miss error frequency as a function of the difference in time between the estimator and estimated years (a negative value indicates forecasting and a positive value indicates backcasting), where a single year was used in estimation, for the whole year (G), the second week of July (H), and the second week of October (I).

    Our second hypothesis was that models based on observations in years closer in time to the new season being estimated will have lower errors when applied to that season. This hypothesis is based on the idea of epidemic evolution, where epidemics in years closer in time are generally more similar because of temporal autocorrelation in host varieties planted, agronomic practices, weather, and pathogen population composition. For models based on information from a single year, the average rate of false alarm errors increased with greater distance between predictor and predicted year for 2005 and 2006 and decreased for 2008 and 2009 (Fig. 2D). Conversely, the average rate of miss errors went up for 2007, 2008, and 2009 and down for 2005 and 2006 with greater distance between predictor and predicted year (Fig. 3D). This may reflect the more geographically limited epidemics in 2005 and 2006, such that models based on the more extensive epidemics in later years predict higher rates of infection.

    Our third hypothesis was that models based on observations from years prior to the new season being estimated will have lower error when applied to that season than models based on observations from years after the season being estimated. When forecasting and backcasting were distinguished, the rate of false alarm errors went up for every year with increasing difference in time between predictor and predicted year (Fig. 2G), and the rate of miss errors went down for every year except 2009 (Fig. 3G). Efficacy of backcasting may be important for understanding historical epidemics, or epidemics that might have occurred under historical weather conditions if a pathogen had been present. When backcasting and forecasting differ in error rates, this may suggest that epidemics are changing in nonlinear trajectories that are not well-described by the models employed.

    For the U.S. soybean rust epidemic, some aspects of epidemic evolution are clear. The pathogen was first detected in the continental United States in fall 2004, at the end of the summer soybean season. In 2005, the pathogen spread further, but its distribution was still limited, perhaps because it was only known to overwinter in the continental United States in a single Florida county (Isard et al. 2005). Future potential sources of epidemic evolution for the U.S. soybean rust epidemic include the following. (i) Currently soybean rust resistance is an unrealized goal for commercial soybean varieties. Ultimately resistance will probably be deployed in common varieties, and the pattern of deployment will change the structure of the host network. (ii) The soybean rust pathogen population appears to have limited genetic diversity. This could change, for example, if storm systems deposit P. pachyrhizi spores originating from hosts in Mexico, the Caribbean islands, or South America. (iii) The popularity of soybean as a crop is influenced in part by the price of other commodities such as maize, which is influenced in turn by agricultural policies. If patterns of soybean planting change substantially, this could also modify the host network. (iv) Another source of potential epidemic evolution is the link between soybean, kudzu, and the many other legume species that are all hosts of P. pachyrhizi (Bonde et al. 2008; Fabiszewski et al. 2010; Li et al. 2010). For example, the pathogen population in the United States might evolve the ability to overcome some forms of resistance in kudzu that are currently effective. Network models may prove useful for studying the movement of pathogens among linked host systems, such as the movement of Macrophomina phaseolina among tallgrass prairie species and Great Plains agricultural systems (Cox et al. 2013; Saleh et al. 2010) and the movement of Barley yellow dwarf virus among California native grass species and invasive weedy grasses (Borer et al. 2010; Malmstrom et al. 2006). Network models have already been used to understand movement of Phytophthora spp. through landscapes of many host species (Harwood et al. 2009). In some such cases, the presence of multiple host species can provide an ecosystem disservice by increasing disease risk (Cheatham et al. 2009).

    Climate change will be another important source of epidemic evolution. Although most models of the effects of weather on plant disease have been developed for small-scale forecasting, such models can be rescaled for application in climate change scenarios (Sparks et al. 2011). Similarly, models of the effects of host structure at small scales, where experiments can be performed to compare landscape effects for rusts with those for other types of pathogens, can be modified for application at larger scales (Cox et al. 2004; Skelsey et al. 2005). Ultimately, evaluation of invasion networks, the effects of host diversity, and corresponding information networks for management will be an important component of climate change scenario analyses at national scales (Garrett 2012; Garrett et al. 2009; Marshall et al. 2008; Shaw and Osborne 2011). For systems like soybean rust, nonstationarity in weather variables may make it more important to incorporate additional factors in the model, such as moisture (Marchetti et al. 1976), precipitation (Del Ponte et al. 2006), temperature (Kochman 1979), and UV radiation (Isard et al. 2006b). The geographic range of kudzu is likely to shift north under climate change (Bradley et al. 2010), increasing the risk of early infection in northern soybean fields.

    Ultimately it may be useful to develop methods to estimate the burn-in time for epidemic models. In the manufacturing context, there is the potential to estimate burn-in time based on the behavior of individual components of a system (Lee and Park 2008), and to estimate the remaining useful life (Si et al. 2011). Likewise, there are approaches for estimating the optimal levels of redundancy in burn-in systems (Chien and Kuo 1995). These may have useful analogs in the context of estimating the network structure for epidemics such as the U.S. soybean rust pathosystem. Greater understanding of the system, including information about typical node and link traits, may make it possible to estimate how many years of observation are necessary to reach an acceptable level of confidence in network model estimates. Greater understanding of epidemic evolution may make it possible to estimate how long a model will be useful into the future. Analogs for system redundancy in this context would include the number of key nodes for which epidemic status must be estimated correctly in order to estimate general epidemic characteristics, such as whether the pathogen arrives in more northern areas with intensive soybean production. For cases where relevant temporal and spatial scales are identified and long-term data sets are available, it may be useful to operationalize the concept of ecological history in the context of “ecological memory” using approaches such as stochastic antecedent modeling (Ogle et al. 2015).
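
    One way to make the burn-in question concrete is to track how the error rate for a target year changes with the number of other years used for fitting, averaging over every combination of that many years. The following is a minimal sketch of that bookkeeping only. It does not implement the maximum likelihood network model: a simple majority-vote rule (`majority_vote_model`) stands in for it, and the county statuses in `history` are invented toy values, as are all function names.

```python
from itertools import combinations

def majority_vote_model(training_years):
    """Stand-in model (an assumption, not the network model): predict a county
    infected if it was infected in at least half of the training years."""
    n_counties = len(training_years[0])
    n_years = len(training_years)
    return [
        1 if 2 * sum(year[c] for year in training_years) >= n_years else 0
        for c in range(n_counties)
    ]

def miss_rate(observed, predicted):
    """Fraction of observed-infected counties that were predicted uninfected."""
    infected = [p for o, p in zip(observed, predicted) if o == 1]
    if not infected:
        return 0.0
    return sum(1 for p in infected if p == 0) / len(infected)

def burn_in_curve(history, target_year):
    """Mean miss rate for target_year, keyed by the number of other years
    used for fitting, averaged over every combination of that many years."""
    others = [y for y in history if y != target_year]
    curve = {}
    for k in range(1, len(others) + 1):
        errors = [
            miss_rate(history[target_year],
                      majority_vote_model([history[y] for y in subset]))
            for subset in combinations(others, k)
        ]
        curve[k] = sum(errors) / len(errors)
    return curve

# Toy infection status (1 = infected) for five hypothetical counties per year.
history = {
    2005: [1, 0, 0, 0, 0],
    2006: [1, 1, 0, 0, 0],
    2007: [1, 1, 1, 0, 0],
    2008: [1, 1, 1, 1, 0],
}
curve = burn_in_curve(history, 2008)
for k, err in curve.items():
    print(f"{k} training year(s): mean miss rate {err:.3f}")
```

    With real data, the number of years at which such a curve levels off would give a rough empirical indication of burn-in time. In this toy example the curve is not monotone, which is consistent with error rates declining for most, but not all, years.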

    We can also consider the false alarm and miss error rates at different times of the year, keeping in mind the high degree of variability in this system. Although this is an unusually extensive data set, the number of years of observation is still small. Important sources of model uncertainty include the process of estimating the many missing values and a bias in field observations: disease is much more likely to be missed in a county than to be falsely reported as present. Thus, comparison of results among years has some risk of overinterpretation. The patterns of model error rates may be expected to change during the course of the year, as more information becomes available within the current year, and epidemics in different years may converge to become more similar at any given point in the season. Error rates were generally higher later in the season than earlier in the season (Figs. 2B and C and 3B and C). When distinguishing between forecasting and backcasting, the change in both miss and false alarm error rates was more apparent later in the season (Figs. 2H and I and 3H and I).
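
    The two error rates, and the weighted mean that combines them with weights of 0.9 for misses and 0.1 for false alarms, can be illustrated with a short sketch. The denominators used here are an assumption (misses are scaled by the number of observed-infected counties and false alarms by the number of observed-uninfected counties), and the county statuses are invented toy values.

```python
def error_rates(observed, predicted):
    """Return (miss_rate, false_alarm_rate) for paired 0/1 county statuses.

    Assumed definitions: a miss is an observed-infected county predicted
    uninfected; a false alarm is an observed-uninfected county predicted
    infected. Each count is divided by the size of its observed class.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    infected = sum(1 for o in observed if o == 1)
    uninfected = len(observed) - infected
    misses = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    false_alarms = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    miss_rate = misses / infected if infected else 0.0
    false_alarm_rate = false_alarms / uninfected if uninfected else 0.0
    return miss_rate, false_alarm_rate

def weighted_error(observed, predicted, w_miss=0.9, w_false_alarm=0.1):
    """Weighted mean of the miss and false alarm error rates."""
    miss_rate, false_alarm_rate = error_rates(observed, predicted)
    return w_miss * miss_rate + w_false_alarm * false_alarm_rate

# Toy statuses for ten hypothetical counties (1 = infected).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

miss, false_alarm = error_rates(observed, predicted)
print(f"miss rate: {miss:.3f}, false alarm rate: {false_alarm:.3f}")
print(f"weighted error: {weighted_error(observed, predicted):.3f}")
```

    Weighting misses more heavily reflects the asymmetry noted above: failing to detect disease in a county is both more likely and more costly than falsely reporting it.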

    While other network models, such as that of Wang et al. (2003), consider a network to be homogeneous, this model of soybean rust epidemics includes the potential for varying weights associated with the links (Schumm et al. 2007). Because these weights were estimated using observed field data, the network model has a blend of empirical and mechanistic components that can be useful for finding optimal sampling locations (Chadès et al. 2011; Sutrave et al. 2012). Using this network model, with its emphasis on historical features of the system, we identified locations likely to have particularly important roles during the historical epidemics (Fig. 4; Supplementary Tables S1 to S3). These locations may be important for sampling, particularly if other criteria are also included to identify locations at the leading edge of the epidemic. If the epidemic evolves markedly for reasons discussed above, or simply displays unusual types of behavior, reference to models less driven by history and with more basis in general submodels of epidemic processes may be important (Isard et al. 2011; Isard et al. 2007; Sutrave et al. 2012).

    Fig. 4. Node traits of eastern U.S. counties evaluated as nodes in the soybean rust epidemic network. A to C, Eigenvector centrality for late May (A), late July (B), and late August (C). D to F, Incoming node degree for late May (D), late July (E), and late August (F). G to I, Outgoing node degree for late May (G), late July (H), and late August (I).
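
    The node traits mapped in Fig. 4 can be illustrated for a small weighted, directed network. The sketch below makes several assumptions: node degree is taken as the number of links whose weight exceeds a threshold, eigenvector centrality is computed by power iteration over incoming links, and the four counties and their link weights are invented for illustration.

```python
def in_degree(weights, threshold=0.0):
    """Number of incoming links per node with weight above threshold."""
    n = len(weights)
    return [sum(1 for j in range(n) if weights[j][i] > threshold) for i in range(n)]

def out_degree(weights, threshold=0.0):
    """Number of outgoing links per node with weight above threshold."""
    return [sum(1 for w in row if w > threshold) for row in weights]

def eigenvector_centrality(weights, max_iter=1000, tol=1e-12):
    """Power iteration on incoming links; scores are scaled to sum to 1.

    weights[i][j] is the weight of the link from node i to node j.
    Convergence is only expected for a strongly connected, aperiodic network.
    """
    n = len(weights)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # A node's new score is the weighted sum of the scores of its sources.
        y = [sum(weights[j][i] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        if total == 0:
            raise ValueError("network has no links")
        y = [v / total for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical link weights among four counties (row = source, column = target).
counties = ["A", "B", "C", "D"]
W = [
    [0.0, 0.8, 0.3, 0.0],
    [0.1, 0.0, 0.6, 0.4],
    [0.0, 0.2, 0.0, 0.7],
    [0.1, 0.0, 0.1, 0.0],
]

centrality = eigenvector_centrality(W)
for name, c, d_in, d_out in zip(counties, centrality, in_degree(W), out_degree(W)):
    print(f"{name}: centrality={c:.3f}, in-degree={d_in}, out-degree={d_out}")
```

    Because the links are directed, incoming and outgoing degree can differ for the same node: a county can be a frequent recipient of inoculum without being an important source, and vice versa.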

    The use of network models is becoming more common in plant disease epidemiology and related systems (Jeger et al. 2007; Margosian et al. 2009; Moslonka-Lefebvre et al. 2011; Shaw and Pautasso 2014). As these methods develop, they may be applied more readily to new invasives. For example, soybean rust network models may be generalized to address epidemics in new geographic regions, such as East Africa (Murithi et al. 2014). Likewise, if a new race of the wheat stem rust pathogen such as Ug99 (Singh et al. 2011) arrives in the United States, its geographic pattern will probably be similar to that of soybean rust in some respects. Like other wheat rusts, it will likely overwinter in south Texas and northern Mexico and migrate north during the wheat growing season. The mosaic of resistance gene deployment in wheat fields will add a complicating factor for wheat rust modeling. Consideration of historical factors may also be important in interpreting ecological processes more generally, including networks such as the U.S. National Ecological Observatory Network and other large-extent coordinated ecological studies (Anderson et al. 2010; Fischer et al. 2010; Keller et al. 2008; Zacharias et al. 2011). Plant disease systems such as rust epidemics may have higher levels of stochasticity than human disease systems because they are so strongly driven by weather, and they may also have different types of stochasticity. For all these systems, understanding the network structure of pathogen or other invasive spread can help in developing efficient programs for evaluating invasion progress, and potentially for designing landscapes that will reduce invasive risk.

    ACKNOWLEDGMENTS

    We appreciate helpful discussions and input from J. Golod, J. M. S. Hutchinson, D. Jardine, T. Kalaris, M. Knapp, M. Margosian, and K. With, and helpful comments from Phytopathology reviewers that improved the manuscript. We appreciate support by USDA NC RIPM Grant 2010-34103-20964, USDA APHIS Grant 11-8453-1483-CA, the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS), US NSF Grant EF-0525712 as part of the joint NSF-NIH Ecology of Infectious Disease program, US NSF Grant DEB-0516046, and the Kansas Agricultural Experiment Station (Contribution No. 15-330-J). The views expressed cannot be taken to reflect the official position of these funding agencies.

    LITERATURE CITED

    Current address of M. R. Sanatkar: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708.