Modeling Epidemics in Seed Systems to Guide Management Strategies: The Case of Sweetpotato in Northern Uganda

Seed systems are critical for deployment of improved varieties, but also serve as major conduits for the spread of seed-borne pathogens. We evaluated the structure of an informal sweetpotato seed system for its vulnerability to the spread of epidemics, and its utility for disseminating improved varieties. During the 2014 growing season, vine sellers were surveyed weekly in the Gulu Region of Northern Uganda. Our analysis draws on tools from network theory to evaluate the potential for epidemic spread in this region. Using empirical seed transaction data and estimated spatial spread, we constructed a network of seed and pathogen movement. We modeled the introduction of a pathogen, and evaluated the influence of both epidemic starting point and quarantine treatments on epidemic progress. Quarantine of 30 out of 99 villages reduced epidemic progress by up to 66%, when compared to the control (no quarantine), over 20 time steps. The starting position in the network was critical for epidemic progress and final epidemic outcomes, and influenced the percent control conferred by quarantine treatments. Considering equal likelihood of any node being an introduction point for a new epidemic, villages of particular utility for disease monitoring were identified. Sensitivity analysis identified important parameters and priorities for future data collection. The efficacy of node degree, closeness, and eigenvector centrality was similar for selecting quarantine locations, while betweenness had more limited utility. This analysis framework can be applied to provide recommendations for a wide variety of seed systems.


Introduction
Seed systems, both formal and informal, are a critical component of global food security, but often also serve as human-mediated pathways for the regional and global dispersal of plant pathogens. Efforts to implement seed systems that work better for smallholder farmers in low-income countries have often been unsuccessful (Gibson and Kreuze 2015; Thomas-Sharma et al. 2016). Improving seed security, defined as timely access to quality planting material by all, at a fair price (Almekinders et al. 1994; Gibson et al. 2011; McGuire and Sperling 2013; Sperling 2008), is vital for improved livelihoods, particularly for smallholder farmers. The provision of 'clean' or 'pathogen-free' seed is a major challenge to any seed system. In most low-income countries, seed systems are local and seed is largely of unknown quality, with a majority of farmers keeping seed from previous seasons, or obtaining seed from neighbors, local traders, or local markets, with some instances of long-distance trade (Gildemacher et al. 2009; Pusadee et al. 2009). Having robust analytic models of seed systems supports policy development and risk assessments against possible system disruption caused by shocks such as novel pathogen or pest introduction, climate change, or political unrest.
Informal seed systems, without healthy seed certification protocols, may be at higher risk for epidemic introduction and spread. Newly introduced viruses can be particularly severe, as methods for detection may be limited or unavailable, and host resistance may be unattainable for several years. In this study, we consider an informal sweetpotato seed system, where 'seed' is not true botanical seed, but vine cuttings. In these types of vegetatively propagated seed systems, viruses and other seed-transmitted diseases introduce important risks of yield and quality degeneration over successive cycles of propagation, and methods of control are limited (Thomas-Sharma et al. 2016). It is important, therefore, that the risk of novel pathogen introduction and potential epidemic dynamics be understood for the swift recommendation of interventions (such as sampling, quarantine, variety deployment, and education).
bioRxiv preprint first posted online Feb. 10, 2017; doi: http://dx.doi.org/10.1101/107359. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
The problem of seed-transmitted viral introduction was illustrated in 2014 when Sri Lankan cassava mosaic virus (causing cassava mosaic disease) was first detected in Southeast Asia, presumably being introduced through infected seed material. Efforts are still ongoing to mitigate spread and deploy resistance before this yield-robbing disease takes hold (Graziosi et al. 2016; Wang et al. 2016). Another dramatic example occurred in 2011 when maize lethal necrosis (MLN) was first reported in Kenya (Wangai et al. 2012) and soon was detected in several other Sub-Saharan African countries. MLN symptom expression results from coinfection with Maize chlorotic mottle virus (MCMV) and a potyvirus (Mahuku et al. 2015). Since its introduction, MLN has been detected in several East African countries including Ethiopia, Uganda, South Sudan, Tanzania, DRC, and Rwanda, and up to 22% yearly yield loss has been reported (De Groote et al. 2016; Hilker et al. 2017; Mahuku et al. 2015).
Network analysis is a powerful analytic tool, used across many disciplines, with recent advances in its utility for analyzing epidemic spread in human, animal, and plant systems (Shaw and Pautasso 2014; Silk et al. 2017). Seed systems are amenable to network analysis because they are inherently networks with a suite of actors (network nodes) that move both genetic material and information through space and time (dynamic or static network links) (Pautasso 2015). Until recent years, plant disease epidemiologists have given relatively little attention to the study of seed system (and plant trade) networks, although the movement of planting material plays a fundamental role in the spread of plant disease and the persistence of epidemics (Buddenhagen et al. 2017; Garrett et al. 2017; McQuaid et al. 2017; Nelson and Bone 2015; Pautasso 2015; Pautasso and Jeger 2008). Increasing availability of computational tools, coupled with advances in network analysis in the medical and social sciences, makes the implementation of network analysis for plant disease epidemiology more obtainable.
Network analysis can be used to evaluate nodes important for surveillance and mitigation of the movement of pathogens or other contaminants through seed systems and landscapes (Sutrave et al. 2012). To accomplish this, network statistics, such as measures of node betweenness, closeness, and degree centrality, can be calculated to define key nodes and actors in a system, and forecast the risk of pathogen introduction, pathogen spread, or technology diffusion in a cropping system (Garrett 2012; Harwood et al. 2009; Moslonka-Lefebvre et al. 2011; Pautasso 2015; Pautasso and Jeger 2008; Sanatkar et al. 2015). For example, a study of wheat grain movement in the United States and Australia was implemented to identify key locations that could be strategically targeted for sampling and management of mycotoxins (Hernandez Nopsa et al. 2015). For the U.S. soybean rust epidemic, the utility of different selection methods was compared for targeting geographic nodes for sampling to forecast epidemic movement (Sanatkar et al. 2015). Although there are a number of statistics available that are likely to be associated with the importance of a node in an epidemic, it is an open question as to which statistics are most important for prioritizing nodes for monitoring and management in real-world networks (Holme 2017).
Previous studies of seed systems have focused on understanding the effects of social ties and how well seed system networks may conserve variety diversity in the landscape (Pautasso 2015; Pautasso et al. 2013). Abay et al. (2011) applied network analysis in a study of informal barley seed flows in the Tigray region of Ethiopia, with the goal of informing breeding or technology deployment strategies. Network properties, such as betweenness and degree centrality, were used to characterize key nodes and their role in connecting the seed system. These metrics measure the importance of a node in terms of the number of connections it has, and the number of paths across the network of which it is a part, respectively. Pautasso et al. (2015) further analyzed this empirical seed network data, finding that the degree distribution of the network, and particularly the out-degree of the starting node, influenced the percentage of the network that could be reached by a new variety.
Our study draws on concepts developed for studying epidemic spread in hypothetical and empirical large- and small-scale plant trade networks (Buddenhagen et al. 2017; Moslonka-Lefebvre et al. 2011; Nelson and Bone 2015; Pautasso 2015; Pautasso and Jeger 2008; Pautasso et al. 2010). We consider theory about the influence of node in- and out-degree on variety dissemination (and pathogen transmission) in small-scale, real-world seed networks (Pautasso 2015). We also expand on concepts developed to assess quarantine influence on disease spread in hypothetical trade networks (Nelson and Bone 2015) in a new set of simulation experiments, where quarantine nodes were not selected after initial pathogen detection, but based on their network centrality measures. Here we define quarantine as the restriction of the exchange of infected seed material, likely through phytosanitary regulation. We further explore the question of quarantine efficacy based on the epidemic starting point and the chosen centrality predictor.
We model epidemic spread and mitigation in a landscape of farming villages, where a portion of the seed transaction structure is based on geo-spatial proximity of villages. Our model of the contact structure (links) between villages is based on the much higher tendency for farmers to exchange planting material with neighboring villages than with those that are distant (Perales et al. 2005; Pusadee et al. 2009). We also consider the sensitivity of parameter choices in our model and their impacts on epidemic outcomes. Improvement of these analytic tools is particularly important for understanding the epidemiology of pathogen spread in vegetatively propagated crops, where social contact structure, via the exchange of planting material, is a major driver of epidemics.
Seed systems are a suite of actors, or nodes, including farmers, multipliers, traders, NGOs, seed companies, breeding organizations, and communities. The connections between these nodes, representing formal and informal interactions, are complex and require analyses that address this complexity. Network analysis allows us to simulate multiple scenarios in known systems, including the impact of epidemic spread or management deployment. In this study we propose a general framework for analyzing such networks that can be translated to a broad range of seed systems. In this paper, we i) analyze, as a case study, key network properties within a seed system important to regional food security; ii) evaluate variety dissemination within the network, comparing the distribution of higher-nutrient introduced varieties and landraces; iii) model the progress of a potential seed-borne pathogen introduced into the network, and compare the use of different network statistics as selection criteria for quarantine nodes; and iv) perform sensitivity/uncertainty analysis to determine the influence on epidemic outcomes of the parameters used to construct the village-to-village transaction networks.

Study System: Sweetpotato in Northern Uganda
This study examines sweetpotato vine transactions in Northern Uganda. Sweetpotato is a major staple food crop in many African countries, and Uganda is the second largest producer in Africa and the fourth largest globally (FAOSTAT 2013). Sweetpotato is generally grown by women in Uganda in small plots of land, close to the household, and is important for household food security (Behrman 2011; Johnson and Gurr 2016). In the last decade, sweetpotato has increased in importance due to the introduction of a β-carotene biofortified crop, Orange-fleshed Sweetpotato (OFSP).
In Northern Uganda, sweetpotato seed material is sold in small bundles of vine cuttings.
Distribution of these vine cuttings is largely informal, consisting of smallholder farmers who have access to fields with adequate moisture to produce roots and vines through the extended dry season, which typically lasts from December to April (Gibson 2013). These off-season multipliers generally produce local landraces, which tend to be well-adapted white-fleshed cultivars (Gibson 2013). Vine cuttings are not easily stored, and because of the single, extended dry season in Northern Uganda, vines need to be obtained by farmers at the beginning of each season (Gibson et al. 2011). There are also several formal institutions involved in sweetpotato breeding and distribution in Uganda, including the National Sweetpotato Program (NSP), the International Potato Center (CIP), private sector enterprises, and NGOs (Gibson 2013).

Survey Methods
A survey of vine multipliers and sellers was conducted in 2013-2014 in the Gulu Region of Northern Uganda, fully described by Rachkara et al. (2017). A more complete cohort of multipliers and sellers was surveyed in 2014, the focus of our analysis. Each seller was visited weekly from the start of the growing season (April) and surveyed twice per week to record all transactions (purchases and sales) that occurred in the period since they were last visited until the end of the season (August). Volume of transaction (number of bundles), price, variety, origin of buyer, and buyer type (farmer or seller) were recorded. In this study, a small bundle refers to 50 vines cut to 40 cm in length. Large bundles are equal to 20 small bundles. Because of the high volume of transactions, names of individual buyers were not collected and therefore sales transactions were summarized by the buyer's village.

Seed Network Analysis
Nodes in this analysis include sellers and villages, with one set of directed links representing vine sales from an individual seller to an individual village (a bipartite graph).
Villages in this region of Uganda typically are composed of 40-60 households. Although several transactions were recorded on a weekly basis, transactions were aggregated for this analysis so that seller-to-village links represent the existence of at least one transaction over the course of the season. Seller-village links are based entirely on the data from Rachkara et al. (2017).
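The aggregation step described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis used R and igraph), and the record layout, seller and village labels, and bundle counts are all hypothetical. It only shows how repeated weekly transactions collapse into a single directed seller-to-village link.

```python
from collections import defaultdict

# Hypothetical records: (seller, buyer_village, week, small_bundles_sold).
transactions = [
    ("S_1", "V_1", 1, 4),
    ("S_1", "V_1", 3, 2),
    ("S_1", "V_2", 2, 10),
    ("S_2", "V_2", 5, 1),
]

def aggregate_links(records):
    """Collapse repeated transactions into binary seller-to-village links."""
    volume = defaultdict(int)
    for seller, village, _week, bundles in records:
        volume[(seller, village)] += bundles
    # A link exists if at least one transaction occurred during the season.
    links = set(volume)
    return links, dict(volume)

links, volume = aggregate_links(transactions)
print(sorted(links))
```

The seasonal volume per link is kept alongside the binary links only for reference; the network described in the text uses the presence or absence of a link.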

Incorporating village-to-village spread
To incorporate epidemic risk in the seed system outside of spread through sellers, we evaluated a second set of geo-spatially derived links representing village-to-village spread through informal exchange of planting materials or vector movement, composing a second network. The existence of village-village links was modeled using a power law function, and combined with transaction links to form an expanded adjacency matrix (Fig. 1). The power law function captures the tendency for seed exchange to be more likely between villages that are geospatially proximate. That is, farmers have a higher chance of exchanging seed with farmers in neighboring villages than with farmers in villages that are far away. A link represents a seed movement event (transaction) or movement of vectors that may be viruliferous. The power law equation used here was y = a d^{-β}, where d is the Euclidean distance between a pair of villages, y is proportional to the probability of movement between the villages, and a is a scaling factor. As default values, we used a = 1 and β = 1.5. The seed transaction threshold (t) was set to 0.01, such that when y > 0.01 for two villages, a link was created between them. Village-to-village links were established once and the same underlying adjacency matrix was used across simulation experiments for a given parameter combination. In individual realizations, the underlying adjacency matrix was modified as links were maintained with a fixed probability, as described below. A sensitivity/uncertainty analysis was performed to determine the influence of the parameters β and t on the analysis. It is important to note that only villages of farmers who purchased vines from sellers surveyed in this study were included in this analysis, and there may be villages in this region that contribute to risk, but were not included.
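A minimal Python sketch of the distance-based link rule follows, assuming the kernel is the decreasing power law y = a d^{-β} with the default values a = 1, β = 1.5, and t = 0.01. The village coordinates and their distance units are hypothetical, and treating village-to-village links as undirected is an assumption of this sketch. It is not the authors' R implementation.

```python
import math
from itertools import combinations

def power_law_links(coords, a=1.0, beta=1.5, t=0.01):
    """Return village pairs whose kernel value y = a * d**(-beta) exceeds t."""
    links = set()
    for (u, pu), (v, pv) in combinations(sorted(coords.items()), 2):
        d = math.dist(pu, pv)  # Euclidean distance between the two villages
        # Co-located villages get an unbounded kernel value and are always linked.
        y = math.inf if d == 0 else a * d ** (-beta)
        if y > t:
            links.add((u, v))
    return links

# Hypothetical coordinates in arbitrary distance units.
coords = {"V_1": (0.0, 0.0), "V_2": (3.0, 4.0), "V_3": (30.0, 40.0)}
village_links = power_law_links(coords)
print(village_links)
```

In this toy layout only V_1 and V_2 (distance 5) are linked; V_3 lies 45 to 50 units from the others, where the kernel value falls below the threshold.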
Network properties, such as number of nodes and network density, were calculated for the 2014 season. Node measures such as degree, closeness, eigenvector, and betweenness centrality were evaluated for both villages and sellers (Table 1). All analyses were conducted in the R programming environment (R Core Team 2017) using custom code along with software packages including igraph (Csárdi and Nepusz 2006).

Modeling Epidemic Spread
This study draws on tools from graph theory to better understand the potential for epidemic spread within a real-world seed network. This analysis used both observed 2014 seed transaction data and estimated spatial transactions to simulate the invasion of a potential seed-borne pathogen. Using the network described above, we performed simulation experiments to address the following questions. (1) What are the optimum risk-based surveillance locations for pathogen monitoring, if there is equal likelihood of pathogen introduction at each node? (2) What effect does the seller at which disease is introduced have on disease progress and final disease outcome? (3) How much can disease spread be limited by implementing quarantine treatments, where quarantined nodes cannot become infected or spread disease? (4) How do network statistics compare in their utility for selecting quarantine locations? In each experiment, simulations were conducted over 20 timesteps and repeated 500 times. The probability of pathogen transmission (P_t) occurring in each of the generated links was set to 0.10. The probability of persistence (P_p; entries on the diagonal of the adjacency matrix) was set to 1.
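The simulation settings above can be sketched in Python as follows. This is an illustrative reading of the model rather than the authors' custom R code: the toy network, the function names, and the convention that time step 1 holds only the starting node are assumptions. Each link from an infected node transmits with probability P_t in each step, infections persist with probability 1, and quarantined nodes can neither acquire nor pass on infection.

```python
import random

def simulate(adj, start, p_t=0.10, steps=20, quarantined=frozenset(), rng=None):
    """One realization; returns the number of infected nodes at each time step."""
    rng = rng or random.Random()
    infected = {start}
    counts = [len(infected)]  # time step 1: only the starting node is infected
    for _ in range(steps - 1):
        newly = set()
        for u in sorted(infected):  # sorted for reproducible random draws
            for v in adj.get(u, ()):
                if v in infected or v in quarantined:
                    continue
                if rng.random() < p_t:  # each link transmits independently
                    newly.add(v)
        infected |= newly  # persistence probability 1: infections are never lost
        counts.append(len(infected))
    return counts

def mean_progress(adj, start, realizations=500, seed=1, **kwargs):
    """Mean number of infected nodes per time step across realizations."""
    rng = random.Random(seed)
    runs = [simulate(adj, start, rng=rng, **kwargs) for _ in range(realizations)]
    return [sum(col) / realizations for col in zip(*runs)]

# Hypothetical toy network: one seller and three villages (directed links).
adj = {"S_1": ["V_1", "V_2"], "V_1": ["V_2", "V_3"], "V_2": ["V_1"], "V_3": []}
curve = mean_progress(adj, "S_1")
print([round(x, 2) for x in curve])
```

The counts here include the starting seller; subtracting one gives the number of infected villages.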

Experiment 1: The value of villages as risk-based surveillance locations
We evaluated the importance of each village as a risk-based pathogen surveillance location. In this scenario analysis, all nodes (sellers and villages) were assigned an equal likelihood of being the starting point for epidemic introduction. For each possible combination of an epidemic starting node and a sampling node, we determined the number of nodes that could become infected by the time the pathogen is detected at the sampling node, as in Buddenhagen et al. (2017). In this analysis we compare only villages, not sellers, for their relative value as locations for epidemic monitoring. In 500 realizations, the mean and variance of the number of nodes infected by the time the pathogen was detected in each village were evaluated, and each village was assigned a "surveillance score". Summarizing over all the potential starting nodes allows for the comparison of the importance of each village as a potential location for monitoring, with each potential starting location equally likely. Simulations were implemented using custom R code (R Core Team 2017).
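One way to read the surveillance score is sketched below in Python. It is a simplified illustration, not the procedure of Buddenhagen et al. (2017) or the authors' R code. It tracks only the mean (not the variance), and it assumes that when a village is never reached within the simulated period the epidemic counts as undetected until the final time step. The toy network and function names are hypothetical.

```python
import random

def infection_times(adj, start, p_t, steps, rng):
    """First time step at which each node becomes infected in one realization."""
    times = {start: 1}
    for t in range(2, steps + 1):
        newly = [v for u in sorted(times) for v in adj.get(u, ())
                 if v not in times and rng.random() < p_t]
        for v in newly:
            times[v] = t
    return times

def surveillance_scores(adj, villages, p_t=0.10, steps=20, realizations=500, seed=1):
    """Mean number of nodes infected by the time each village detects the pathogen."""
    rng = random.Random(seed)
    nodes = sorted(set(adj) | {v for nbrs in adj.values() for v in nbrs})
    totals = {v: 0.0 for v in villages}
    for start in nodes:  # every node is an equally likely introduction point
        for _ in range(realizations):
            times = infection_times(adj, start, p_t, steps, rng)
            for v in villages:
                # Assumption: a village that is never reached detects nothing,
                # so the epidemic is counted up to the last time step.
                t_detect = times.get(v, steps)
                totals[v] += sum(1 for t in times.values() if t <= t_detect)
    n = len(nodes) * realizations
    return {v: totals[v] / n for v in villages}

# Hypothetical toy network: a seller feeding a chain of two villages.
adj = {"S_1": ["V_1"], "V_1": ["V_2"], "V_2": ["V_1"]}
scores = surveillance_scores(adj, ["V_1", "V_2"], p_t=1.0, steps=5, realizations=1)
print(scores)
```

A lower score marks a better monitoring location, because fewer nodes are infected before detection there. In the toy network V_1 scores lower than V_2 because it sits directly downstream of the seller.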

Experiment 2: Modeling epidemic progress with variable starting sellers
The second simulation experiment evaluated the potential spread through the seed network, where the epidemic starts with a single seller that either sells only landraces (86% of sellers, Fig. 3) or sells only orange-fleshed sweetpotato varieties (14% of sellers, Fig. 3). The two sellers with the highest out-degree, one in each category (OFSP seller and landrace seller), were chosen for comparison in this analysis, representing the greatest risk for each type. The out-degree of the starting node is generally strongly correlated with final epidemic outcome (Pautasso 2015; Pautasso et al. 2010). At the start of each realization (Time 1), the "starting seller" had an infection status of 1 and all others were set at 0. The starting seller maintained this infection status throughout the realizations, and all other sellers remained uninfected. In each timestep, 10% of the links from the infected seller node resulted in an "infection event" where the status of the linked village node became 1. Villages that became infected after one time step maintained infected status in the subsequent time step (Time t+1) and could thus infect villages to which they had links with the same probability (P_t = 0.10) in the subsequent time step (Time t+2; probability of infection persistence, P_p = 1; see Supplemental Figure 1). Infection was evaluated across 20 timesteps in 500 realizations, to evaluate the frequency distribution of outcomes.
Simulations were carried out using custom R code (R Core Team 2017). To evaluate the differences in disease progress between starting nodes, the area under the disease progress curve (AUDPC) was calculated by summing the areas of the trapezoids under the curve between successive timesteps.
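The trapezoid calculation of AUDPC can be written compactly. The Python sketch below is illustrative only; the unit spacing between time steps and the example counts are assumptions.

```python
def audpc(counts, times=None):
    """Area under the disease progress curve by the trapezoid rule."""
    if times is None:
        times = list(range(1, len(counts) + 1))  # unit spacing between time steps
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], counts, counts[1:]))

# Hypothetical numbers of infected nodes at four successive time steps.
example = audpc([1, 3, 4, 4])
print(example)  # 9.5
```

For the example, the three trapezoids have areas 2, 3.5, and 4, which sum to 9.5.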

Experiment 3: The influence of quarantine on infection dynamics
We evaluated the influence of quarantine on disease spread over time. Quarantine was defined here as the removal of the potential for a node to transmit or acquire infection in each time step. The influence of 10, 15, 20, 25 and 30 quarantined nodes on infection spread over time was compared with a control in which there was no quarantine treatment. Quarantined nodes (villages only) were selected initially based on node degree, with villages with the highest node degree selected first. For consistency, the same two sellers (S_25 and S_15) that were chosen to start the epidemic in the previous experiment were used in this analysis. Simulations were implemented as described above, over 20 timesteps in 500 realizations, with the AUDPC calculated for each quarantine treatment. To further explore the influence of starting seller on quarantine treatment, we calculated quarantine efficacy, or the proportional reduction in disease as compared to the non-quarantined treatment, as (AUDPC_{no quarantine} - AUDPC_{quarantine}) / AUDPC_{no quarantine}, for all 27 sellers.
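As a small worked illustration of the efficacy ratio, the Python sketch below applies it to hypothetical AUDPC values; the numbers are not taken from the study.

```python
def quarantine_efficacy(audpc_no_quarantine, audpc_quarantine):
    """Proportional reduction in AUDPC relative to the no-quarantine control."""
    if audpc_no_quarantine <= 0:
        raise ValueError("the control AUDPC must be positive")
    return (audpc_no_quarantine - audpc_quarantine) / audpc_no_quarantine

# Hypothetical AUDPC values for a control run and a quarantine run.
efficacy = quarantine_efficacy(1000.0, 340.0)
print(round(efficacy, 2))  # 0.66
```

An efficacy of 0 means quarantine made no difference, and a value of 1 means the epidemic was fully suppressed relative to the control.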
We also compared the utility of key network node statistics (Table 1) for the selection of nodes as quarantine candidates. We performed the above quarantine analysis (for quarantine of 30 nodes, two starting sellers) with villages selected for quarantine based on their node betweenness centrality, closeness centrality, and eigenvector centrality, in comparison to a scenario where 30 villages were drawn at random (Table 1). The value of each village as a risk-based surveillance location, calculated in Experiment 1, was also compared for its utility to select quarantine candidates. In the evaluation of each method for ranking the likely value of nodes for quarantine, nodes were ranked in importance based on that method and the top-ranked 15, 20, 25 or 30 nodes were selected for quarantine.
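The ranking-and-selection step can be sketched in Python as below. This is a hedged illustration rather than the authors' R/igraph workflow: only total degree is computed here, the toy network is hypothetical, and any other score (closeness, eigenvector or betweenness centrality, or a surveillance score) would be supplied as a precomputed dictionary.

```python
import random

def total_degree(adj, nodes):
    """In-degree plus out-degree for each node in `nodes` (directed links)."""
    deg = {n: 0 for n in nodes}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u in deg:
                deg[u] += 1  # outgoing link
            if v in deg:
                deg[v] += 1  # incoming link
    return deg

def select_quarantine(scores, k, highest_first=True):
    """Top-k villages by score; ties are broken by name for reproducibility."""
    sign = -1 if highest_first else 1
    ranked = sorted(scores, key=lambda v: (sign * scores[v], v))
    return set(ranked[:k])

def random_quarantine(villages, k, seed=1):
    """Baseline: k villages drawn at random without replacement."""
    return set(random.Random(seed).sample(sorted(villages), k))

# Hypothetical toy network with two sellers and three villages.
adj = {"S_1": ["V_1", "V_2"], "S_2": ["V_1"],
       "V_1": ["V_2", "V_3"], "V_2": ["V_1"], "V_3": []}
villages = ["V_1", "V_2", "V_3"]
degree = total_degree(adj, villages)
print(select_quarantine(degree, k=2))
```

For scores where a lower value is better, such as the mean number of nodes infected before detection, `highest_first=False` selects the lowest-scoring villages instead.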

Sensitivity analysis and uncertainty quantification
To evaluate the influence of parameter choices for parameters describing system-specific and often unknown features such as the dispersal kernel, we evaluated the effects of varying parameter values on the epidemic outcome (summation of infection status of all nodes at the end of 20 timesteps) for each of the parameter combinations described in Table 1. Parameters β and t influence the number of links formed between villages, ultimately influencing the likelihood of transmission of a pathogen at each time step. The probability of transmission (P_t) influences the percentage of links that will transmit infection in each timestep, ultimately influencing epidemic progress. These analyses were done for both infection starting at S_15 (the highest degree landrace seller) and S_25 (the highest degree OFSP seller).
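To see why β and t control the number of village-to-village links, it can help to express the threshold as a distance cutoff. Assuming the kernel is the decreasing power law y = a d^{-β}, the condition y > t is equivalent to d < (a / t)^{1/β}. The Python sketch below evaluates this cutoff over a small grid; the grid values are hypothetical and are not the combinations listed in Table 1.

```python
def link_cutoff_distance(a=1.0, beta=1.5, t=0.01):
    """Largest distance at which a * d**(-beta) still exceeds the threshold t."""
    return (a / t) ** (1.0 / beta)

# Hypothetical parameter grid for illustration only.
for beta in (1.0, 1.5, 2.0):
    for t in (0.005, 0.01, 0.02):
        cutoff = link_cutoff_distance(beta=beta, t=t)
        print(f"beta={beta:.1f}  t={t:.3f}  cutoff distance={cutoff:.1f}")

default_cutoff = link_cutoff_distance()
```

With the default values (a = 1, β = 1.5, t = 0.01) the cutoff is about 21.5 distance units. Increasing β or t shortens the cutoff and so reduces the number of links.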

Network Properties
In 2014, 27 sellers were tracked, resulting in a total of 878 individual vine sales to buyers from 99 distinct villages. The seller-to-village portion of the adjacency matrix was estimated based on aggregated transactions from sellers to villages (Figure 2a). This network has a total of 126 nodes and 205 links (link density = 0.013). After the addition of village-to-village links (link density = 0.047, Figure 2b), the number of links increased to 743, roughly representing the number of potential vine transactions in each time step in simulation experiments. Node degree, or the number of incoming (in-degree) and outgoing (out-degree) links, was calculated for both seller nodes (mean = 7.6, min = 1, max = 42) and village nodes (mean = 12.9, min = 1, max = 60; Supplemental Fig. 2). The degree distribution of this network indicates scale-free properties (Barabasi and Albert 1999), with a high number of nodes having few links and fewer nodes having many links. The 30 villages with the highest node degree (both in- and out-degree) were selected as candidates for quarantine in subsequent analyses. Betweenness, closeness, and eigenvector centrality were also measured for all villages in this analysis (Table 1).
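The reported link densities can be checked with simple arithmetic. The Python sketch below assumes the density of a directed network without self-loops, links / (n (n - 1)). That the authors used exactly this formula is an assumption, although the reported values are consistent with it.

```python
def directed_density(n_nodes, n_links):
    """Share of possible directed links (no self-loops) that are present."""
    return n_links / (n_nodes * (n_nodes - 1))

seller_village_density = directed_density(126, 205)
expanded_density = directed_density(126, 743)
mean_seller_out_degree = 205 / 27  # seller-to-village links per seller

print(round(seller_village_density, 3))  # 0.013
print(round(expanded_density, 3))        # 0.047
print(round(mean_seller_out_degree, 1))  # 7.6
```

With 126 nodes there are 126 × 125 = 15,750 possible directed links, so 205 links give a density of about 0.013 and 743 links give about 0.047.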

Variety Dissemination
A total of 15 cultivars were sold during the 2014 season (Fig. 3). These cultivars were a mix of landraces (all white-fleshed) and cultivars introduced by the national breeding program (Fig. 3). Six of these cultivars were OFSP cultivars, and were disseminated by only two sellers, in many individual transactions (Fig. 3). In comparison, the most common white-fleshed landrace, Ladwe Aryo, was sold by 25 distinct sellers in hundreds of transactions throughout the season. When the network is examined separately by variety, disaggregation becomes apparent (Fig. 4). Although Ladwe Aryo and the OFSP cultivar Ejumula each reached 51 villages, only 8 villages received both (Fig. 4). Evaluating networks representing the distribution of the top eight varieties to villages in the network (Fig. 4) indicates that only a small number of sellers and villages were exchanging orange-fleshed varieties. It appears, from this survey, that most individuals from a single village buy only a single variety, even when they have access to multiple sellers.

Experiment 1: The value of villages as risk-based surveillance locations
For the scenario where each node is an equally likely starting point for an epidemic, we evaluated the value of each village as a risk-based surveillance location. A village was considered a more effective location if only a small proportion of other nodes were colonized before the pathogen could be detected at that village. The village with the highest surveillance value was V_58, with the lowest mean number of villages affected by the time the pathogen would be detected there (Fig. 5). This village was close to the town seller, where many of the sellers sold their vines, likely accounting for high access to vines and a central position in the network.

Experiment 2: Modeling epidemic progress with variable starting sellers
The two sellers with the greatest out-degree for their product, S_25 (OFSP seller) and S_15 (landrace seller), were compared as starting nodes for an introduced epidemic simulated over 20 timesteps (Fig. 6). These sellers had an out-degree of 42 and 19, respectively. Infection starting with the landrace seller (S_15) approached epidemic saturation, where the maximum number of villages that can be reached has been reached (50 of the 99 villages). For the OFSP seller, saturation was at approximately 70 nodes out of 99. Overall, the mean number of nodes infected by time 20 was lower when infection started with the landrace seller (50.2, sd = 2.1) than with the OFSP seller (69.4, sd = 2.5). Disease progress therefore differed between starting nodes as well, with AUDPC being higher for infection starting with the OFSP seller (977, sd = 54.6) than for infection starting with the landrace seller (697, sd = 57.2).

Experiment 3: The influence of quarantine on infection dynamics
We evaluated the influence of quarantining villages so they cannot become infected or transmit infections to other villages, representing the common practical scenario of phytosanitary quarantine by regulatory agencies after the detection of pathogens in new regions. Nodes were selected based on node degree rank (with comparison to other methods of selecting nodes in the next step). Node degree for the 30 nodes selected as quarantine candidates ranged from 60 to 17 (with the highest selected first). Epidemic simulations, as previously described, were repeated for both sellers (S_25 and S_15), for each quarantine scenario based on degree (Fig. 7). For infection beginning with a landrace seller (Fig. 7a), the mean number of nodes infected by the end of the simulation (timestep 20) decreased by 6%, 16%, 28%, 46%, 66%, 71%, 78%, 79% and 81% for each quarantine treatment (quarantine of 10, 15, 20, 25, 30, 35, 40, 45, and 50 nodes, respectively) when compared to the no-quarantine control. The AUDPC decreased similarly with increasing number of quarantine nodes (Fig. 7b).
For the OFSP seller, the mean number of nodes infected at time step 20 decreased by 5%, 17%, 28%, 44%, 57%, 70%, 79%, 88%, and 94% for quarantine of 10, 15, 20, 25, 30, 35, 40, 45, and 50 nodes, respectively (Fig. 7c). Across treatments, the final epidemic outcome was higher when infection started with the OFSP seller (S_25), and although a similar effect of quarantine was observed, the percent control imposed by the treatment appeared to vary between sellers (S_17). To further explore whether the node degree of the starting seller was a
We compared the selection of quarantine candidates based on node degree centrality (used in the analysis described above) to selection based on closeness, betweenness, eigenvector centrality, the estimated risk-based surveillance score, and villages drawn at random, in terms of how they affected AUDPC outcomes across realizations (Figure 8). It was not surprising that, after quarantining at least 15 nodes, the "smart" selection criteria (centrality measures and monitoring efficiency) outperformed the "naive" selection criterion (randomly selecting villages to quarantine) (Supplemental Figs. 5 and 6). It was somewhat surprising, however, that this trend was not clear for the 10-node quarantine treatment, indicating that the invasion may be able to "overcome" quarantine of 10 or fewer nodes, even if the nodes are those with the highest centrality scores.
In this system, it appears that closeness, eigenvector, and degree centrality are equally good predictors of quarantine hubs and confer a similar reduction to AUDPC (Figure 8). This is consistent for each of the starting nodes. Interestingly, independent of starting node, betweenness centrality consistently resulted in a higher mean AUDPC when 20 and 25 nodes were quarantined.

Sensitivity analysis and uncertainty quantification
Sensitivity analysis was performed to examine the influence of key parameters describing the network and epidemic spread on final epidemic outcomes. Both the exponent of the power law β and the threshold for link formation t influence the epidemic progress curve (Fig. 9). As β and t decrease, the number of nodes infected by the end of the epidemic (time 20) increases (Fig. 9). At time 20, with β = 0.5 and t = 0.001, all villages are infected in each of the 500 realizations. In contrast, when β = 2.5 and t = 0.02, little or no epidemic progress is made. The same trend is apparent for infection starting with each of the two sellers (Supplemental Fig. S7). These dramatic differences are likely due to the number of links formed between villages with each parameter set. With the network highly saturated with links, as is the case with β = 0.5 and t = 0.001, the epidemic can move across the system rapidly. A similar trend was also seen when the probability of transmission (P_t) was varied from 0.1 to 0.05 and 0.15 (Supplemental Fig. S8).
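How β and t control the number of village-to-village links can be illustrated with a short Python sketch. It assumes that the power-law transformation takes the form d^(-β) for an inter-village distance d and that a link is formed when the transformed value exceeds t; the exact functional form, the distance units, the function name, and the toy distances are all assumptions made for illustration.

```python
def village_links(distances, beta, threshold):
    """Return the village pairs whose power-law weight d**(-beta) exceeds threshold.

    distances maps a pair of village names to the distance between them.
    """
    links = set()
    for pair, d in distances.items():
        if d > 0 and d ** (-beta) > threshold:
            links.add(pair)
    return links


# Hypothetical inter-village distances (arbitrary units).
toy_distances = {("V1", "V2"): 5.0, ("V1", "V3"): 20.0, ("V2", "V3"): 60.0}

# Number of links formed for each combination of beta and t.
link_counts = {
    (beta, t): len(village_links(toy_distances, beta, t))
    for beta in (0.5, 1.5, 2.5)
    for t in (0.001, 0.01, 0.02)
}
```

Under these assumptions, low values of β and t link every pair of toy villages, while high values leave the villages unlinked, which mirrors the contrast between the saturated and the stalled epidemics described above.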

Discussion
In this study, we layered empirical survey data with geo-spatial data to construct a network model of the sweetpotato seed system in Northern Uganda. Using this model, we simulated epidemics under several scenarios and identified key villages (nodes) in the region for both monitoring and for the deployment of management tactics, like quarantine. Epidemic starting point influenced not only epidemic outcome, but also the efficacy of quarantine interventions for limiting disease spread. In this system we compared several centrality measures for their utility to limit epidemic spread when placed under quarantine. We found that node degree performed as well as or better than centrality measures that take into account the broader network topology (eigenvector centrality, betweenness centrality, and closeness centrality), a finding with practical management implications.
The analysis of network topologies presented here shows how sales of OFSP varieties and landrace varieties differed. When vine distribution was disaggregated by variety, it was clear that most cultivars are not well disseminated by seller-to-village links (Fig. 4). We found that a single white-fleshed landrace variety, Ladwe Aryo, dominated landrace sales in this season, while a single OFSP variety, Ejumula, dominated OFSP sales (Figs. 3 and 4). Interestingly, there is little overlap between the villages where farmers bought these varieties. Based on the observed data, we cannot determine whether preference or availability of planting material of OFSP varieties drove sales, or a combination of the two. This distinction is important, however, because much effort has been made to promote improved OFSP varieties in recent decades (Low there remains an adequate local supply of vines provided by these informal sellers. Seed exchange is often highly related to kinship ties, language, and social organization patterns (Abizaid et al. 2016;Labeyrie et al. 2016;Perales et al. 2005;Westengen et al. 2014) and future research to better model the influence of social structure on variety diffusion in this seed system network would be useful.
Our findings are consistent with previous studies indicating that the epidemic starting point influences both epidemic progress and epidemic outcome (Pautasso 2015;Pautasso et al. 2010). This was illustrated in experiments 2 and 3, where more villages in the landscape became infected by the conclusion of the epidemic when it started with the highest out-degree OFSP seller than when it started with the highest out-degree landrace seller. It appears that one main driver of this phenomenon is the out-degree of the "starting seller" (the seller with infected seed material). We also found that the epidemic starting seller influenced the efficacy of quarantine interventions, with a lower percent control conferred by the same quarantine treatment when infection started with the OFSP seller, compared to the landrace seller, up to a certain number of nodes quarantined (30), after which the trend was reversed. This indicates that the introduction point of a pathogen in this network not only affects the risk of spread through the network but also influences the efficacy of management strategies.
Similar methods have also been used to simulate variety diffusion through seed systems (Pautasso 2015). Future studies should assess the rate of adoption over time in the landscape, to determine the implications of the models of spread that we presented for each type of variety (Fig. 5).
In experiment 1 we characterized villages in Northern Uganda for their utility as pathogen surveillance locations, based on a set of simulations where all nodes had an equal chance of being the point of entry into the network. Villages were identified as potential risk-based surveillance targets based on the observed frequency, in simulations, with which the pathogen would be detected in these locations before substantial parts of the rest of the network become colonized. This method can be used to identify "high utility" sentinel locations on which to focus sampling efforts for new and emerging pathogens, a critical effort to avoid large-scale epidemics. This method of prescribing surveillance locations can complement new field-based diagnostic technologies, such as loop-mediated isothermal amplification (LAMP) assays (Yasuhara-Bell et al. 2016) or smartphone-assisted crop disease image detection (Mohanty et al. 2016), which are becoming increasingly available to practitioners and have the potential for rapid on-site detection of viruses and other pathogens. The method used here, applied to a region in northern Uganda, can support pathogen surveillance efforts on a national or greater scale.
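One way to read the surveillance analysis is as a per-village score: the average number of nodes already infected when the pathogen first arrives at that village, with the introduction point drawn uniformly at random. The Python sketch below illustrates this reading only. The function names and toy network are hypothetical, and the choice to charge a village that is never reached with the final epidemic size is an assumption that the text does not specify.

```python
import random


def infected_before_detection(adjacency, start, p_transmit, steps, rng):
    """One SI realization: for each node, the number of nodes already
    infected at the time step when that node itself became infected."""
    infected = {start}
    seen_before = {start: 0}
    for _ in range(steps):
        already = len(infected)
        newly_infected = set()
        for node in sorted(infected):  # sorted for reproducible random draws
            for neighbor in adjacency.get(node, ()):
                if neighbor not in infected and rng.random() < p_transmit:
                    newly_infected.add(neighbor)
        for neighbor in newly_infected:
            seen_before[neighbor] = already
        infected |= newly_infected
    final_size = len(infected)
    # Assumption: a node that is never reached detects nothing, so it is
    # charged the final epidemic size.
    return {node: seen_before.get(node, final_size) for node in adjacency}


def surveillance_scores(adjacency, villages, realizations=500,
                        p_transmit=0.1, steps=20, seed=0):
    """Mean number of nodes infected before detection at each village,
    with every node equally likely to be the introduction point.
    adjacency must list every node as a key. Lower scores suggest
    more useful monitoring locations."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    totals = {v: 0 for v in villages}
    for _ in range(realizations):
        start = rng.choice(nodes)
        result = infected_before_detection(adjacency, start, p_transmit, steps, rng)
        for v in villages:
            totals[v] += result[v]
    return {v: totals[v] / realizations for v in villages}


# Hypothetical toy network: one seller and three villages.
toy = {"S1": ["V1", "V2"], "V1": ["V3"], "V2": ["V3"], "V3": ["V1"]}
scores = surveillance_scores(toy, ["V1", "V2", "V3"])
```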
Centrality measures are important indicators of risk within epidemic networks (Banks et al. 2015;Holme 2017;Kiss et al. 2006). There is a trade-off of high centrality within this seed network. That is, it is favorable to be a village with high degree (many links) because of the increased availability of a diversity of cultivars (like OFSP varieties), from more sellers.
Availability is often defined as the physical presence of seed at planting time within a reasonable proximity of farmers (McGuire and Sperling 2011), and is a component of seed security for a village. On the other hand, high connectivity makes a village more susceptible to pathogen invasion and increases its likelihood of serving as an epidemic "super-spreader", as illustrated by our quarantine experiments (Fig 7) where, after high-degree villages were quarantined, the epidemic was successfully reduced. Node degree was a good measure for the informed selection of quarantine locations (Fig. 8). The utility of node degree for prioritization is consistent with literature on pathogen spread in animal trade networks, where centrality measures (such as degree) have been significantly associated with on-farm disease levels and disease progress (Kiss et al. 2006;Lee et al. 2017;Salines et al. 2018).
We compared the use of node degree with node betweenness, closeness, and eigenvector centrality, and found that each performed similarly, with betweenness conferring the least control. Practically, this is an important finding for the mitigation of invasive pathogens in plant exchange networks, as node degree is one of the simplest centrality measures to collect (Christley et al. 2005), and it confirms findings from the human epidemiology literature (Lloyd-Smith et al. 2005). In times of epidemic emergence, these high-degree villages should be the first to be targeted with control strategies, such as quarantine in the form of phytosanitary regulation. Similar methods to those described here may be utilized to target villages for development projects that aim to disseminate varieties to key hubs and maximize their distribution. However, betweenness may prove to be more important for identifying key nodes in some other types of seed systems, in which there are many nearly separate modules with a small number of links between them. In such seed systems, the nodes with high betweenness centrality that bridge these modules might be particularly important.
Although quarantine was effective at slowing epidemic progress when sufficient nodes were included, here we found that quarantine was not sufficient (at < 50 nodes quarantined) for halting epidemic spread under any scenario. This property of rapid disease spread is consistent with other scale-free networks, where high-degree "super-spreaders" can rapidly transmit disease to other nodes in the network in a small number of steps (Banks et al. 2015;Jeger et al. 2007;Lloyd-Smith et al. 2005). about disease progress and management strategies. Integrated seed health may be easier and more cost effective to deploy than strict phytosanitary restrictions or introducing complete 'quarantine' of villages in these systems, where ethnobiological associations are major drivers of exchange between villages, and access to certified clean seed may be minimal to non-existent.
It is important to note that this study was based on data limited to sellers that participated in weekly monitoring in the Gulu region of Northern Uganda. Although this dataset was expansive, there may have been other sellers or sources of vines that were not captured. In addition, there are other villages with sweetpotato fields in the region of study that were not included in this analysis because they were not associated with a buyer in the dataset. These fields could also be a source of disease. Furthermore, links in this study were unweighted and fixed through time. It may be valuable to examine the influence of a weighted, dynamic contact structure on epidemic progress in such seed networks. To our knowledge, this has not been examined in the literature. The model used here is a susceptible-infected (SI) model, where once nodes become infected, they remain infected (and infectious) through the duration of the time course. In human epidemiology, diseases that follow an SI model are known to be difficult to control. Because there may be options for epidemic recovery in this sweetpotato seed system (through reversion, roguing, or positive selection), future studies may include a recovery term in the model, which would result in more effective quarantine measures. Another outstanding question is the influence of multiple starting points on epidemic progress and outcome.
Understanding the dynamics of epidemics in seed systems is critical for effective pathogen monitoring, risk assessment, and epidemic management (Buddenhagen et al. 2017;Harwood et al. 2009;Shaw and Pautasso 2014). Although some plant disease literature explores this topic, research on real world seed network epidemics remains limited. Results from these studies can be strategically used to prioritize surveillance efforts and to disseminate new varieties in informal seed systems. Future surveys that include questions about social ties and the movement of information among farmers would support better models of variety adoption and distribution in this system. Next research steps will include more finely parameterizing transmission patterns, including the impact of variety resistance and vector biology, and modeling system adaptation to sustained exogenous shocks and stressors. There is the potential to include data about known yield degeneration rates and known environmental conditions to predict regional yield loss in the case of pathogen introduction. Understanding these system components supports better strategies for seed system development.
Node degree of epidemic starting point (number of links) has been shown to influence epidemic outcomes. Those with high degree may be "super spreaders" once infected.

Betweenness centrality
The number of shortest paths through the network of which a node is a part.
A measure of how much a node serves as a "bridge" to new nodes. Removal of nodes with high betweenness may contain an epidemic within a region.

Eigenvector centrality
A weighted sum reflecting both direct links to a node (degree) and the node degree of neighbors.
If a node does not itself have a high node degree, but is connected to nodes with high degree, it may be at increased risk of infection and spreading infection.

Closeness centrality
The inverse of the average length of the shortest path to/from all the other nodes in the network.
Efficiency of the node to spread disease to any other node in the network.
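Three of the measures defined above can be computed with a few lines of standard-library Python. The sketch below is illustrative only: it treats the network as undirected and connected, uses Brandes' algorithm for betweenness (a standard technique, not necessarily what the authors used), omits eigenvector centrality for brevity, and runs on a hypothetical toy network of two triangles joined by a bridge node "x".

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of links attached to each node."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def closeness_centrality(graph):
    """Inverse of the mean shortest-path length to all reachable nodes."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Shortest paths passing through each node (Brandes' algorithm)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in bc.items()}


# Hypothetical toy network: two triangles joined through node "x".
toy = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
                   ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")])
deg = degree_centrality(toy)
close = closeness_centrality(toy)
between = betweenness_centrality(toy)
```

In this toy network the bridge node "x" has low degree but the highest betweenness, which is the kind of modular structure in which betweenness would be expected to matter more than degree.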

Figure 1
A schematic illustrating how two types of data were combined: A) seller-to-village links reported in the 2014 vine seller survey, and B) village-to-village links estimated based on the distance between villages using a power law function to estimate the probability of movement. The schematic represents a hypothetical network of three sellers and five villages, each represented as a node in the network. A link between nodes is represented as a 1 in the matrix, and absence of a link is indicated by a 0. Matrices were combined in C) a "complete network" with both seller-to-village and village-to-village links. Note that in this study all seller-to-seller links and village-to-seller links were set to zero, with no transactions taking place in this direction. A "complete network" based on the Ugandan sweetpotato data was used in the simulation experiments presented in this study.
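The combination of the two matrices described in this caption can be sketched in Python as a block matrix, with sellers ordered before villages. The function name and the 0/1 entries below are hypothetical and only mirror the dimensions of the schematic (three sellers and five villages); the symmetric village-to-village toy matrix is an assumption.

```python
def complete_network(seller_to_village, village_to_village):
    """Combine the two link matrices into one adjacency matrix.

    Rows are sources and columns are targets, with sellers first and
    villages second. Seller-to-seller and village-to-seller blocks stay zero.
    """
    n_sellers = len(seller_to_village)
    n_villages = len(village_to_village)
    size = n_sellers + n_villages
    full = [[0] * size for _ in range(size)]
    for s, row in enumerate(seller_to_village):
        for v, link in enumerate(row):
            full[s][n_sellers + v] = link
    for i, row in enumerate(village_to_village):
        for j, link in enumerate(row):
            full[n_sellers + i][n_sellers + j] = link
    return full


# Hypothetical links for three sellers and five villages.
seller_to_village = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
]
village_to_village = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
full = complete_network(seller_to_village, village_to_village)
```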

Figure 2
The network structure of sweetpotato vine transactions reported in the 2014 growing season in Northern Uganda with both sellers (darker nodes) and villages (lighter nodes). Links represent the occurrence of at least one transaction in the 2014 growing season. Note that the network layout is generated by the Fruchterman-Reingold force directed algorithm, which locates nodes with links closer together and those without links further apart, and not the geographic coordinates of villages. Plot A) represents empirically sampled seller-to-village links and, B) represents village-to-village links estimated as a function of inter-village distance.

Figure 5
Network resulting from a simulation (500 realizations) where each node had equal likelihood of being the introduction point of an invasive pathogen. Node color indicates the total number of nodes that would be reached in the network before detection at that node. If few nodes were reached (lighter shading) the node was considered an effective monitoring location, while if many nodes were reached (darker shading) the node was considered ineffective for monitoring.
Nodes labeled with an "S" and depicted with gray boxes are sellers, and circles ("V") are villages. Only villages were considered in this analysis for their efficacy as sampling locations.

Figure 7
Disease progress over 20 time steps under specified quarantine regimes (None = no quarantine, and 10, 15, 20, 25, 30, 35, 40, 45, or 50 nodes quarantined). A) and C) Disease progress for each quarantine status, with infection starting with the seller with the highest out-degree of landrace sales (S_15) in A, or the seller with the highest out-degree of orange-fleshed sweetpotato sales (S_25) in C. Lines represent smoothed conditional means. B) and D) Area under the disease progress curve, shown as a boxplot and distribution (violin plot), for each of the 500 realizations in A and C, respectively.

Figure 8
Mean AUDPC values for six quarantine scenarios (0 = no quarantine, and 10, 15, 20, 25, or 30 nodes quarantined, respectively) for infection starting with A) the seller with the highest out-degree of landrace sales (S_15), and B) the seller with the highest out-degree of orange-fleshed sweetpotato sales (S_25). Four different centrality measures were compared for their utility in selecting quarantine locations (betweenness, closeness, degree, and eigenvector). These were also compared to a scenario where nodes were selected at random for quarantine, and to nodes that were selected as effective monitoring locations ("Sampling Location") in the previously described simulation experiment (Fig. 5).

Figure 9
Uncertainty quantification for the number of nodes infected at time step 20 across 500 realizations for a range of values for parameters β (0.5-2.5) and t (0.001-0.02). β is the exponent of the power law equation, and t is a threshold applied to the power-law transformed village distance matrix. β and t were used to evaluate seed transaction links between villages, and therefore influenced disease spread. The simulated epidemic started at node 15 (S_15).
Similar analyses for other starting nodes and a range of transmission rates (P_t) were also performed (Figs. S5 and S6).

[Figure panels A and B; legend: Sellers, Villages; axis: Out degree]
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. bioRxiv preprint first posted online Feb. 10, 2017; doi: http://dx.doi.org/10.1101/107359