
Mapping the Environmental Risk of Beech Leaf Disease in the Northeastern United States
- Yongquan Zhao1 2 †
- Pierluigi Bonello3
- Desheng Liu2 †
- 1Key Laboratory of Watershed Geographic Sciences, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China
- 2Department of Geography, The Ohio State University, Columbus, OH 43210, U.S.A.
- 3Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, U.S.A.
Abstract
The recently emerged beech leaf disease (BLD) is causing the decline and death of American beech in North America. First observed in 2012 in northeast Ohio, U.S.A., BLD had been documented in 10 northeastern states and the Canadian province of Ontario as of July 2022. A foliar nematode has been implicated as the causal agent, along with some bacterial taxa. No effective treatments have been documented in the primary literature. Irrespective of potential treatments, prevention and prompt eradication (rapid responses) remain the most cost-effective approaches to forest tree disease management. For these approaches to be feasible, however, it is necessary to understand the factors that contribute to BLD spread and to use them to estimate risk. Here, we analyzed BLD risk across northern Ohio, western Pennsylvania, western New York, and northern West Virginia, U.S.A. In the absence of symptoms, an area cannot necessarily be deemed free of BLD (i.e., absence of BLD cannot be certain) because of the disease’s fast spread and the lag in symptom expression (latency) after infection. Therefore, we employed two widely used presence-only species distribution models (SDMs), one-class support vector machine (OCSVM) and maximum entropy (Maxent), to predict the spatial pattern of BLD risk based on BLD presence records and associated environmental variables. Our results show that both methods perform well for modeling BLD environmental risk, but Maxent outperforms OCSVM in both the quantitative receiver operating characteristic (ROC) analysis and the qualitative evaluation of the spatial risk maps. In addition, the Maxent model quantifies the contributions of the different environmental factors, indicating that meteorological (isothermality and temperature seasonality) and land cover type (closed broadleaved deciduous forest) factors are likely key contributors to BLD distribution.
Moreover, the future trajectories of BLD risk over our study area in the context of climate change were investigated by comparing the current and future risk maps obtained by Maxent. In addition to offering the ability to predict where the disease may spread next, our work contributes to the epidemiological characterization of BLD, providing new lines of investigation to improve ecological or silvicultural management. Furthermore, this study shows strong potential for extension of environmental risk mapping over the full American beech distribution range so that proactive management measures can be put in place. Similar approaches can be designed for other significant or emerging forest pest problems, contributing to overall management efficiency and efficacy.
Beech leaf disease (BLD) is an emerging forest tree disease that is currently causing decline and mortality of American beech trees (Fagus grandifolia) in eastern North America. BLD was first discovered in northeast Ohio, U.S.A., in 2012 (Ewing et al. 2019) and has now been found in nine other states—Connecticut, Maine, Massachusetts, Michigan, New Jersey, New York, Pennsylvania, Rhode Island, and Virginia, as well as Ontario, Canada (Kantor et al. 2022; Marra and LaMondia 2020). Moreover, BLD symptoms have been observed on other beech species used as ornamentals in the United States, such as European (F. sylvatica), oriental (F. orientalis), and Chinese (F. engleriana) beech (USDA Forest Service 2021). No effective treatments have been published to date.
BLD is now thought to be caused by a foliar nematode, Litylenchus crenatae ssp. mccannii (LCM) (Burke et al. 2020; Carta et al. 2020), although the disease may actually be caused by a species complex that includes bacterial as well as fungal associates (Ewing et al. 2021). BLD manifests as a variety of symptoms, most characteristically banding patterns on the leaves, as well as leaf thickening, with affected leaves eventually shriveling and falling off the tree prematurely (Ewing et al. 2019). Small trees can become completely defoliated and die (Reed et al. 2022), which hampers forest regeneration, but larger trees have also been found in very poor condition, with premature death documented in some cases.
Because LCM is a foliar nematode, it is thought to benefit from conditions that allow at least some free water to accumulate on the foliage. It would seem logical, therefore, that high relative humidity would be conducive to LCM. Its current distribution, flanking the Great Lakes region as well as areas generally close to the Atlantic Ocean, appears to support this assumption. More generally, specific environmental variables, including landscape features, may help identify and predict the most conducive environments and thus determine the spatial risk of establishment. This would be extremely helpful because it would allow for more targeted biosurveillance, and perhaps preventive measures, going forward, since random surveys are very time-consuming, labor-intensive, and costly (Václavík et al. 2010). Furthermore, changes in environmental conditions under future climate change scenarios may affect the suitable habitat for BLD and thereby its spatial distribution. Determining the spatial risk of BLD and its future trends, particularly over large geographic scales, is therefore urgently needed.
Species distribution models (SDMs) have been widely used to map the potential spatial distribution of a disease or invasive species based on its presence or absence records and the corresponding environmental conditions (Kelly et al. 2007; Meentemeyer et al. 2015; Xie et al. 2019). However, absence data are often unreliable or missing in practice because it is difficult to know whether a species is actually absent from a given location (Gomes et al. 2018). If a species is not detected, especially a cryptic species such as a pathogen, this does not necessarily mean that it is absent from a given locale, and it certainly does not mean that it will not occur there in the future. Moreover, even when absence data are available, they may be unreliable because of interfering factors such as surveying errors or inappropriate sampling seasons or regions (Anderson et al. 2003; Guo et al. 2005). For these reasons, and assuming that presence-only data are reliable (Botella et al. 2020; Yackulic et al. 2013), presence-only SDMs are widely used to characterize species macro-distributions across different environmental and spatial ranges. Presence-only SDMs relate species presence data to a set of environmental variables that likely influence the species’ survival in order to predict its potential geographic distribution (Elith et al. 2020).
Because BLD is a tree disease with different phases and severities that has spread quickly since it first emerged (Ewing et al. 2019; Fearer et al. 2022), we can confirm that a location has BLD but cannot be sure that BLD is absent from a given location or that it will not appear there in the future. Additionally, because beech trees affected by BLD may show no symptoms in the early stages (Fearer et al. 2022) and BLD symptoms generally appear considerably later than LCM infection occurs (Carta et al. 2020), absence data for BLD are not easily collected.
Against this background, this study aimed to estimate the spatial contours of the current environmental risk of BLD in the northeastern United States, as well as its future trends, taking advantage of the long period of BLD records since its first report. We opted for presence-only SDMs to model the relationships between BLD presence and environmental factors. Specifically, we adopted the two most commonly accepted presence-only SDM algorithms (i.e., one-class support vector machine [OCSVM] [Guo et al. 2005; Schölkopf et al. 2001] and maximum entropy [Maxent] [Jaynes 1982; Phillips et al. 2006]), which have been widely used in previous studies for similar purposes (Gomes et al. 2018; Phillips et al. 2006; Yackulic et al. 2013). BLD presence plots and meteorological and landscape data were used as inputs. We also examined discrepancies between the outputs of OCSVM and Maxent and the contributions of different environmental predictors to BLD risk. We then estimated the future environmental risk of BLD under projected climate change scenarios using the best-performing risk model.
Materials and Methods
Study area
Our study site covers an area of 205,289 km², centered at 41.74°N, 80.41°W, and includes northern Ohio, western Pennsylvania, western New York, and northern West Virginia, U.S.A. (Fig. 1). We modeled the spatial risk of BLD in this area because it is where BLD was first recorded (the first BLD records were in northern Ohio) and where significant BLD symptoms have been observed to date. The area is characterized by a temperate continental climate with four distinct seasons.

Fig. 1. Spatial distribution of the beech leaf disease (BLD) presence plots (red dots) in A, northern Ohio; B, western Pennsylvania; C, western New York; and D, northern West Virginia, U.S.A., used in this study, layered with the state boundaries.
BLD presence records
A total of 263 BLD presence plots were collected in our study area between 2015 and 2022. The field surveyors placed the plots randomly (following the generalized random tessellation stratified survey design [Stevens and Olsen 2003]) in typical beech habitat and recorded BLD-related data within each plot. For each plot, the recorded information includes the sampling date, location (latitude, longitude), presence of BLD symptoms, and symptom severity. Plots are generally 20 × 50 m, although some are smaller (e.g., 10 × 40 m) because of practical constraints (e.g., a plot lying between a road and a creek). The number of beech trees surveyed in each plot depends on the plot size and the actual number of beech trees present, ranging from one to more than 100. BLD presence was confirmed by symptoms. Because the presence plots were recorded over a relatively long period by different field surveyors working at local scales, some plots are spatially clustered when viewed across a large range such as the whole study area. We therefore removed clustered plots to reduce the effects of sampling bias on model performance (Syfert et al. 2013), using systematic sampling on a grid six times coarser than the input grids (i.e., 1.5 km, given the 250-m input resolution described in the data preprocessing section; Fourcade et al. 2014). This left 156 plots, whose spatial distribution is shown in Figure 1.
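The spatial thinning step can be illustrated with a minimal Python sketch. It assumes the plot coordinates have already been projected to meters, reads "six times coarser" as 6 × 250 m = 1,500-m cells, and keeps the first plot encountered in each occupied cell. The function name `thin_by_grid` and the first-plot rule are illustrative assumptions, not the authors' code.

```python
import numpy as np


def thin_by_grid(xy: np.ndarray, cell_size: float = 6 * 250.0) -> np.ndarray:
    """Return indices of plots kept after systematic grid thinning.

    xy: (N, 2) array of projected plot coordinates in meters (assumed).
    cell_size: thinning grid size; 6 x 250 m = 1,500 m here.
    At most one plot (the first encountered) is kept per grid cell.
    """
    # Integer grid-cell index of every plot along x and y
    cells = np.floor(xy / cell_size).astype(np.int64)
    # return_index gives the position of the first plot in each occupied cell
    _, first_idx = np.unique(cells, axis=0, return_index=True)
    return np.sort(first_idx)


plots = np.array([[100.0, 100.0], [900.0, 1200.0], [1600.0, 100.0], [5000.0, 5000.0]])
kept = thin_by_grid(plots)  # the first two plots share a 1,500-m cell
```

Keeping a randomly chosen plot per cell instead of the first one is an equally reasonable rule; the text does not say which rule was applied.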
Host species data
We used beech basal area (BBA) raster maps to represent the spatial distribution of beech trees. This information was extracted from the live tree basal area data for tree species in the contiguous United States (Wilson et al. 2013; https://www.fs.usda.gov/rds/archive/catalog/RDS-2013-0013), which were produced by integrating vegetation phenology derived from moderate resolution imaging spectroradiometer (MODIS) imagery and raster data of relevant environmental parameters with extensive field plots of tree species basal area to map the spatial distribution of tree species abundance (Wilson et al. 2013). The BBA data have a spatial resolution of 250 m. The clipped BBA data covering our study area are shown in Figure 2. Note that the BBA data were not used as a predictor of BLD risk; they were used only to restrict the selection of background points to areas known to have beech, so as to avoid inadvertently developing a model for the presence of beech.

Fig. 2. Beech basal area (BBA) distribution in our study area. Reproduced from Wilson et al. (2013).
Meteorological data
The meteorological data used for BLD risk modeling were obtained from WorldClim (Fick and Hijmans 2017) and include solar radiation (SRad), relative humidity (RH), and the 19 standard bioclimatic variables (Table 1). The bioclimatic variables, which are derived from monthly temperature and precipitation values, have been widely used in SDM studies (Bazzato et al. 2021; Booth 2022; Yuan et al. 2015). They represent the seasonality, annual trends, and extremes of temperature and precipitation (O’Donnell and Ignizio 2012). RH was derived from the WorldClim water vapor pressure and average temperature using the Bolton equation (Bolton 1980). We used WorldClim version 2.1 data, released in January 2020 (https://www.worldclim.org/data/bioclim.html).
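The RH derivation can be sketched with the Bolton (1980) saturation vapor pressure formula, e_s(T) = 6.112 exp(17.67T / (T + 243.5)) hPa with T in °C, and RH = 100 × e / e_s, where e is the actual vapor pressure. This is a minimal sketch: it assumes the WorldClim water vapor pressure layer is in kPa and that vapor pressure and average temperature are paired for the same month; the clipping to [0, 100] is our own safeguard, and the function names are hypothetical.

```python
import numpy as np


def saturation_vapor_pressure_hpa(t_celsius):
    """Bolton (1980) saturation vapor pressure over water, in hPa (T in deg C)."""
    t = np.asarray(t_celsius, dtype=float)
    return 6.112 * np.exp(17.67 * t / (t + 243.5))


def relative_humidity(vapr_kpa, tavg_celsius):
    """Relative humidity (%) from actual vapor pressure and mean temperature.

    vapr_kpa: WorldClim water vapor pressure, ASSUMED to be in kPa.
    tavg_celsius: WorldClim average temperature in deg C (same month assumed).
    """
    e_hpa = np.asarray(vapr_kpa, dtype=float) * 10.0  # kPa -> hPa
    rh = 100.0 * e_hpa / saturation_vapor_pressure_hpa(tavg_celsius)
    # Guard against small supersaturation artifacts from mismatched inputs
    return np.clip(rh, 0.0, 100.0)
```

Because the functions are vectorized, whole monthly raster arrays can be passed in directly.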
Table 1. The 19 standard bioclimatic variables

Moreover, we analyzed future trends in BLD risk in response to climate change based on future bioclimatic variables (https://www.worldclim.org/data/cmip6/cmip6climate.html). Specifically, future trends were projected for two periods, 2021 to 2040 and 2041 to 2060, under four shared socio-economic pathways (SSPs), SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, from phase 6 of the Coupled Model Intercomparison Project (CMIP6), which features in the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC). SSP1, SSP2, SSP3, and SSP5 represent the sustainability, middle-of-the-road, regional rivalry, and fossil-fueled development pathways of future socio-economic development, respectively, with increasing greenhouse gas emissions and warming. The future climate data were derived from the sixth version of the Model for Interdisciplinary Research on Climate (MIROC6) (Tatebe et al. 2019), which has been used in similar studies (Lee et al. 2022). The meteorological data are available at four pixel sizes (10 min, 5 min, 2.5 min, and 30 s), and we selected the finest (i.e., 30 s, about 1 km at the equator).
Landscape data
To account for potential effects of the landscape (e.g., topography and land cover) on BLD, we used aspect and slope (both derived from a digital elevation model) and land cover type (LC) as explanatory variables. The elevation data covering our study area were clipped from the latest version (V3, released on 5 August 2019) of the advanced spaceborne thermal emission and reflection radiometer (ASTER) global digital elevation model (GDEM), downloaded from NASA’s EarthData Search (https://search.earthdata.nasa.gov/search). The V3 GDEM raster data are provided on 30-m grids and are of significantly improved quality over the previous release (Abrams et al. 2020). For land cover, we adopted GlobCover 2009 (a global land cover map; Arino et al. 2012), obtained from the European Space Agency GlobCover Portal (http://due.esrin.esa.int/page_globcover.php). We used the latest version of these data (V2.3), which has a spatial resolution of 300 m and contains 16 land cover classes in our study area (Fig. 3). GlobCover 2009 was generated by classifying medium resolution imaging spectrometer (MERIS) full resolution surface reflectance imagery with supervised and unsupervised methods (Bontemps et al. 2011).
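Slope and aspect can be derived from the DEM with finite differences. The sketch below assumes the ASTER GDEM has first been projected to a metric coordinate system with square cells (the native grid is geographic), a north-up raster, and aspect measured clockwise from north. GIS packages often use other difference kernels (e.g., Horn's method), so their values may differ slightly; the function name is hypothetical.

```python
import numpy as np


def slope_aspect(dem: np.ndarray, cell_size: float):
    """Slope (degrees) and aspect (degrees clockwise from north) of a DEM.

    Assumes a north-up raster (row index increases southward) with square
    cells of `cell_size` meters. Flat cells get aspect NaN.
    """
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol   # elevation change toward east
    dz_dy = -dz_drow  # elevation change toward north (rows run southward)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass direction of steepest descent, i.e., of -gradient
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    aspect[(dz_dx == 0) & (dz_dy == 0)] = np.nan
    return slope, aspect


# A plane dropping 10 m per 30-m cell toward the east
dem = np.tile(100.0 - 10.0 * np.arange(5), (4, 1))
slope, aspect = slope_aspect(dem, 30.0)  # east-facing: aspect 90 everywhere
```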

Fig. 3. The land cover classification map of our study area. Reproduced from Arino et al. (2012).
Overview of the methods
The workflow of the risk modeling process, shown in Figure 4, included four steps: (i) preprocessing the multisource environmental data to select relatively independent variables and make them consistent in spatial scale and modeling contribution; (ii) spatial risk modeling based on the preprocessed data and a training set of BLD presence plots; (iii) evaluation of the BLD risk models based on a testing set of BLD plots and randomly selected background points (Phillips et al. 2006), using 10-fold cross-validation (CV) and independent validation; and (iv) prediction of future BLD risk and analysis of future trends in BLD risk distribution. For spatial risk modeling, we opted for two widely used presence-only models, OCSVM and Maxent, with observed BLD presence as the dependent variable. Meteorological (SRad, RH, and BIOs) and landscape (LC, aspect, and slope) variables were used as predictors. The contribution of each predictor to BLD presence was analyzed with Maxent. Note that the future BLD risk maps were estimated with the risk model that performed best.

Fig. 4. Schematics of the workflow of this study. Image resampling was employed to give all input raster data the same grid size. The normalization and one-hot encoding steps were used to (i) reduce the influence of dominant environmental variables and data outliers and (ii) convert categorical labels to binary codes for machine learning.
Data preprocessing
To make the spatial scales of the multisource data consistent, all host species, meteorological, and landscape raster data were resampled to a 250-m grid, matching the BBA data, using nearest neighbor interpolation. Consequently, the final BLD risk map has a resolution of 250 m. Because machine learning (ML) performance can be harmed by dominant variables and data outliers (Singh and Singh 2020), we normalized the continuous data (i.e., meteorological and topographic images) using z-score normalization so that the magnitudes of the continuous variables are comparable. Because land cover is categorical rather than continuous, the land cover classification data were preprocessed with one-hot encoding, which converts categorical variables (i.e., label values) into binary codes that ML models can handle properly. In addition to this preprocessing for consistency, we removed redundant bioclimatic variables based on multicollinearity analysis (Pradhan 2016; Yang et al. 2013): if the Pearson correlation coefficient between two bioclimatic variables was larger than 0.9, only the one with the larger contribution was kept in the model (see Table 1).
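The z-score normalization, one-hot encoding, and correlation filtering steps can be sketched as follows. This is an illustrative Python sketch, not the authors' code. The greedy rule (visit variables in decreasing order of a contribution score and drop any variable whose absolute Pearson correlation with an already kept one exceeds 0.9) is one way to apply the pairwise rule when correlations chain, and the `contribution` scores are a hypothetical input (e.g., from a preliminary Maxent run). The land cover codes in the example are arbitrary.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize each column (variable) to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def one_hot(labels: np.ndarray):
    """Convert integer land cover codes to binary indicator columns."""
    classes, inverse = np.unique(labels.ravel(), return_inverse=True)
    encoded = np.zeros((labels.size, classes.size), dtype=np.int8)
    encoded[np.arange(labels.size), inverse] = 1
    return encoded, classes


def drop_correlated(x, names, contribution, threshold=0.9):
    """Greedy multicollinearity filter (assumed rule, see lead-in).

    Variables are visited from largest to smallest contribution; a variable is
    kept only if |r| <= threshold with every variable kept so far.
    """
    r = np.corrcoef(x, rowvar=False)
    order = np.argsort(contribution)[::-1]
    kept = []
    for j in order:
        if all(abs(r[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in sorted(kept)]


x = np.array([[1.0, 2.0, 5.0], [2.0, 4.1, 3.0], [3.0, 6.0, 4.0], [4.0, 8.2, 1.0]])
kept_names = drop_correlated(x, ["bio_a", "bio_b", "bio_c"], [0.2, 0.5, 0.1])
encoded, classes = one_hot(np.array([50, 40, 50, 190]))
```

Here the first two hypothetical variables are almost perfectly correlated, so only the one with the larger contribution survives.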
One-class support vector machine
The regular support vector machine (SVM) method (Cortes and Vapnik 1995) is a widely used supervised ML algorithm for both classification and regression. For classification, SVM learns a linear decision function (hyperplane) that has the maximum distance (margin) to the nearest training samples (support vectors). If the samples are not linearly separable in the original low-dimensional space, a kernel function (e.g., a polynomial or radial kernel) can be used to map them into a high-dimensional space where they are separable by a linear hyperplane. For regression, support vector regression (SVR) fits a hyperplane that penalizes only the deviations of samples lying beyond a predefined error margin (the support vectors).
The OCSVM method (Schölkopf et al. 2001) adopts a training strategy similar to that of SVM, but instead of separating different classes, OCSVM separates the samples from the origin (i.e., it is an unsupervised approach, because only one class is available for training). Specifically, OCSVM builds a decision boundary (hyperplane) with the maximum margin between the training dataset and the origin, yielding a function that takes the value +1 (regarded as BLD presence) in a small region containing most of the training data and −1 elsewhere (regarded as BLD absence). The decision function of the OCSVM in this study can be written as:

$$g(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} w_i\,\varphi\big(f_i(x)\big) + b\right), \qquad x \in X \tag{1}$$

where $x$ represents a data point in a finite set $X$ (the training dataset), $f_i(x)$ is the $i$th environmental variable value at $x$, $n$ is the number of environmental variables, $\varphi$ denotes the mapping (kernel function) from the low- to the high-dimensional feature space, $w_i$ is the weight of $f_i(x)$, and $b$ is the intercept. The unknowns $w = (w_1, w_2, \ldots, w_n)$ and $b$ are obtained by maximizing the distance between the decision boundary and the origin; for details of this calculation, see Schölkopf et al. (2001). The signed distance of $x$ to the separating hyperplane is $d_x = \sum_{i=1}^{n} w_i\,\varphi\big(f_i(x)\big) + b$, where positive and negative distances indicate inliers (presence) and outliers (absence), respectively. Because our goal is BLD risk (the probability distribution of BLD), the signed distances must be transformed into probabilities. The Platt scaling approach (Platt 1999) is generally used for this purpose, but it requires both presence and absence data and therefore cannot be applied to the OCSVM in this study. Instead, we used the cumulative output transformation of Maxent (Phillips et al. 2006) to map the signed distances to the range [0, 1], which also makes the OCSVM outputs directly comparable with those of Maxent.
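An OCSVM fit followed by a Maxent-style cumulative transform might look as follows, using the γ = 0.8 and ν = 0.5 values reported in the parameterization section. This sketch assumes scikit-learn's `OneClassSVM` (the text says only that Python packages were used) and synthetic standardized data. Maxent's cumulative output weights each cell by its raw probability, and signed OCSVM distances are not probabilities, so the transform here gives every cell equal mass by default, which reduces it to the empirical cumulative distribution of the scores. How the authors weighted the cells is not stated; treat this as an assumption, and the function names as hypothetical.

```python
from typing import Optional

import numpy as np
from sklearn.svm import OneClassSVM


def fit_ocsvm(train_presence: np.ndarray, gamma: float = 0.8, nu: float = 0.5) -> OneClassSVM:
    """Fit an RBF-kernel one-class SVM on standardized predictors at presence plots."""
    return OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(train_presence)


def cumulative_transform(scores: np.ndarray, mass: Optional[np.ndarray] = None) -> np.ndarray:
    """Map scores to (0, 1]: each cell gets the total mass of cells scoring <= it.

    mass=None gives every cell equal mass (empirical CDF of the scores);
    Maxent itself weights each cell by its raw probability.
    """
    scores = np.asarray(scores, dtype=float)
    if mass is None:
        mass = np.full(scores.size, 1.0 / scores.size)
    else:
        mass = np.asarray(mass, dtype=float) / np.sum(mass)
    order = np.argsort(scores, kind="stable")
    sorted_scores = scores[order]
    cum = np.cumsum(mass[order])
    # Tied scores share the cumulative value at the end of their tie group
    last_of_tie = np.searchsorted(sorted_scores, sorted_scores, side="right") - 1
    out = np.empty_like(scores)
    out[order] = cum[last_of_tie]
    return out


rng = np.random.default_rng(0)
presence = rng.normal(loc=0.5, size=(120, 5))  # synthetic predictors at plots
cells = rng.normal(size=(1000, 5))             # synthetic predictors of map cells
model = fit_ocsvm(presence)
risk = cumulative_transform(model.decision_function(cells))
```

Because the transform is rank-based, it preserves the ordering of the signed distances, so ROC-AUC values are unchanged by it.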
Maximum entropy
Maxent (Jaynes 1982) is a general-purpose statistical method for making probability predictions and is widely used for spatial risk and species distribution modeling based on presence-only datasets (Gomes et al. 2018; Phillips et al. 2006). Maxent uses an exponential model to calculate probabilities and adopts the “maximum entropy principle” (Jaynes 1957) to approximate the optimal probability distribution of the research target: among all distributions that satisfy the known constraints, it selects the one with the maximum entropy, which is regarded as the optimal probability distribution model. The Maxent model in our study can be expressed as:

$$H\big(P(x)\big) = -\sum_{x \in X} P(x)\,\ln P(x) \tag{2}$$

$$\hat{P}(x) = \underset{P}{\arg\max}\; H\big(P(x)\big) \quad \text{subject to the known constraints} \tag{3}$$
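where $P(x)$ is the probability distribution of BLD risk at $x$, $H\big(P(x)\big)$ is the entropy of $P(x)$, ln represents the natural log function, and $\hat{P}(x)$ denotes the nonnegative approximation of $P(x)$ with the maximum entropy. Equivalently, $\hat{P}(x)$ can be obtained by maximum likelihood estimation of a Gibbs (exponential) distribution:

$$\hat{P}(x) = \frac{1}{Z_w}\exp\left(\sum_{i=1}^{n} w_i f_i(x)\right), \qquad Z_w = \sum_{x \in X} \exp\left(\sum_{i=1}^{n} w_i f_i(x)\right) \tag{4}$$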
where $f_i(x)$ is the $i$th environmental feature at $x$, $w_i$ is its weight, $Z_w$ is a normalization factor that ensures that $\hat{P}(x)$ sums to 1 over $X$, and exp represents the exponential function. Because the log loss of equation 4 is convex in $w$, Maxent is guaranteed to converge to the global minimum rather than a local minimum and thus to yield the optimal probability distribution model (Phillips et al. 2006).
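To show how equations 2 to 4 connect, the sketch below fits the Gibbs form of equation 4 by gradient descent on its log loss over a set of background cells. It is a deliberately simplified stand-in for Maxent, not the R package used in this study: linear features only, no regularization (the β = 2 used here is not represented), and no prevalence setting; the function names are hypothetical. At convergence, the model's expected feature values match the mean feature values at the presence points, which is the constraint set of the maximum entropy problem.

```python
import numpy as np


def gibbs_probs(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Equation 4: P(x) = exp(sum_i w_i f_i(x)) / Z_w over the cells in X."""
    s = features @ w
    p = np.exp(s - s.max())  # subtracting the max avoids overflow; it cancels in Z_w
    return p / p.sum()


def fit_gibbs(background: np.ndarray, presence: np.ndarray,
              lr: float = 0.2, n_iter: int = 3000) -> np.ndarray:
    """Fit w by gradient descent on the unregularized log loss.

    log loss(w) = -mean_presence(w . f(x)) + ln Z_w, with Z_w summed over the
    background cells. Its gradient is E_P[f] - mean_presence(f), so at the
    optimum the model's feature means equal the presence means.
    """
    w = np.zeros(background.shape[1])
    target = presence.mean(axis=0)
    for _ in range(n_iter):
        grad = gibbs_probs(background, w) @ background - target
        w -= lr * grad
    return w


rng = np.random.default_rng(1)
background = rng.normal(size=(2000, 2))           # synthetic standardized cells
presence = rng.normal(loc=0.5, size=(100, 2))     # synthetic presence features
w = fit_gibbs(background, presence)
p_hat = gibbs_probs(background, w)
```

The full Maxent software adds further feature classes (e.g., quadratic, product, threshold, and hinge) and L1 regularization scaled by β, which keep the fitted model smoother than this sketch.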
Parameterization for the risk models
We implemented the OCSVM model using Python packages and tuned its two key parameters (i.e., the Gaussian kernel parameter [γ] and the penalty parameter [ν]). γ determines how far the influence of a single training sample reaches: the larger γ is, the more local that influence becomes. ν sets an upper bound on the fraction of outliers (training samples misclassified) and a lower bound on the fraction of support vectors (training samples used as support vectors). Using a grid search that evaluated every combination of γ (0.1 to 4.0) and ν (0.1 to 1.0), both at an interval of 0.1, we set γ and ν to 0.8 and 0.5, respectively. For Maxent, we conducted the modeling using R packages and set the primary parameters (i.e., the prevalence [τ] and regularization coefficient [β]) to 0.5 and 2, respectively, to reduce the complexity of our species-specific model (Moreno-Amat et al. 2015).
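The grid search over γ and ν can be sketched as follows. The scoring criterion is an assumption, because the text does not say how combinations were ranked: here each combination is scored by the mean AUC over k folds of the presence plots, ranking held-out presences against the background points. The sketch uses scikit-learn; `GAMMAS` and `NUS` reproduce the 40 × 10 grid given in the text, and the function name is hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

# Grid from the text: gamma 0.1-4.0 and nu 0.1-1.0, both in steps of 0.1
GAMMAS = np.round(np.linspace(0.1, 4.0, 40), 1)
NUS = np.round(np.linspace(0.1, 1.0, 10), 1)


def grid_search_ocsvm(presence, background, gammas, nus, n_splits=10, seed=0):
    """Return (best_gamma, best_nu, best_mean_auc) over the parameter grid.

    Assumed score: mean AUC over k folds, where each fold's held-out presences
    are ranked against the shared background points by decision_function.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best = (None, None, -np.inf)
    for gamma in gammas:
        for nu in nus:
            aucs = []
            for train_idx, test_idx in kf.split(presence):
                model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(presence[train_idx])
                scores = np.concatenate([model.decision_function(presence[test_idx]),
                                         model.decision_function(background)])
                labels = np.r_[np.ones(len(test_idx)), np.zeros(len(background))]
                aucs.append(roc_auc_score(labels, scores))
            mean_auc = float(np.mean(aucs))
            if mean_auc > best[2]:
                best = (gamma, nu, mean_auc)
    return best
```

With 10 folds and 400 combinations, the full grid requires 4,000 model fits, so a coarse grid can be run first to narrow the search.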
Evaluation metrics for risk modeling
We first conducted 10-fold CV to check for model overfitting and determine the best parameter setting. Then, we randomly selected 80% and 20% of the BLD occurrence plots as training and testing sets for model construction and independent validation, respectively. Because true absences are difficult to obtain, the number of background locations should be large enough to represent all available environmental conditions across the study area for evaluation purposes (Grimmett et al. 2020). Thus, following the practical guide of Merow et al. (2013), we randomly selected 5,000 background points across the study area to evaluate the performance of the presence-only models. These background points were restricted to locations where beech trees exist (BBA > 0) to avoid modeling the distribution of beech presence.
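Background sampling and the 80/20 split can be sketched as below. This assumes the BBA raster is held as a 2D array on the 250-m analysis grid and that background cells are drawn without replacement; whether presence cells were excluded from the background is not stated. The function names and the rounding of the training-set size are hypothetical.

```python
import numpy as np


def sample_background(bba: np.ndarray, n: int = 5000, seed: int = 0) -> np.ndarray:
    """Randomly draw n background cells (row, col) where beech basal area > 0."""
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(bba > 0)  # flat indices of cells with beech
    chosen = rng.choice(candidates, size=n, replace=False)
    return np.column_stack(np.unravel_index(chosen, bba.shape))


def split_presence(n_plots: int, train_frac: float = 0.8, seed: int = 0):
    """Random train/test split of plot indices (80/20 by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_plots)
    n_train = int(round(train_frac * n_plots))
    return idx[:n_train], idx[n_train:]


train_idx, test_idx = split_presence(156)  # the 156 thinned plots
```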
We used receiver operating characteristic (ROC) analysis (Manel et al. 2001), which quantifies a model’s ability to discriminate between classes, to evaluate the performance of the BLD risk models. A ROC curve plots the true positive rate against the false positive rate across different thresholds. The area under the curve (AUC) measures how well a model separates presences from absences, which in this presence-only setting are represented by the background points. As a standard measure for assessing SDM performance, AUC ranges from zero to one, with 0.5 corresponding to random prediction; the closer AUC is to one, the better the separability (Swets 1988). Furthermore, the importance of each environmental variable to the Maxent risk model was evaluated using percent contribution, permutation importance, and the Jackknife test (Phillips et al. 2006); larger values of these metrics indicate a more important variable.
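A rank-based AUC and a Maxent-style permutation importance can be sketched in a few lines. The AUC is computed as the probability that a randomly chosen presence scores higher than a randomly chosen background point (ties count one half), which equals the area under the ROC curve. The permutation importance follows, as we understand it, Maxent's definition: permute one variable across the pooled presence and background points, record the drop in training AUC, and normalize the drops to percentages. `score_fn` stands for any fitted model's prediction function and is a hypothetical interface.

```python
from typing import Callable

import numpy as np


def auc_presence_background(p_scores, b_scores) -> float:
    """AUC as P(presence score > background score), ties counted as 1/2."""
    p = np.asarray(p_scores, dtype=float)[:, None]
    b = np.asarray(b_scores, dtype=float)[None, :]
    return float(np.mean((p > b) + 0.5 * (p == b)))


def permutation_importance(score_fn: Callable[[np.ndarray], np.ndarray],
                           presence: np.ndarray, background: np.ndarray,
                           seed: int = 0) -> np.ndarray:
    """Percent drop in training AUC when each variable is permuted across the
    pooled presence and background points, normalized to sum to 100."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([presence, background])
    n_p = len(presence)
    base = auc_presence_background(score_fn(presence), score_fn(background))
    drops = np.zeros(pooled.shape[1])
    for j in range(pooled.shape[1]):
        shuffled = pooled.copy()
        shuffled[:, j] = rng.permutation(shuffled[:, j])
        auc = auc_presence_background(score_fn(shuffled[:n_p]), score_fn(shuffled[n_p:]))
        drops[j] = max(base - auc, 0.0)  # negative drops are treated as zero
    total = drops.sum()
    return 100.0 * drops / total if total > 0 else drops
```

For example, presence scores [0.9, 0.8] against background scores [0.1, 0.85] give an AUC of 0.75, because three of the four presence-background pairs are ranked correctly.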
Results
Risk modeling performance
The average AUC values of OCSVM and Maxent in 10-fold CV were 0.843 and 0.889, respectively. These CV results were satisfactory and consistent with the independent validation results, and thus did not indicate overfitting. In the independent validation, both algorithms performed better than random prediction, and the ROC curve for Maxent showed lower commission and omission errors than that for OCSVM (Fig. 5). The AUC values of OCSVM and Maxent were 0.829 and 0.884, respectively, indicating better separability with Maxent.

Fig. 5. Independent validation by receiver operating characteristic (ROC) curve and area under the curve (AUC) analysis of one-class support vector machine (OCSVM) and Maxent applied to the testing set of beech leaf disease (BLD) plots.
Spatial patterns of BLD risk
Figure 6 shows the spatial patterns of BLD risk predicted by OCSVM and Maxent. The risk probability of BLD occurrence ranges from 0 to 100%. Overall, OCSVM predicts higher risk values over a larger area than Maxent. However, compared with the BBA data in Figure 2, the OCSVM-based risk map shows high values in areas with very sparse beech (e.g., the areas circled in red in Fig. 6A), whereas the Maxent-based risk map does not (Fig. 6B). Because areas with few beech trees would not normally be associated with high BLD risk, and considering the better performance of Maxent in the ROC-AUC analysis, we suggest that the risk map predicted by Maxent is more conservative and more reliable than that predicted by OCSVM. Furthermore, using multiple models makes it possible to examine the consistency of their predictions, which leads to more reliable results than a single model (Grimmett et al. 2020). For instance, in areas that show high BLD risk in both the OCSVM- and Maxent-derived risk maps (such as the areas in the black circles in Fig. 6), we can be more confident of an emerging BLD risk.

Fig. 6. The predicted beech leaf disease (BLD) risk maps based on A, one-class support vector machine (OCSVM) and B, Maxent. The red circles mark areas with sparse beech trees according to the beech basal area (BBA) data, where the probabilities of BLD presence should be low. The black circles and ellipses mark some areas with high BLD presence probabilities identified by both OCSVM and Maxent.
Contributions of environmental variables to BLD
In terms of percent contribution, the five most important variables in the Maxent model were LC (30.7%), BIO3 (21.5%), BIO8 (9.4%), BIO4 (8.1%), and SRad (7.3%) (Fig. 7A). In terms of permutation importance, the top five variables were BIO13 (28.8%), BIO4 (16.6%), BIO18 (13.2%), BIO10 (10.1%), and LC (7.3%). The Jackknife test shows that, when each variable was used alone for Maxent modeling, the top five variables were BIO3, BIO4, LC, BIO18, and BIO7 (Fig. 7B). In addition, omitting LC decreased the gain the most, so LC appears to contain the most information that is not present in the other variables. Taken together, the three metrics suggest that meteorology (BIO3 and BIO4) and LC are the two key drivers of BLD distribution.

Fig. 7. The contributions of each environmental variable to the Maxent model. A, Percent contribution and permutation importance of each environmental variable. B, Jackknife of regularized training gain.
The specific relationships between BLD presence and the key environmental variables are shown in Figure 8, where each response curve represents the probability obtained using only the corresponding variable. The portion of each curve where the probability of BLD presence exceeds 0.5 (blue dashed lines in Fig. 8) indicates the most suitable environmental conditions for BLD. For BIO3 (i.e., isothermality), the suitable range for BLD occurrence (probability > 0.5) is 28 to 30.5%. Because isothermality quantifies the size of day-to-night temperature oscillations relative to summer-to-winter oscillations (O’Donnell and Ignizio 2012), this suggests that BLD favors relatively small day-to-night temperature oscillations. For BIO4 (i.e., temperature seasonality [standard deviation × 100]), the suitable range for BLD presence is 9.12 to 9.58°C. Because temperature seasonality represents temperature variation over the year, measured as the standard deviation of monthly mean temperatures (O’Donnell and Ignizio 2012), a seasonal variation of ∼9°C appears to be conducive to BLD presence. For LC, “closed broadleaved deciduous forest” is the most suitable land cover type for BLD presence. Note that variable importance was analyzed with the Maxent model only, because OCSVM does not provide such measures. Because all the environmental datasets used are open and freely available, the experimental results can be reproduced.
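To make the definitions of BIO3 and BIO4 concrete, the sketch below computes them, together with BIO2 (mean diurnal range) and BIO7 (annual temperature range), from 12 monthly minimum and maximum temperatures, following the standard definitions (BIO3 = 100 × BIO2 / BIO7; BIO4 = 100 × standard deviation of monthly mean temperature). The monthly mean is approximated as the average of the monthly minimum and maximum, and the use of the sample standard deviation (ddof=1) is an assumption; WorldClim's exact processing may differ. The function name is hypothetical.

```python
import numpy as np


def bio_temperature_indices(tmin, tmax) -> dict:
    """Compute BIO2, BIO3, BIO4, and BIO7 from 12 monthly values (deg C)."""
    tmin = np.asarray(tmin, dtype=float)
    tmax = np.asarray(tmax, dtype=float)
    tavg = (tmin + tmax) / 2.0              # approximate monthly mean temperature
    bio2 = np.mean(tmax - tmin)             # mean diurnal range
    bio7 = tmax.max() - tmin.min()          # annual range = BIO5 - BIO6
    bio3 = 100.0 * bio2 / bio7              # isothermality (%)
    bio4 = 100.0 * np.std(tavg, ddof=1)     # temperature seasonality (SD x 100)
    return {"BIO2": bio2, "BIO3": bio3, "BIO4": bio4, "BIO7": bio7}


# Synthetic year: six cold and six warm months, each with a 10 deg C diurnal range
tmin = np.array([-10.0] * 6 + [10.0] * 6)
tmax = tmin + 10.0
bio = bio_temperature_indices(tmin, tmax)
```

In this synthetic year, the 10°C diurnal range is one third of the 30°C annual range, so BIO3 is about 33.3%, slightly above the 28 to 30.5% range identified as most suitable above.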

Fig. 8. Response curves of beech leaf disease (BLD) presence in relation to the three important environmental variables. A, BIO3 (i.e., isothermality [%]); B, BIO4 (i.e., temperature seasonality [standard deviation × 100] [°C]); and C, land cover type (LC).
Future trends of BLD risk distribution
Figure 9 shows the differences between the current and future (2021 to 2040 and 2041 to 2060) risk maps predicted by Maxent under the four emission scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). In the future, BLD risk may increase, decrease, or remain unchanged in different parts of our study area. All risk difference maps show similar spatial patterns but differ in the magnitude of the risk changes. The changes in the near future (Fig. 9A, B, C, and D; 2021 to 2040) are smaller than those in the far future (Fig. 9E, F, G, and H; 2041 to 2060). In addition, the changes under SSP3-7.0 and SSP5-8.5 are larger than those under SSP1-2.6 and SSP2-4.5, with SSP3-7.0 showing the largest changes. This may be explained by the relatively small climate changes under SSP1-2.6 and SSP2-4.5 compared with the other two scenarios. Although SSP5-8.5 would lead to larger climate changes than SSP3-7.0, the substantially higher ambient temperatures under SSP5-8.5 may not be suitable for the development of BLD.
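The difference maps in Figure 9 amount to a cell-wise subtraction of the current Maxent risk map from each future one. A minimal sketch, assuming both maps are on the same 250-m grid with NaN outside the modeled area; the `tol` threshold used to call a cell "unchanged" is hypothetical, since the text does not define one, and so is the function name.

```python
import numpy as np


def risk_change_summary(current: np.ndarray, future: np.ndarray, tol: float = 0.05):
    """Difference map (future - current) and the fractions of valid cells whose
    risk increased, decreased, or stayed within +/- tol (illustrative threshold)."""
    diff = future - current
    d = diff[~np.isnan(diff)]  # ignore cells outside the modeled area
    summary = {
        "increase": float(np.mean(d > tol)),
        "decrease": float(np.mean(d < -tol)),
        "unchanged": float(np.mean(np.abs(d) <= tol)),
    }
    return diff, summary


current = np.array([[0.2, 0.5], [0.7, np.nan]])
future = np.array([[0.4, 0.5], [0.6, 0.3]])
diff, summary = risk_change_summary(current, future)
```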

Fig. 9. Differences between current and future beech leaf disease (BLD) risk maps. Future trends of BLD development during 2021 to 2040 under A, SSP1-2.6; B, SSP2-4.5; C, SSP3-7.0; and D, SSP5-8.5 and during 2041 to 2060 under E, SSP1-2.6; F, SSP2-4.5; G, SSP3-7.0; and H, SSP5-8.5.
Discussion
As a recently emerged forest disease, BLD has drawn increasing attention in recent years, and considerable effort has gone into investigating several aspects of BLD, such as its symptom patterns (Ewing et al. 2019; Fearer et al. 2022), causal agents (Burke et al. 2020; Carta et al. 2020; Ewing et al. 2021), and impacts on forest ecosystems (Reed et al. 2022). However, the spatial risk of BLD had not been modeled, even though such modeling is critical given the disease’s rapid spread at large scales. This study represents the first attempt at mapping the risk of BLD in northern Ohio (where BLD was first identified), western Pennsylvania, western New York, and northern West Virginia, U.S.A., based on BLD presence plots and multiple environmental variables. The results show that both OCSVM and Maxent produce significantly better predictions than random selection, and that Maxent is more conservative while also outperforming OCSVM.
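The comparison with random selection rests on ROC analysis, in which an area under the curve (AUC) of 0.5 corresponds to a random ranking of locations. For presence-only models the AUC is commonly computed against background points rather than true absences. The sketch below uses the rank-based (Mann–Whitney) formulation and assumes model scores are already available at presence and background locations; it is a generic illustration, not the evaluation code used in this study, and its pairwise loop favors clarity over speed.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties count 1/2."""
    if not presence_scores or not background_scores:
        raise ValueError("need at least one presence and one background score")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties split evenly, as in the Mann-Whitney U
    return wins / (len(presence_scores) * len(background_scores))
```

For instance, `auc([0.9, 0.2], [0.5, 0.1])` returns 0.75 because three of the four presence/background pairs are ranked correctly.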
As a widely used presence-only SDM, Maxent has frequently been applied to risk mapping in invasive species studies, for example, to determine regional susceptibility to invasion by 16 major exotic plant taxa in southern United States forests (Lázaro-Lobo et al. 2021), to predict the potential distribution of the invasive Erigeron canadensis L. (Canada fleabane) in China (Yan et al. 2020), and to model the climatic suitability for Prosopis juliflora (Swartz) DC in India (Singh et al. 2021). These studies also illustrated the importance of environmental variables for modeling invasive species distributions. In this study, the Maxent model indicates that temperature (isothermality and temperature seasonality) and land cover factors make relatively larger contributions to the distribution of BLD. Specifically, areas with closed (>40%) broadleaved deciduous forest, small day-to-night temperature differences relative to the annual range (isothermality ∼30%), and low seasonal temperature variation (∼9°C) are prone to BLD. This suggests that BLD may favor areas with host species and relatively stable temperatures. Possible BLD hotspots with these conditions could be in northeastern Ohio and Pennsylvania (black circled areas in Fig. 6). The etiology of BLD has not yet been fully characterized (Burke et al. 2020; Carta et al. 2020; Ewing et al. 2021). However, even in the absence of a complete understanding of the causal agents and of effective treatments at this stage, monitoring and management efforts could be targeted at these potential hotspots for early detection and mitigation. This study could therefore help provide early-warning information on BLD occurrence in these potential hotspots.
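As a rough illustration of how the conditions listed above could be used to flag candidate cells for surveillance, the sketch below applies a simple environmental envelope. This is a hypothetical screening heuristic, not the Maxent model used in this study, which weighs the variables jointly rather than applying hard cut-offs. The ranges are those read from Figure 8, BIO4 is assumed to be expressed as the standard deviation in °C (divide rasters stored as SD × 100 by 100 first), and the land cover code 50 is an assumption (our understanding of the GlobCover 2009 legend code for closed broadleaved deciduous forest) that should be checked against the land cover product actually used.

```python
# Suitable ranges read from the response curves in Fig. 8 (probability > 0.5).
BIO3_RANGE = (28.0, 30.5)   # isothermality, %
BIO4_RANGE = (9.12, 9.58)   # temperature seasonality as SD of monthly means, deg C
# ASSUMPTION: GlobCover 2009 code for closed (>40%) broadleaved deciduous forest.
CLOSED_BROADLEAVED_DECIDUOUS = 50


def meets_envelope(bio3, bio4, lc):
    """True if one cell falls inside all three (hypothetical) envelope bounds."""
    return (BIO3_RANGE[0] <= bio3 <= BIO3_RANGE[1]
            and BIO4_RANGE[0] <= bio4 <= BIO4_RANGE[1]
            and lc == CLOSED_BROADLEAVED_DECIDUOUS)


def envelope_mask(bio3_grid, bio4_grid, lc_grid):
    """Apply meets_envelope cell by cell to three co-registered grids."""
    return [[meets_envelope(a, b, c) for a, b, c in zip(r3, r4, rl)]
            for r3, r4, rl in zip(bio3_grid, bio4_grid, lc_grid)]
```

A mask like this only marks where all three conditions coincide; it carries no probability and ignores the other variables in the model, so it is at best a coarse cross-check on the Maxent risk map.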
Since BLD occurrence is related to multiple climate variables in addition to the major contributors (Fig. 7), we used future climate data to predict future BLD risk maps and compared them with the current risk map to examine the combined influence of these variables. This study mapped the spatial patterns of future BLD risk changes under different emission scenarios, but it considered only the impacts of climate change, and only within our study area. Future work could estimate BLD risk over a larger geographic range and predict future BLD trends while accounting for future land cover conditions. In addition, the spatial resolution of the risk maps could be improved using image downscaling techniques (Zhao et al. 2018, 2021; Zhao and Liu 2022), which would support research at finer scales than those considered here.
Our spatial risk modeling provides at least three significant benefits: (i) knowledge of where BLD is likely to become established, provided the pathogen is transported or vectored to that location, which facilitates targeted biosurveillance; (ii) information on key environmental features, which will lead to better-defined epidemiological models; and (iii) the direction of future changes in BLD risk caused by climate change. Importantly, this study provides a blueprint for mapping the risk of BLD across the distribution range of American beech, as well as of other beech species across the globe, so that proactive management measures can be put in place. Ultimately, this will benefit all relevant stakeholders, including forest managers, local authorities and communities, and private actors.
Acknowledgments
We thank Caleb Kime for providing the BBA raster data. Moreover, we would like to thank the Cleveland Metroparks and The Ohio State University for providing the plot data.
The author(s) declare no conflict of interest.
Literature Cited
- 2020. ASTER global digital elevation model (GDEM) and ASTER global water body dataset (ASTWBD). Remote Sens. 12:1156. https://doi.org/10.3390/rs12071156
- 2003. Evaluating predictive models of species’ distributions: Criteria for selecting optimal models. Ecol. Model. 162:211‐232. https://doi.org/10.1016/S0304-3800(02)00349-6
- 2012. Global land cover map for 2009 (GlobCover 2009). European Space Agency and Université catholique de Louvain (UCL), PANGAEA. https://doi.org/10.1594/PANGAEA.787668
- 2021. High spatial resolution bioclimatic variables to support ecological modelling in a Mediterranean biodiversity hotspot. Ecol. Model. 441:109354. https://doi.org/10.1016/j.ecolmodel.2020.109354
- 1980. The computation of equivalent potential temperature. Mon. Weather Rev. 108:1046‐1053. https://doi.org/10.1175/1520-0493(1980)108<1046:TCOEPT>2.0.CO;2
- 2011. GLOBCOVER 2009 products description and validation report. ESA Bull. 136:10013.
- 2022. Checking bioclimatic variables that combine temperature and precipitation data before their use in species distribution models. Austral Ecol. 47:1506‐1514. https://doi.org/10.1111/aec.13234
- 2020. Bias in presence-only niche models related to sampling effort and species niches: Lessons for background point selection. PLoS One 15:e0232078. https://doi.org/10.1371/journal.pone.0232078
- 2020. The emergence of beech leaf disease in Ohio: Probing the plant microbiome in search of the cause. For. Pathol. 50:e12579. https://doi.org/10.1111/efp.12579
- 2020. Beech leaf disease symptoms caused by newly recognized nematode subspecies Litylenchus crenatae mccannii (Anguinata) described from Fagus grandifolia in North America. For. Pathol. 50:e12580. https://doi.org/10.1111/efp.12580
- 1995. Support-vector networks. Machine Learn. 20:273‐297. https://doi.org/10.1007/BF00994018
- 2020. Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69‐80. https://doi.org/10.17161/bi.v15i2.13384
- 2019. Beech leaf disease: An emerging forest epidemic. For. Pathol. 49:e12488. https://doi.org/10.1111/efp.12488
- 2021. The foliar microbiome suggests that fungal and bacterial agents may be involved in the beech leaf disease pathosystem. Phytobiomes J. 5:335‐349. https://doi.org/10.1094/PBIOMES-12-20-0088-R
- 2022. Monitoring foliar symptom expression in beech leaf disease through time. For. Pathol. 52:e12725. https://doi.org/10.1111/efp.12725
- 2017. WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37:4302‐4315. https://doi.org/10.1002/joc.5086
- 2014. Mapping species distributions with MAXENT using a geographically biased sample of presence data: A performance assessment of methods for correcting sampling bias. PLoS One 9:e97122. https://doi.org/10.1371/journal.pone.0097122
- 2018. Species distribution modelling: contrasting presence-only models with plot abundance data. Sci. Rep. 8:1003. https://doi.org/10.1038/s41598-017-18927-1
- 2020. Presence-only species distribution models are sensitive to sample prevalence: Evaluating models using spatial prediction stability and accuracy metrics. Ecol. Model. 431:109194. https://doi.org/10.1016/j.ecolmodel.2020.109194
- 2005. Support vector machines for predicting distribution of sudden oak death in California. Ecol. Model. 182:75‐90. https://doi.org/10.1016/j.ecolmodel.2004.07.012
- 1957. Information theory and statistical mechanics. Phys. Rev. 106:620. https://doi.org/10.1103/PhysRev.106.620
- 1982. On the rationale of maximum-entropy methods. Proc. IEEE 70:939‐952. https://doi.org/10.1109/PROC.1982.12425
- 2022. First report of beech leaf disease, caused by Litylenchus crenatae mccannii, on American Beech (Fagus grandifolia) in Virginia. Plant Dis. 106:1764. https://doi.org/10.1094/PDIS-08-21-1713-PDN
- 2007. Modeling the risk for a new invasive forest disease in the United States: An evaluation of five environmental niche models. Comput. Environ. Urban Syst. 31:689‐710. https://doi.org/10.1016/j.compenvurbsys.2006.10.002
- 2021. Multivariate analysis of invasive plant species distributions in southern US forests. Landscape Ecol. 36:3539‐3555. https://doi.org/10.1007/s10980-021-01326-3
- 2022. Spatial evaluation of machine learning-based species distribution models for prediction of invasive ant species distribution. Appl. Sci. 12:10260. https://doi.org/10.3390/app122010260
- 2001. Evaluating presence–absence models in ecology: The need to account for prevalence. J. Appl. Ecol. 38:921‐931. https://doi.org/10.1046/j.1365-2664.2001.00647.x
- 2020. First report of beech leaf disease, caused by the foliar nematode, Litylenchus crenatae mccannii, on American Beech (Fagus grandifolia) in Connecticut. Plant Dis. 104:2527. https://doi.org/10.1094/PDIS-02-20-0442-PDN
- 2015. Citizen science helps predict risk of emerging infectious disease. Front. Ecol. Environ. 13:189‐194. https://doi.org/10.1890/140299
- 2013. A practical guide to MaxEnt for modeling species’ distributions: What it does, and why inputs and settings matter. Ecography 36:1058‐1069. https://doi.org/10.1111/j.1600-0587.2013.07872.x
- 2015. Impact of model complexity on cross-temporal transferability in Maxent species distribution models: An assessment using paleobotanical data. Ecol. Model. 312:308‐317. https://doi.org/10.1016/j.ecolmodel.2015.05.035
- 2012. Bioclimatic predictors for supporting ecological applications in the conterminous United States. US Geol. Surv. Data Ser. 691:4‐9.
- 2006. Maximum entropy modeling of species geographic distributions. Ecol. Model. 190:231‐259. https://doi.org/10.1016/j.ecolmodel.2005.03.026
- 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10:61‐74.
- 2016. Strengthening MaxEnt modelling through screening of redundant explanatory bioclimatic variables with variance inflation factor analysis. Researcher 8:29‐34.
- 2022. The distribution of beech leaf disease and the causal agents of beech bark disease (Cryptoccocus fagisuga, Neonectria faginata, N. ditissima) in forests surrounding Lake Erie and future implications. For. Ecol. Manag. 503:119753. https://doi.org/10.1016/j.foreco.2021.119753
- 2001. Estimating the support of a high-dimensional distribution. Neural Comput. 13:1443‐1471. https://doi.org/10.1162/089976601750264965
- 2020. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 97:105524. https://doi.org/10.1016/j.asoc.2019.105524
- 2021. Modeling potential hotspots of invasive Prosopis juliflora (Swartz) DC in India. Ecol. Inform. 64:101386. https://doi.org/10.1016/j.ecoinf.2021.101386
- 2003. Variance estimation for spatially balanced samples of environmental resources. Environmetrics 14:593‐610. https://doi.org/10.1002/env.606
- 1988. Measuring the accuracy of diagnostic systems. Science 240:1285‐1293. https://doi.org/10.1126/science.3287615
- 2013. The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models. PLoS One 8:e55158. https://doi.org/10.1371/journal.pone.0055158
- 2019. Description and basic evaluation of simulated mean state, internal variability, and climate sensitivity in MIROC6. Geosci. Model Develop. 12:2727‐2765. https://doi.org/10.5194/gmd-12-2727-2019
- USDA Forest Service. 2021. Pest Alert: Beech Leaf Disease. http://www.dontmovefirewood.org/wp-content/uploads/2019/02/Beech-Leaf-Disease-Pest-Alert.pdf
- 2010. Predicting potential and actual distribution of sudden oak death in Oregon: Prioritizing landscape contexts for early detection and eradication of disease outbreaks. For. Ecol. Manag. 260:1026‐1035. https://doi.org/10.1016/j.foreco.2010.06.026
- 2013. Live Tree Species Basal Area of the Contiguous United States (2000-2009). USDA Forest Service, Rocky Mountain Research Station, Newtown Square, PA.
- 2019. Prediction and analysis of the potential risk of sudden oak death in China. J. For. Res. 30:2357‐2366. https://doi.org/10.1007/s11676-018-0755-x
- 2013. Presence‐only modelling using MAXENT: When can we trust the inferences? Methods Ecol. Evol. 4:236‐243. https://doi.org/10.1111/2041-210x.12004
- 2020. Predicting the potential distribution of an invasive species, Erigeron canadensis L., in China with a maximum entropy model. Glob. Ecol. Conserv. 21:e00822. https://doi.org/10.1016/j.gecco.2019.e00822
- 2013. Maxent modeling for predicting the potential distribution of medicinal plant, Justicia adhatoda L. in Lesser Himalayan foothills. Ecol. Eng. 51:83‐87. https://doi.org/10.1016/j.ecoleng.2012.12.004
- 2015. Maxent modeling for predicting the potential distribution of Sanghuang, an important group of medicinal fungi in China. Fungal Ecol. 17:140‐145. https://doi.org/10.1016/j.funeco.2015.06.001
- 2021. A sparse representation-based fusion model for improving daily MODIS C6.1 aerosol products on a 3 km grid. Int. J. Remote Sens. 42:1077‐1095. https://doi.org/10.1080/01431161.2020.1823040
- 2018. A robust adaptive spatial and temporal image fusion model for complex land surface changes. Remote Sens. Environ. 208:42‐62. https://doi.org/10.1016/j.rse.2018.02.009
- 2022. A robust and adaptive spatial-spectral fusion model for PlanetScope and Sentinel-2 imagery. GIScience Remote Sens. 59:520‐546. https://doi.org/10.1080/15481603.2022.2036054
Funding: This work was funded by a Center for Applied Plant Sciences, The Ohio State University, grant to Pierluigi Bonello and Desheng Liu and by other state and federal funds appropriated to The Ohio State University, College of Food, Agricultural, and Environmental Sciences, Ohio Agricultural Research and Development Center. This work was also funded by the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences.