Developing political-ecological theory: The need for many-task computing

Timothy Haas

doi:10.1371/journal.pone.0226861

Abstract

Models of political-ecological systems can inform policies for managing ecosystems that contain endangered species. To increase the credibility of these models, massive computation is needed to statistically estimate the model’s parameters, compute confidence intervals for these parameters, determine the model’s prediction error rate, and assess its sensitivity to parameter misspecification. To meet this statistical and computational challenge, this article delivers statistical algorithms and a method for constructing ecosystem management plans that are coded as distributed computing applications. These applications can run on cluster computers, the cloud, or a collection of in-house workstations. This downloadable code is used to address the challenge of conserving the East African cheetah (Acinonyx jubatus). This demonstration means that the new standard of credibility that any political-ecological model needs to meet is the one given herein.

Citation: Haas T (2020) Developing political-ecological theory: The need for many-task computing. PLoS ONE 15(11): e0226861. https://doi.org/10.1371/journal.pone.0226861

Editor: Andrew Lewis, Griffith University, AUSTRALIA

Received: December 7, 2019; Accepted: October 29, 2020; Published: November 24, 2020

Copyright: © 2020 Timothy Haas. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

There is a need to acknowledge the complexity of political-ecological systems and the significant challenges to building theories of them [1–3]. Such systems lie at the interface between social/political science and ecology. The complexity of each of these fields, coupled with an additional layer of complexity introduced by the interactions between sociological/political systems and natural systems, can result in highly complex system dynamics, i.e., ones that are stiff, nonlinear, and possess feedback loops. For example, Schoon and Van der Leeuw [4] note that systems composed of interacting sociological and ecological subsystems are quick to change and rarely stay in equilibrium for long. Further, many state variables are needed to describe both the decision making processes of the relevant social groups and the functioning of the involved ecosystem. A political-ecological system is also referred to as a socio-ecological system or social-ecological system (e.g., see [5]). The term political-ecological is emphasized herein because those political actions and processes that drive social movements are often initiated by groups seeking to gain increased political power [6]. The decline in the planet’s biodiversity [7] creates a need for credible political-ecological theory to guide the development of sustainable biodiversity conservation policies.

In addition to the challenge of building political-ecological theory, there is a deeper problem with using such models to guide ecosystem management policy: Unless such a model is shown to be credible, any policy recommendations based on output from the model may receive only mixed acceptance by those affected. As argued in [8], there is a need for a common model credibility standard to be met before the output of a model of a political-ecological system is deemed to be policy-relevant. This is because there may be skepticism towards models that have had neither their parameters statistically estimated nor their parameter sensitivities assessed [9, 10]. These skeptics may be unwilling to cooperate with efforts to implement ecosystem management policies that are based in part on output from these unassessed models.

But what is a credible model? Patterson and Whelan [11] state that “Model credibility is about the willingness of people to make decisions based on the predictions from the model.” In other words, a model is credible when a decision maker places enough trust in its predictions to use those predictions to select management actions. Call the model’s behavior, functioning, relationships, and systems of equations, its collective mechanism. Patterson and Whelan [11] believe the decision maker’s trust is won if (a) the model’s mechanism is based on known principles that govern the phenomenon being modeled; (b) all aspects of the model’s mechanism are testable, i.e., there are observable variables in the model on which data may be collected and used to conduct statistical hypothesis tests of the presence of these behaviors in the real world; and (c) the out-of-sample prediction error of the model’s predictions is below the decision maker’s threshold.

To make the assessment of a political-ecological model’s credibility easier to perform, this article develops and demonstrates an integrated suite of statistical methods for assessing model credibility components (b) and (c), above. Some of the hypotheses of component (b) may concern the sensitivity of the model to perturbations to its parameters. The testing of such hypotheses is typically referred to as performing a sensitivity analysis.

For the remainder of this article, the term “model validation” will not be used because in this author’s opinion, it is too ambiguous a term to support a consensus about whether a valid model can be established at all, let alone how it might be quantitatively assessed (see [12] and [13]).

An agent-based model (ABM) consists of a collection of entities that make a sequence of decisions through time based on their goals and inputs from other agents. An ABM is often built to model a social system that is too complex to represent using mathematical models [14]. In ecology, the word “agent” is often replaced with the word “individual” to emphasize that the entities are individual flora or fauna whose behavior is defined more by genetics than by a belief system such as utility maximization. As the authors of [15] state, individual-based models (IBMs) “explicitly represent discrete individuals within an (ecological) population and their individual life cycles.” One approach to modeling a political-ecological system is with a combination of an ABM to capture the system’s anthropogenic actions, and an IBM to capture the dynamics of the affected ecosystem. These two submodels interact with each other in order to capture the effects of actions taken by groups of humans that affect the ecosystem, as well as the feedback effects from the ecosystem back to those groups.

For example, Haas and Ferreira [16] build an economic-ecological model of the rhinoceros (Ceratotherium simum) horn trafficking system. This model contains submodels (agents) of rhino horn consumers, rhino poachers, and those antipoaching units attempting to stop the poachers. These latter two submodels interact with an IBM of the rhino population being illegally harvested. Haas and Ferreira [17] extend the poachers group submodel of this ABM-IBM model by adding a mechanism that explains how these individuals weigh the risk of being prosecuted for poaching against its profit potential. These authors then use this submodel to evaluate the practicality of policies aimed at providing employment opportunities for rhino poachers versus policies that intensify the enforcement of anti-poaching laws. This ABM-IBM model contains several hundred parameters.

Simulating a political-ecological system

Definition: A political-ecological system simulator (hereafter simulator) is an executable computer program capable of approximating the outputs of a stochastic model of a political-ecological system.

Such a simulator is part of an ecosystem management tool (EMT) developed by Haas [8]. An EMT is used to find politically feasible and effective policies for managing at-risk ecosystems. In this simulator, influence diagrams (IDs) (see [18]) are used to implement submodels for group decision making, and ecosystem functioning. For instance, the political-ecological system models of Haas and Ferreira [16, 17, 19] are computationally implemented through their attendant simulators.

This article’s central argument is that for simulators to effectively contribute to the development of political-ecological theory and ecosystem management policies, the following three activities need to be performed in sequence: (1) statistically fitting the simulator’s parameters to data sets of political-ecological actions [20], (2) assessing the credibility of this fitted simulator, and (3) running computations on this (now) credible simulator to find politically feasible and sustainable ecosystem management policies.

Addressing the computational challenge

Call one execution of a command to statistically estimate the parameters of a model, a job (see [21] and [22]). Generalizing this idea, let a simulator job refer to one execution of the computations needed to either (1) statistically estimate the parameters of a political-ecological system simulator; (2) compute parameter confidence intervals; (3) compute a measure of a simulator’s prediction error rate; (4) perform a sensitivity analysis; or (5) find, using the simulator, an ecosystem management policy. These five simulator jobs are integrated in that the first two jobs share the same estimator, the fourth job needs the confidence intervals found in the second job, and the fifth job uses the fitted model that was found by the first job.
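The dependencies among the five simulator jobs can be written down as a small graph. The following is only a sketch: the job names are hypothetical labels, and only the two dependencies stated above are encoded (the shared estimator of the first two jobs is noted in a comment rather than treated as a dependency).

```python
from graphlib import TopologicalSorter

# Hypothetical job labels. Each job maps to the set of jobs whose output it needs.
JOB_DEPENDENCIES = {
    "estimate_parameters": set(),
    "confidence_intervals": set(),          # shares the estimator used by estimate_parameters
    "prediction_error_rate": set(),
    "sensitivity_analysis": {"confidence_intervals"},
    "find_policy": {"estimate_parameters"},
}


def job_order(dependencies):
    """Return one execution order in which every job follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = job_order(JOB_DEPENDENCIES)
print(order)
```

Any order returned this way runs the sensitivity analysis after the confidence intervals and the policy search after the parameter estimation.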

Simulator jobs can require large amounts of computer time. From now on, however, the use of policy-relevant statistical and optimization methods will be possible only if the attendant computational challenges are met. Hence, any discussion or evaluation of such methods is inseparable from a consideration of their computational implementations.

But the need for large amounts of computer time can become a challenge for those scientists, government agencies, and NGOs needing to run such computations. Hereafter, call these groups and individuals who are involved in biodiversity protection, ecosystem managers. The handicap these managers face is that funding to support the active management of ecosystems can be uneven. For example, circa 2017-2020, the United States Environmental Protection Agency (USEPA) is being down-sized by President Trump’s administration [23]. But managing an ecosystem with the goal of conserving its biodiversity requires an on-going analysis of monitoring data as it arrives in order to guide the development of management actions that, when implemented, result in successful biodiversity outcomes. This means that ecosystem managers need to have alternative computing options should they be temporarily unable to afford supercomputer time from an external high performance computing (HPC) provider.

This article argues that a practical way to meet this computational challenge is to implement these jobs as many-task computing (MTC) applications. The authors of [24] describe such jobs as being made up of a collection of within-job computations, called tasks, that are loosely coupled, communication-intensive, and heterogeneous. Several application program interfaces (APIs) that can be used to implement such jobs are described below, and one, JavaSpaces™ (see [25]), is demonstrated through a case study.
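The idea of splitting one job into many independent tasks can be illustrated on a single machine. The sketch below is not JavaSpaces and is not the article's software: it uses Python's standard thread pool as a stand-in for a pool of worker machines, and `run_task` is a hypothetical placeholder for one within-job computation such as a single simulator run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task):
    """Stand-in for one within-job computation; here it just squares a number."""
    task_id, value = task
    return task_id, value * value


def run_job(tasks, n_workers=4):
    """Hand independent tasks to a pool of workers and gather results by task id."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(run_task, tasks))


results = run_job([(i, float(i)) for i in range(8)])
print(results)
```

Because the tasks do not depend on one another, the same job could in principle be spread over in-house workstations, a cluster, or cloud instances without changing the task code.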

Article contributions

This article makes three contributions to the development of political-ecological theory and the use of such theory in the formation of ecosystem management policies:

the first integrated suite of statistical measures for performing parameter estimation and credibility assessment of a political-ecological model and its attendant simulator,
a new method for constructing politically feasible and sustainable ecosystem management policies, and
downloadable software for implementing these methods as MTC applications via the JavaSpaces API.

Related work

Models, estimation, and sensitivity analysis

In a highly cited article, Macy and Willer [26] discuss how ABMs can advance sociological theory. Conte and Paolucci [27] note the potential that ABMs have for social science theory construction.

Methods exist for the statistical estimation of a socio-ecological model’s parameters [17, 28]—and for the estimation of a deterministic ecological model [29–31]. Minimum simulated distance estimators (MSDEs) are one family of parameter estimators that can be used to estimate the parameters of a stochastic ecosystem model. And one way to define the needed distance function is with the Hellinger distance [32, 33]. For example, in [28], a Hellinger distance-based MSDE is used to estimate the parameters of a stochastic, dynamic model of a political-ecological system.

A model is sensitive to a set of parameters if small perturbations to their values significantly affect the model’s outputs. For instance, the authors of [34] perform a probabilistic sensitivity analysis [35] of a salmon population dynamics model. And in [36], a probabilistic sensitivity analysis of an agricultural model is performed.

Integrated statistical assessment of a socio-ecological model’s credibility

A literature search uncovered no articles describing an integrated statistical assessment of a socio-ecological model’s credibility. In [37], however, a specific suite of activities is given for statistically assessing an ecosystem model’s credibility. These authors believe the evaluation of an ecosystem model should include (1) an interrogation of the model’s logic to determine whether it is parsimonious and biologically realistic; (2) a statistical estimate of its parameters; (3) estimates of its prediction accuracy; (4) computation of statistical goodness-of-fit tests; and (5) a probabilistic sensitivity analysis. These authors, however, do not apply their recommendations to a case study.

Yarkoni and Westfall [38] call for a shift in focus from building models that pass in-sample goodness-of-fit (GOF) tests towards the building of models that have low prediction error rates (out-of-sample performance). This shift is particularly important for models that are used to guide decisions aimed at changing the future behavior of a system (out-of-sample). A model of a political-ecological system is, in part, a model of how humans behave and hence, the focus on prediction for psychological models as advocated by Yarkoni and Westfall applies to political-ecological models.

Materials and methods

First, the procedure for using the EMT is given. This is followed by the statistical theory underpinning each simulator job. The section concludes with algorithms and runtime issues particular to the casting of simulator jobs as MTC applications.

EMT procedure

The three activities of statistically fitting a simulator, assessing its credibility, and using it to find politically feasible and ecologically effective policies form part of a step-by-step procedure given in [8, pp. 77-78] for using the EMT. A new version of this procedure follows.

Step 1. Identify the spatial boundaries of the ecosystem to be managed. Typically, this ecosystem will host one or more endangered species.
Step 2. Identify those political groups that directly or indirectly affect this ecosystem. Construct submodels of these groups by casting them as IDs and expressing them in the id language. This language is part of the id software system (see [39]). Use theories of cognitive processing to assign hypothesis values to the parameters of these submodels. Load these values into hypothesis parameter files, one file for each group.
Step 3. Construct a population dynamics submodel of all species identified in Step 1. Cast this submodel as an ID and express it in the id language. Use ecological theory to identify hypothesis values for the parameters of this submodel. Load these values into a hypothesis parameter file.
Step 4. Using all of the above files, create a master file that defines the political-ecological system simulator.
Step 5. Acquire a data set of political-ecological actions made by some of the groups modeled in Step 2, and the ecosystem modeled in Step 3. The ecological component of this data set might consist of observations on the spatio-temporal abundance of several species.
Step 6. Use id to statistically fit some subset of the simulator’s parameters to this data set using consistency analysis (see [28], and [8, pp. 46-52]).
Step 7. Use id to compute jackknife confidence intervals for the parameters estimated in Step 6.
Step 8. Conduct an analysis of the simulator’s credibility (see [8, pp. 179-198]) by using id to perform the two separate jobs of (a) estimating the simulator’s prediction error rate through computation of its one-step-ahead prediction error rates; and (b) performing a deterministic sensitivity analysis using thresholds defined by the parameter confidence intervals found in Step 7. If the simulator displays error rates that are no better than blind guessing (all options in each group submodel are equally likely), or it displays unacceptable sensitivity to some of its parameters, re-formulate one or more of the simulator’s submodels and go back to Step 6. Continue in this manner until the simulator is credible.
Step 9. Use id to run a job with this (now) credible simulator to construct the most practical ecosystem management plan (MPEMP) (see [8, pp. 52-53]).
Step 10. Implement this MPEMP in the real world.
Step 11. As new data becomes available, repeat Steps 6 through 10.

Statistical estimation of simulator parameters

The consistency analysis statistical estimator delivers parameter estimates that result in the simulator’s probability distributions on its output variables being as similar as possible to empirical distributions derived from data while at the same time being as close as possible to those derived from political-ecological theory. Consistency analysis is a parameter estimator that is related to MSDE.

Hellinger distance.

Following [28, Appendix], and [17, S3 Appendix], one way to define the distance between two multivariate probability distributions is as follows. Partition a vector of p random variables, U into U^(d), and U^(ac)—the vectors of discrete and absolutely continuous random variables, respectively. Say there are d discrete members of U, and c continuous members. Hence, p ≡ d + c. Let the probability density probability function (PDPF) be (1)

Let U|β notate the random vector whose PDPF is parameterized by the components of β. For example, an ID might be composed of U₁ ∼ Bernoulli(β₁) and U₂ ∼ Normal(β₂ + u₁ β₃, β₄). The graph of this ID appears in Fig 1, and its parameter vector, β = (β₁, β₂, β₃, β₄).

Fig 1. The graph of the ID wherein U₁ influences U₂ and both of these nodes are stochastic (indicated by circles).

https://doi.org/10.1371/journal.pone.0226861.g001
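The two-node example ID can be made concrete with a few lines of code. This is a sketch under one stated assumption: β₄ is treated as the variance of U₂, which the text does not specify. The function names are illustrative only.

```python
import math
import random


def sample_id(beta, rng):
    """Draw (u1, u2) with U1 ~ Bernoulli(b1) and U2 | u1 ~ Normal(b2 + u1*b3, b4)."""
    b1, b2, b3, b4 = beta
    u1 = 1 if rng.random() < b1 else 0
    # Assumption: b4 is a variance, so the standard deviation is sqrt(b4).
    u2 = rng.gauss(b2 + u1 * b3, math.sqrt(b4))
    return u1, u2


def pdpf(u1, u2, beta):
    """Joint PDPF: probability mass of u1 times the conditional density of u2."""
    b1, b2, b3, b4 = beta
    mass = b1 if u1 == 1 else 1.0 - b1
    mean = b2 + u1 * b3
    density = math.exp(-((u2 - mean) ** 2) / (2.0 * b4)) / math.sqrt(2.0 * math.pi * b4)
    return mass * density


beta = (0.3, 1.0, 2.0, 0.25)
rng = random.Random(0)
draws = [sample_id(beta, rng) for _ in range(5)]
print(draws, pdpf(1, 3.0, beta))
```

The product of a probability mass and a conditional density is what makes the function a mixed "density probability" function rather than a pure density or a pure mass function.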

In terms of the PDPF, the Hellinger distance between two probability distributions is (2) and is bounded between 0 and 1 [40].
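For purely discrete distributions the Hellinger distance has a simple closed form, shown below as a sketch. This is the standard textbook form, not necessarily the exact expression of (2), which covers mixed discrete and continuous PDPFs.

```python
import math


def hellinger_discrete(p, q):
    """Hellinger distance between two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Bhattacharyya coefficient: sum of sqrt(p_i * q_i), equal to 1 when p == q.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


same = hellinger_discrete([0.2, 0.8], [0.2, 0.8])
disjoint = hellinger_discrete([1.0, 0.0], [0.0, 1.0])
print(same, disjoint)
```

Identical distributions give a distance near 0 and distributions with no shared support give the upper bound of 1, which matches the stated bounds.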

Consistency analysis.

Haas and Ferreira [17] give a description of consistency analysis before applying it to a model of the political-ecological system of rhino horn trafficking. An abbreviated version of this description appears here.

Let m be the number of interacting IDs in a political-ecological simulator. Let U_i be the vector that contains all of the chance nodes that make up the i^th ID (either one of the group submodels or the ecosystem submodel). Let U|β_ij be the i^th ID’s probability distribution parameterized by the entries in β_ij under the j^th set of conditioning (input) node values. Each parameter in the ID is assigned a point value a priori that is derived from either expert opinion, subject matter theory, or the results of a previous consistency analysis. Collect all of these hypothesis values into the hypothesis parameter vector, β_H. This vector holds the ecosystem manager’s prior beliefs about the true values of the model’s parameters.

Let l_i be the number of belief networks formed by conditioning the i^th ID on all possible combinations of its input nodes. There are m − 1 group submodels, and one ecosystem submodel. Define three parameter vectors: one holding those parameters that identify all of the group submodels, one holding those that identify the ecosystem submodel, and β, the collection of all of the model’s parameters.

As in [8, pp. 17-18], for group submodels, let an in-combination be a set of values on the input nodes {time, input action, actor, subject}. Let an out-combination be a set of values on the input nodes {output action, target (of that action)}. A group ID selects an out-combination by computing the expected value of its terminal node, Overall Goal Attainment under the received (given) in-combination—and each possible combination of values on the two input nodes of Out-Action and Target. The out-combination that maximizes this expected value is selected for output.

Let an in-out pair consist of an in-combination paired with an out-combination. Let T be the number of time points at which out-combinations are observed, and let i_1, …, i_{m_O} (m_O ≤ m) be the indices of those group submodels for which at least one out-combination is observed over the observation time interval: [t₁, t_T].

Each of the e output nodes of the ecosystem submodel is stochastic and corresponds to an observable ecosystem metric. A run of the simulator produces a set of simulated values on each output node at each time point. The mean of these values is an estimate of that node’s expected value at that time point.

Let the data agreement function be a goodness-of-fit statistic that measures the agreement of a sequence of out-combinations and/or mean values of ecosystem metrics produced by a simulator and those of a political-ecological actions data set, S, of observed output actions and/or observations on the ecosystem submodel’s metrics. Larger values of this function indicate better agreement. Let the hypothesis agreement function be a measure of agreement between the probability distribution on the model’s vector of output nodes that is identified by β_H, and the one identified by β. Again, larger values of this function indicate better agreement. Note that the data agreement function measures the agreement between a sample and a stochastic model, while the hypothesis agreement function measures the agreement between two stochastic models.

A consistency analysis is executed with the following four steps.

Specify the values for β_H.
Initialize the model’s parameter values by modifying β_H to form a vector of initial values.
Maximize the overall agreement function by modifying these initial values to form the vector of consistent parameter values.
Analyze the differences in parameter values between those in β_H and those in the vector of consistent parameter values.

The estimator’s name comes from this final step: analyze the model’s parameters by scrutinizing areas of the subject matter theory that had been used to justify those hypothesis parameter values that, surprisingly, have been found to be very different from their consistent values.

The Maximize step of consistency analysis consists of solving (3), where c_H ∈ (0, 1) is the ecosystem manager’s priority of having the estimated distribution agree with the hypothesis distribution as opposed to agreeing with the empirical distribution. Setting c_H to zero turns consistency analysis into an MSDE. The subjective assignment of c_H in consistency analysis, coupled with its role in the solution of (3), is how consistency analysis represents the reliability of the new data.
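One objective that is consistent with this description is a weighted combination of the two agreement measures. The display below is a reconstruction offered as an assumption, not a quotation of (3): the symbols g_S(β) for agreement with the data set S, g_H(β) for agreement with the hypothesis distribution, and β_C for the consistent parameter values are notation introduced here for illustration.

```latex
\beta_C \;=\; \arg\max_{\beta}\;
  \Bigl[\,(1 - c_H)\, g_S(\beta) \;+\; c_H\, g_H(\beta)\,\Bigr]
```

Under this form, c_H = 0 leaves only the data agreement term, which is the sense in which the estimator reduces to an MSDE, while values of c_H near 1 hold the estimate close to the hypothesis values.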

The agreement between the simulator’s hypothesis distributions and the distributions defined by β is given in (4), and the estimated Hellinger distance between U|β_H and U|β is given in (5).

In this estimator, values of the PDPF under an ID’s hypothesis distribution, U|β_H, and its U|β distribution are approximated by first drawing a size-n sample of design points from a multivariate uniform distribution on the ID’s chance nodes: u₁,…,u_n; and then computing the PDPF value at each design point u_i, i = 1, …, n, with a nonparametric density estimator.
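The role of the uniform design points can be seen in a one-dimensional sketch. The code below is an assumption-laden illustration rather than the estimator of (5): it evaluates two known normal densities directly instead of estimating them nonparametrically, and it uses plain Monte Carlo integration over a bounded interval.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def estimated_hellinger(f, g, low, high, n, rng):
    """Monte Carlo Hellinger distance from n uniform design points on [low, high]."""
    points = [rng.uniform(low, high) for _ in range(n)]
    # (high - low) * mean(sqrt(f * g)) approximates the integral of sqrt(f * g).
    bc = (high - low) * sum(math.sqrt(f(u) * g(u)) for u in points) / n
    return math.sqrt(min(1.0, max(0.0, 1.0 - bc)))


rng = random.Random(1)
f = lambda u: normal_pdf(u, 0.0, 1.0)   # stands in for the hypothesis distribution
g = lambda u: normal_pdf(u, 1.0, 1.0)   # stands in for the distribution under beta
h_hat = estimated_hellinger(f, g, -6.0, 7.0, 20000, rng)
print(h_hat)
```

For these two unit-variance normals the exact distance is sqrt(1 − exp(−1/8)), about 0.343, so the estimate can be checked against a known value.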

The agreement between observed output actions and those generated by the simulator is given by (6), where y_{i_k j} is the observed action of group i_k at time j, and d_{i_k j} is the submodel-computed action of group i_k at time j. Let S_i ≡ {z_i1, …, z_iT} be the T observations on the i^th ecosystem metric. The agreement between observed outputs of the ecosystem and those generated by the ecosystem submodel is given by (7), where R_i ≡ max(S_i) − min(S_i). These latter two agreement functions together form the overall data agreement function.

Delete-d jackknife confidence intervals

The deterministic sensitivity analysis described in the next section assumes that confidence intervals for each parameter in β are available. One way to find these is to compute delete-d jackknife confidence intervals (see [41]). Haas [42] gives an algorithm for computing a delete-d jackknife confidence interval. This algorithm proceeds as follows.

1. Resample r = n^0.97 observations from the observed size-n sample. In other words, temporarily delete d ≡ n − r observations from the observed sample.
2. With this size-r subsample, compute the consistency analysis estimate of the parameter, β.
3. Repeat Steps 1 and 2 n_jack times to obtain n_jack such estimates.
4. Form a 100(1 − α)% confidence interval for β by finding the shortest interval that contains (1 − α)n_jack of these values.

Confidence intervals based on delete-d subsamples are consistent if, as r → ∞, r/n → 0 [43]. One way to meet this condition is to have r = n^τ where τ ∈ (0, 1).
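The four steps of the delete-d jackknife can be sketched as follows. The sample mean is used here as a stand-in estimator purely to keep the example self-contained; in the article the estimator is consistency analysis. The function and argument names are illustrative.

```python
import math
import random


def delete_d_jackknife_ci(sample, estimator, n_jack=500, alpha=0.05, tau=0.97, rng=None):
    """Shortest interval holding (1 - alpha) of n_jack delete-d subsample estimates."""
    rng = rng or random.Random()
    n = len(sample)
    r = max(1, int(round(n ** tau)))        # subsample size r = n^tau, so d = n - r
    # Steps 1-3: draw r observations without replacement and re-estimate, n_jack times.
    estimates = sorted(estimator(rng.sample(sample, r)) for _ in range(n_jack))
    k = math.ceil((1.0 - alpha) * n_jack)   # number of estimates the interval must hold
    # Step 4: slide a window of k consecutive sorted estimates, keep the narrowest.
    start = min(range(n_jack - k + 1), key=lambda i: estimates[i + k - 1] - estimates[i])
    return estimates[start], estimates[start + k - 1]


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(42)
data = [rng.gauss(10.0, 2.0) for _ in range(200)]
low, high = delete_d_jackknife_ci(data, mean, rng=rng)
print(low, high)
```

Each of the n_jack re-estimations is independent of the others, which is why this job maps naturally onto a collection of separately executed tasks.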

Prediction error rates

The simulator’s group submodels produce nominally-valued output in the form of out-combinations. The ecosystem submodel, on the other hand, can produce continuously-valued output, e.g., wildlife abundance values. Two different measures of prediction error rate, then, are needed. Here, these are the predicted actions error rate (ζ) for action-target output, and the root mean squared prediction error rate (ϵ_i) for the i^th continuously-valued ecosystem metric [8, pp. 186-188].

Predicted actions error rate.

Consider a finite number of sequential time points, t₁, …, t_T. At each of these time points, one or more of the simulator’s group submodels posts one or more out-combinations. Let ζ be defined by (8) in terms of two counts: the number of simulator-predicted out-combinations at time point t_{i+1} that match observed out-combinations at that time point, and the number of these observed out-combinations. It is assumed that the simulator’s parameters have been refitted to the political-ecological actions data set using data observed earlier than time point t_{i+1}. The justification for this assumption is that an ecosystem manager would want to refit the simulator as new actions and/or values on ecosystem metrics are observed before using the simulator to predict future group actions and/or future values of ecosystem metrics.

Say that a group submodel has K possible out-combinations. In the worst case, one of these out-combinations has a high probability of being chosen at each time point no matter what the input action is. Blind guessing would predict this out-combination with probability 1/K at each time point resulting in an error rate of about 1 − 1/K. An ecosystem manager would prefer the simulator’s predictions over predictions based on blind guessing whenever ζ < 1 − 1/K.
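The comparison with blind guessing can be sketched in code. The pooled form of the error rate used here (one minus total matches over total observed out-combinations) is an assumption, since the exact expressions (8) and (10) may weight time points differently. The action and target names in the example are hypothetical.

```python
def predicted_actions_error_rate(predicted, observed):
    """Fraction of observed out-combinations that the simulator failed to predict.

    predicted, observed: one set of (group, out_action, target) tuples per time point.
    """
    n_observed = sum(len(obs) for obs in observed)
    n_matched = sum(len(pred & obs) for pred, obs in zip(predicted, observed))
    return 1.0 - n_matched / n_observed


def beats_blind_guessing(zeta, k):
    """True when zeta is below the blind-guessing error rate 1 - 1/K."""
    return zeta < 1.0 - 1.0 / k


observed = [{("krr", "poach", "cheetah")}, {("kep", "patrol", "krr")}, {("krr", "poach", "cheetah")}]
predicted = [{("krr", "poach", "cheetah")}, {("kep", "patrol", "krr")}, {("krr", "graze", "reserve")}]
zeta = predicted_actions_error_rate(predicted, observed)
print(zeta, beats_blind_guessing(zeta, k=4))
```

With two of three observed out-combinations matched, the error rate is one third, which is below the blind-guessing rate of 0.75 for K = 4.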

Root mean squared prediction error rate.

Let ϵ_i be defined by (9) in terms of the observed value of the i^th continuously-valued ecosystem metric at time point t_{j+1} and the simulator’s predicted value of this metric at that time point, where the ecosystem submodel has been fitted to data earlier than time point t_{j+1}. Define an alternative predictor, namely the naive forecast, and let δ_i be the RMSE of these naive forecasts.

Error rate estimation.

To estimate these error rates, begin at time point t_s, s > 0. Then, perform the following two computations at each of n_pred refit time points, where v > 0 is the refit interval, n_pred ≡ ⌊(T_D − 1 − s)/v⌋ + 1, and T_D is the most recent time point in the data set.

Re-fit the simulator with consistency analysis using all observed out-combinations up through time t_j.
Run this refitted simulator from the first time point in the data set up through time point t_{j+1} to compute predicted values of all output nodes.

With these predictions in-hand, compute an estimate of ζ with (10)

Estimate ϵ_i, and δ_i with (11) and (12) respectively.

Note that the simulator is refitted every v time units. Typically, time is measured in years. An ecosystem manager would be constrained by analyst time, computer availability, and data acquisition frequency. A typical refit time interval is quarterly, i.e., v = (4 × 3)/52 = 0.2308.

If the estimate of ϵ_i is greater than the estimate of δ_i, the naive forecast is preferred over the model’s predictions. In this case, the ecosystem manager would be advised to work on refining and/or modifying the model until the estimate of ϵ_i is less than the estimate of δ_i.
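A small numerical sketch shows how the number of refits and the comparison against the naive forecast work out. Two things here are assumptions: the naive forecast is taken to be the previous observed value, and the abundance numbers are invented for illustration.

```python
import math


def n_pred(t_d, s, v):
    """Number of refit points: floor((T_D - 1 - s) / v) + 1."""
    return math.floor((t_d - 1 - s) / v) + 1


def rmse(predictions, observations):
    """Root mean squared error between paired predictions and observations."""
    squared = [(p - z) ** 2 for p, z in zip(predictions, observations)]
    return math.sqrt(sum(squared) / len(squared))


observed = [100.0, 104.0, 109.0, 113.0, 118.0]   # invented abundance series
model_pred = [103.0, 108.0, 114.0, 117.0]        # invented one-step-ahead predictions of observed[1:]
naive_pred = observed[:-1]                       # assumption: naive forecast = previous observation

eps_hat = rmse(model_pred, observed[1:])
delta_hat = rmse(naive_pred, observed[1:])
model_preferred = eps_hat < delta_hat
refits = n_pred(t_d=10, s=2, v=12 / 52)          # quarterly refits over the data set
print(eps_hat, delta_hat, model_preferred, refits)
```

In this example the model's RMSE is 1.0 against about 4.53 for the naive forecast, so the model is preferred, and a quarterly refit interval over this span calls for 31 refits, each of which is a full consistency analysis.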

Deterministic sensitivity analysis

Deterministic sensitivity analysis assesses the sensitivity of a model’s outputs to externally-generated values of the model’s inputs (see [44]). Haas [8, pp. 182-183] gives an algorithm for studying a simulator’s deterministic sensitivity. A new version of this algorithm follows.

Conditions and responses.

Input for this algorithm consists of a set of deterministic sensitivity analysis (DSA) conditions, c_DSA, and a set of DSA responses, r_DSA. Each of these sets contains values on simulator submodel output nodes. These values can be those of nominally-valued output action nodes, or of continuously-valued ecosystem submodel nodes. Refer to any actions in either of these sets that are not to happen as complement actions. A particular pair of these sets embodies a counter-example to the types of simulator outputs that the ecosystem manager is hoping to achieve. Typically, a critic or skeptic of the simulator would specify these sets.

Algorithm.

1. Update β_H to the most recent vector of consistent parameter values.
2. Specify c_DSA and r_DSA, and set the simulator’s time interval accordingly.
3. Place all actions contained in either c_DSA or r_DSA into a file of “observed” actions, and all ecosystem responses contained in r_DSA into a file of “observed” ecosystem outputs.
4. Initialize the simulator’s parameter values so that the simulator produces all actions contained in c_DSA and r_DSA but does not produce any complement actions contained in these sets.
5. After setting c_H to 0.1, solve for the DSA parameter values by performing the consistency analysis Maximize step (see (3)) using the two files formed in Step 3.
6. Compute the differences between these DSA parameter values and their consistent counterparts.

Interpretation.

The parameter β^(l) is the most sensitive parameter, and the difference between its DSA value and its consistent value is the accuracy to which this parameter needs to be known. If this DSA value is inside the 95% confidence interval for β^(l) (see the EMT procedure, Step 7), or is a scientifically plausible value for β^(l), conclude that this analysis supports the skeptic’s concerns about the simulator’s sensitivity to parameter misspecification.

The idea of this algorithm is to search for a set of parameter values that is as close to the consistent parameter values as possible but causes the simulator’s outputs to change by an amount that is scientifically significant. If the values in this set are not statistically different from their consistent counterparts, or are scientifically plausible, then the model’s outputs are excessively sensitive to parameter misspecification. This sensitivity, in turn, reduces the credibility of policy recommendations derived from the model’s outputs.
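The decision rule in this interpretation can be written as a short check. This is a sketch only: the function name is hypothetical, and the optional plausible range is a simple numeric stand-in for what would in practice be a subject matter judgment.

```python
def supports_skeptic(beta_dsa, ci_low, ci_high, plausible_low=None, plausible_high=None):
    """True when the DSA value lies inside the confidence interval or a plausible range."""
    inside_ci = ci_low <= beta_dsa <= ci_high
    plausible = (
        plausible_low is not None
        and plausible_high is not None
        and plausible_low <= beta_dsa <= plausible_high
    )
    return inside_ci or plausible


# Invented numbers: a DSA value of 0.42 against a 95% interval of (0.35, 0.50).
flagged = supports_skeptic(0.42, 0.35, 0.50)
cleared = supports_skeptic(0.90, 0.35, 0.50)
print(flagged, cleared)
```

A flagged parameter means that outputs the manager hopes to avoid can be produced by a parameter value the data cannot rule out, which is the situation that calls for reformulating a submodel.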

Ecosystem management policymaking

Computing the MPEMP is one way to construct an ecosystem management policy. The algorithm described herein is new. Its development was motivated by earlier algorithms given in [8, pp. 52-53], and [17, S5 Appendix]. The idea is to find a set of minimal changes in the beliefs held by ecosystem-affecting groups (relative to their consistent parameter values) so that these groups change their behaviors enough to cause the ecosystem to respond in a desired manner. In other words, the MPEMP is the ecosystem management policy that emerges by finding group submodel parameter values that bring the predicted ecosystem state close to the desired ecosystem state while deviating minimally from their consistent values.

Definitions.

Consider a random vector composed of a number of the simulator’s ecosystem metrics. For example, this vector might consist of cheetah abundance and herbivore abundance in the year 2030. Assume that an ecosystem manager desires the ecosystem to be in a particular state at a designated future time point. This manager expresses this desired state by specifying the vector of desired values, q_d. For example, say that it is desired to have 10,000 herbivores and 1,000 cheetah in East Africa in the year 2030. Then, with the elements ordered as cheetah abundance followed by herbivore abundance, $$q_d = (1000,\ 10000)^{\top} \qquad (13)$$

Next, identify those actions that, if taken, would contribute the most towards the ecosystem submodel producing the values in q_d. Also identify those actions that, if ceased, would raise the likelihood of the ecosystem submodel producing the values in q_d. Collect all of these desirable and undesirable actions into a set called c_MPEMP. For example, to achieve these desired values, it is believed that more land should be set aside for wildlife reserves and that poaching should cease. In this case, (14) where kep and krr are the Kenya environmental protection agency and Kenya rural residents groups, respectively.

MPEMP algorithm.

Update to the most recent .
Compute .
Specify q_d and c_MPEMP.
Compute initial parameter values with the Initialize algorithm of consistency analysis (see Materials and methods: Consistency analysis).
Compute (15) under the set of constraints specified by c_MPEMP.

This algorithm implements one way to quantify the concept of a practical ecosystem management policy: Associate political feasibility with the value of where contains the parameters of the decision making submodels whose values have been modified from those in in such a way that now, the sequence of output actions taken by different groups cause a desired ecosystem state at a designated future time point.

A measure of a plan’s political feasibility can be defined as (16)

A plan having a value of ψ close to 0.0 will face significant political resistance to its implementation because significant changes to the belief systems of one or more groups need to happen, while one with a value close to 1.0 should not face such stiff resistance.

Coding simulator jobs as MTC applications

These five simulator jobs can be computationally expensive. These jobs can, however, be partially parallelized by breaking each of them into sets of dependent tasks that engage in various amounts of data transfer between themselves. Such a set of complex, inter-dependent tasks fits the definition of an MTC application. One way to execute MTC applications is to run them on cluster computers [24, 45]. A cluster computer consists of a number of personal computers called compute nodes (hereafter, nodes) that are connected through high speed interconnects.

Translating the mathematical expressions of Materials and methods: Statistical estimation of simulator parameters into a programming language is performed by writing code within an API that supports the development of task-based parallel programs. A runtime system is invoked to execute such programs on hardware. The authors of [46] review APIs and runtime systems that are designed to support MTC applications. These authors refer to a particular combination of an API and a runtime system as a task-based parallelism technology.

As identified in [46], an ideal API should be able to direct the runtime system to partition, synchronize, and cancel tasks; specify nodes for workers to run on; start/stop workers; receive task or process fault information; checkpoint a job should a nonrecoverable fault occur; and automatically distribute data and code to workers. In addition, the present author believes that in order to bring many-task computing within reach of ecosystem managers possessing only minimal programming skill, the API should be easy to learn, and use operators whose syntax and semantics are independent of specific runtime systems and hardware configurations.

Therefore, to enable ecosystem managers with different backgrounds to use the five simulator jobs advocated in this article, a task-based parallelism technology needs to possess the following characteristics:

Exhibit a high level of abstraction.
Be easy to learn.
Support the asynchronous, high-level coordination of simultaneous tasks.
Separate the communication protocol from the application code.
Be internet-aware.
Be fault-tolerant: Processor failure is almost certain during a job that employs thousands of processors [47]. Such tolerance implies the ability to automatically checkpoint a job.
Be scalable: Only one code need be written and maintained to run jobs on hardware ranging from laptop computers to cluster computers.
Be computationally fast.
Possess a strong theoretical foundation in computer science.

Currently, several technologies possess some of these desired characteristics, including Java with JavaSpaces, Python with Parsl, Python with Ray, various languages with Docker Swarm, and Julia with Docker and Kubernetes. The five simulator jobs could be coded and run in any of these technologies. In what follows, these five technologies are described and compared.

Java with JavaSpaces.

The JavaSpaces API can support the master-worker architecture wherein a master program runs on one node having a unique Internet Protocol address along with n_W workers who run on other, internet-accessible nodes and busy themselves by executing tasks that have been posted by the master on a JavaSpace bulletin board [48]. One coordination protocol for task posting and collection is the bag of tasks scheme wherein the master posts a batch of tasks and then waits until all of these tasks have been completed before posting another batch. This approach results in a program that is naturally balanced and naturally scalable [49]. Noble and Zlateva [50] find that “The simplicity and clean semantics of tuplespaces allow natural expressions of problems awkward or difficult to parallelize in other models [51].” A JavaSpaces program is also fault tolerant and decouples the semantics of distributed computing from those of the problem domain [49].
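The bag of tasks scheme described above can be illustrated with a minimal sketch. It is written in Python rather than Java, and it uses a local thread pool from the standard library as a stand-in for workers on remote nodes; it is not JavaSpaces code and involves no bulletin board or network communication. The names `run_batch`, `bag_of_tasks`, and `square` are hypothetical and chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(task_fn, batch, n_workers):
    """Master posts one batch and blocks until every task in it is done."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the order the tasks were posted.
        return list(pool.map(task_fn, batch))


def bag_of_tasks(task_fn, batches, n_workers=4):
    """Post batches one at a time; the next batch waits for the previous one."""
    results = []
    for batch in batches:
        results.append(run_batch(task_fn, batch, n_workers))
    return results


def square(x):
    """Toy task (assumption): stands in for one simulator evaluation."""
    return x * x


batch_results = bag_of_tasks(square, [[1, 2, 3], [4, 5]], n_workers=2)
print(batch_results)
```

Because each batch is collected in full before the next is posted, idle workers simply pick up whatever tasks remain in the current batch, which is the sense in which the scheme balances itself.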

The runtime system Gigaspaces™ that supports the JavaSpaces API exhibits low inter-node communication latency [52]. The primary operations on a Gigaspaces space are write, read, change, take, and aggregation [53, 54]. Appendix A of S1 Appendix contains shell scripts that start and run a JavaSpaces program on a cluster computer. Appendix B of S1 Appendix contains guidance for running a JavaSpaces program on a shared cluster computer.

Python with Parsl.

The Parsl package allows distributed Python programs to access thousands of nodes [55] either on cluster computers or in the cloud. The distributed application is created using the API operators Config, @python_app, and @bash_app.

Python with Ray.

The Python package Ray [56] provides the API operators @ray.remote, ray.wait, ray.get, and ray.put. Ray contains its own runtime system to manage the starting, reading, deleting, and recovery of tasks [57].

Various languages with Docker and Docker Swarm.

Docker is a program that takes application language source code and creates a portable and executable version called a container. Docker Swarm Mode is a runtime system that orchestrates the execution of these containers across nodes on a cluster computer or in the cloud. Docker Swarm Mode can be used to manage a task-based, multi-language distributed program [58]. The steps needed to do this are 1) write the application modules in various application languages, 2) start support programs on each node, 3) start a Docker Swarm cluster by executing commands on each node, 4) create a Docker registry, 5) create images and from them, containers, 6) register the images, 7) create a stack file, and 8) run the application by deploying this stack.

Julia with Docker and Kubernetes.

The Julia language [59] contains an API that provides the @spawn and fetch() operators needed to run a bag-of-tasks application [59]. To do this, one first uses Docker to containerize the Julia-written executables. Then, these containers are run on a Kubernetes cluster [60].

Comparisons.

All five technologies are known to coordinate tasks, be internet-aware, and be computationally fast. Table 1 summarizes the strengths and weaknesses of these five technologies. Two notes are in order. JavaSpaces has a theoretical foundation in computer science [51, 52] that the other four technologies lack. Developing an MTC application with Docker Swarm Mode appears to require more user involvement with the runtime system than the other four technologies. On the other hand, container-based software development and distribution is quickly becoming the industry standard.

Table 1. Comparison of task-based technologies on desirable characteristics for building and running MTC applications.

https://doi.org/10.1371/journal.pone.0226861.t001

This author chose the JavaSpaces API to develop the MTC applications exercised in the next section rather than any of the other four technologies because it is the only technology known to possess all of the desirable characteristics listed in Table 1.

Optimization as an MTC application.

Optimization of stochastic functions under nonlinear constraints can be performed with the multiple dimensions ahead search (MDAS) algorithm of Haas [8, pp. 219-225]. This algorithm is a parallel version of the Hooke and Jeeves coordinate search algorithm [61]. MDAS executes by having the master assign each worker a vector of parameter values at which to compute the value of the objective function. These vectors are chosen such that the next M parameters are searched simultaneously for a maximum. Each worker computes the objective function value at its assigned set of parameter values. Once all of the workers have returned their function values, the master checks them for a new maximum (called an improvement). If found, the master stores this new best solution. This parallel search is repeated on these dimensions until no improvements are found. Then, the algorithm moves on to the next M dimensions.
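The master-worker loop described above can be sketched as follows. This is a simplified illustration and not the MDAS code of [8]: it assumes a fixed step size, assumes that the candidate vectors for a block of M dimensions are all combinations of moving each dimension down one step, leaving it unchanged, or moving it up one step, and uses a local thread pool in place of workers on separate nodes. Constraints and stochastic objective functions are not handled. The names `mdas_sketch`, `toy_objective`, and `TARGET` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def mdas_sketch(objective, start, step=0.5, m=2, n_workers=4):
    """Maximize `objective` by searching m dimensions at a time in parallel."""
    best_x = list(start)
    best_f = objective(best_x)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for first in range(0, len(best_x), m):
            dims = range(first, min(first + m, len(best_x)))
            improved = True
            while improved:
                # Master builds one candidate vector per combination of moves
                # (assumed scheme: -step, 0, or +step in each searched dimension).
                candidates = []
                for moves in product((-step, 0.0, step), repeat=len(dims)):
                    if any(moves):  # skip the all-zero move (current point)
                        x = list(best_x)
                        for d, move in zip(dims, moves):
                            x[d] += move
                        candidates.append(x)
                # Workers evaluate the objective; master waits for all of them.
                values = list(pool.map(objective, candidates))
                top = max(range(len(values)), key=values.__getitem__)
                # An improvement is a strictly larger objective value.
                improved = values[top] > best_f
                if improved:
                    best_x, best_f = candidates[top], values[top]
    return best_x, best_f


# Toy objective (assumption): a concave quadratic with a known maximizer.
TARGET = (1.0, -2.0, 3.0, 0.5)


def toy_objective(x):
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


solution, value = mdas_sketch(toy_objective, [0.0, 0.0, 0.0, 0.0])
print(solution, value)
```

With M = 2, each round posts eight candidate vectors, so up to eight workers can be kept busy at once; a larger M increases the available parallelism at the cost of more function evaluations per round.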

This algorithm was benchmarked against the classic Bukin F4 function [62]: $$f(x, y) = 100\,y^{2} + 0.01\,|x + 10| \qquad (17)$$ for x ∈ [−15, −5] and y ∈ [−3, 3]. Starting at (−6, 2), MDAS found the global minimum of zero at the point (−10, 0) after 1081 function evaluations.
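As a quick check of the benchmark setup, the following sketch evaluates the Bukin F4 function at the starting point and at the reported minimizer. It assumes the standard form of this test function, f(x, y) = 100y² + 0.01|x + 10|; the function name `bukin_f4` and the coarse grid are hypothetical choices made for this illustration, and the sketch does not reproduce the MDAS run or its count of function evaluations.

```python
def bukin_f4(x, y):
    """Bukin F4 test function in its standard form (assumed here)."""
    return 100.0 * y ** 2 + 0.01 * abs(x + 10.0)


# Value at the starting point used in the benchmark.
start_value = bukin_f4(-6.0, 2.0)

# Value at the reported global minimizer.
minimum_value = bukin_f4(-10.0, 0.0)

# Coarse grid over the stated domain, x in [-15, -5] and y in [-3, 3],
# to confirm that no grid point falls below the reported minimum.
grid_min = min(
    bukin_f4(-15.0 + 0.5 * i, -3.0 + 0.5 * j)
    for i in range(21)
    for j in range(13)
)

print(start_value, minimum_value, grid_min)
```

Both terms of the function are nonnegative and vanish only at x = −10 and y = 0, which is why the minimum value of zero is attained at (−10, 0).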