Prepaid parameter estimation without likelihoods

Merijn Mestdagh; Stijn Verdonck; Kristof Meers; Tim Loossens; Francis Tuerlinckx

doi:10.1371/journal.pcbi.1007181

Abstract

In various fields, statistical models of interest are analytically intractable and inference is usually performed using a simulation-based method. However elegant these methods are, they are often painstakingly slow and convergence is difficult to assess. As a result, statistical inference is greatly hampered by computational constraints. However, for a given statistical model, different users, even with different data, are likely to perform similar computations. Computations done by one user are potentially useful for other users with different data sets. We propose a pooling of resources across researchers to capitalize on this. More specifically, we preemptively chart out the entire space of possible model outcomes in a prepaid database. Using advanced interpolation techniques, any individual estimation problem can now be solved on the spot. The prepaid method can easily accommodate different priors as well as constraints on the parameters. We created prepaid databases for three challenging models and demonstrate how they can be distributed through an online parameter estimation service. Our method outperforms state-of-the-art estimation techniques in both speed (with a 23,000 to 100,000-fold speed up) and accuracy, and is able to handle previously quasi inestimable models.

Author summary

Interesting nonlinear models are often analytically intractable. As a result, statistical inference has to rely on massive, time-intensive, simulations. The main idea of our method is to avoid the redundancy of similar computations that typically occur when different researchers independently fit the same model to their particular dataset. Instead, we propose to pool computational resources across the researchers interested in any given model. The prepaid method starts with an extensive simulation of datasets across the parameter space. The simulated data are compressed into summary statistics, and the relation to the parameters is learned using machine learning techniques. This results in a parameter estimation machine that produces accurate estimates very quickly (a 23,000 to 100,000-fold speed up compared to traditional methods).

Citation: Mestdagh M, Verdonck S, Meers K, Loossens T, Tuerlinckx F (2019) Prepaid parameter estimation without likelihoods. PLoS Comput Biol 15(9): e1007181. https://doi.org/10.1371/journal.pcbi.1007181

Editor: Patrick Simen, Oberlin College, UNITED STATES

Received: February 7, 2019; Accepted: June 14, 2019; Published: September 9, 2019

Copyright: © 2019 Mestdagh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data we use is mainly simulated data which can be reproduced by following the methods described in the paper. The real date set we use in the paper can be found at http://www.prepaidestimation.org/. To see the data click on Example data and choose ‘Chillo partellus’.

Funding: This research was supported by the Research Fund of KU Leuven (GOA/15/003; OT/11/031) and the Interuniversity Attraction Poles program (IAP/P7/06). Merijn Mestdagh and Stijn Verdonck are supported by the Fund of Scientific Research Flanders. The computational resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation - Flanders (FWO) and the Flemish Government - department EWI. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Methods paper.

Introduction

Models without an analytical likelihood are increasingly used in various disciplines, such as genetics [1], ecology [2, 3], economics [4, 5] and neuroscience [6]. For such models, parameter estimation is a major challenge for which a variety of solutions have been proposed [2, 1, 7]. All these methods have in common that they rely on extensive Monte Carlo simulations and that their convergence can be painstakingly slow. As a result, the current methods can be very time consuming.

To date, the practice is to analyse each data set separately. However, considering all the calculations that have ever been performed during parameter estimation of a particular type of model, for each different data set, one cannot help but notice an incredible waste of resources. Indeed, simulations performed while estimating one data set may also be relevant for the estimation of another. Currently, each researcher estimating the same model with different data will start from scratch, and can not benefit from all the possibly relevant calculations that have already been performed in earlier analyses by other researchers, in other locations, on different hardware, and for other data sets, but concerning the same model.

Hence, we propose an estimation scheme that dramatically increases overall efficiency by avoiding this immense redundancy. Most current algorithms are inherently iterative and (slowly) adjust their window of interest to the area of convergence. Instead, we propose to generate an all-inclusive and one-shot prepaid database that is capable of estimating the parameters of a particular model for all potential data sets and with almost no additional computation time per data set. Our approach starts with the extensive simulation of data sets across the entire parameter space. These data are then compressed into summary statistics, after which the relation between the summary statistics and the parameters can be learned using interpolation techniques. Finally, global optimization methods can be used on the previously created (hence, prepaid) database for accurate and fast parameter estimation on any device. This results in a mass lookup and interpolation scheme that can produce estimates to any given dataset very quickly.

In Fig 1 we present a graphical illustration of the prepaid parameter estimation method. First (panel A), for a sufficient number of parameter vectors θ, large data sets are simulated, compressed into summary statistics (i.e., s^sim) and saved—creating the prepaid grid. This prepaid grid is computed beforehand and the results are stored at a central location. Second (panel B1), the observed (data) summary statistics (s^obs) are compared to the simulated (data) summary statistics (i.e., s^sim) using an appropriate objective loss function d(s^sim, s^obs) and a number of nearest neighbor simulated summary statistics are selected. The loss function is related to the loss function used in the generalized method of moments [8] and method of simulated moments [9].

Download:

Fig 1. Graphical illustration of the prepaid parameter estimation method.

https://doi.org/10.1371/journal.pcbi.1007181.g001

Third (panel B2), interpolation methods are used to find the relation s = f(θ) between the parameter values and the summary statistics for the selected points of the previous step [10, 11]. In this paper, we use tuned least squares support vector machines, LS-SVM [12]. Finally (panel B3), the objective loss function d(s^pred, s^obs), now using predicted summary statistics s^pred, is minimized as a function of the unknown parameter values using an optimizer.

A number of important aspects of the prepaid method deserve special mention. First, the parameter space is required to be bounded. If this is unnatural for a given parametrization, then the parameters have to be appropriately transformed to a bounded space. Second, we typically start from a uniform distribution of parameter vectors in the final parameter space. This choice reflects on the uniformity of the grid’s resolution, but has no further implications provided the grid is sufficiently dense. Bayesian priors can be implemented without recreating the prepaid grid, since the prior can be taken into account in the loss function. Third, often a user is not interested in a single instance of a model, but rather has data from several experimental conditions that share some common parameters but assume other ones to be different. Also in these cases the prepaid grid does not need to be recreated, as the parameter constraints can be included through priors with tuning parameters (i.e., penalties). Fourth, the creation of the prepaid database is a fixed cost and usually takes from a couple of hours to one or more days, depending on the complexity of the model of interest (see below for a number of examples). Once its prepaid database is created, the parameters of the model can be estimated for any data set, with any amount of data (number of observations).

The prepaid method can be studied theoretically in simple situations. For example, in Methods, we apply the prepaid idea for estimating the mean of a normal distribution and study some of its properties for two different summary statistics. In what follows, the prepaid method will be applied to three more complicated, realistic scenarios.

Results

Example 1: The Ricker model

In a first example, we apply our prepaid method to the Ricker model [13, 2] which describes the dynamics of the number of individuals y_t in a species over time (with t = 1 to T_obs): (1) where . The variables N_t (i.e., the expected number of individuals at time t) and e_t are hidden states. Given an observed time series , we want to estimate the parameters θ = {r, σ, ϕ}, where r is the growth rate, σ the process noise and ϕ a scaling parameter. The Ricker model can demonstrate near-chaotic or chaotic behavior and no explicit likelihood formula is available.

Wood [2] used the synthetic likelihood to estimate the model’s parameters. In the original synthetic likelihood approach (denoted as SL^Orig), the assumed multivariate normal distribution of the summary statistics is used to create a synthetic likelihood. The mean and covariance matrix of this normal distribution are functions of the unknown parameters and are calculated using a large number of model simulations. The synthetic likelihood is proportional to the posterior distribution from which is sampled using MCMC and a posterior mean is computed.

Wood’s synthetic likelihood SL^Orig approach is compared to the prepaid method, where we create a prepaid grid of the mean and the covariance matrix of a similar set of summary statistics. Prepaid estimation comes in multiple variants, depending on the use of an interpolation method. The first, which uses only the prepaid grid points and chooses the nearest neighbor (maximum synthetic likelihood) as final estimate, will be called . The second, , uses LS-SVM to interpolate between the parameters in the prepaid grid to increase accuracy. The differential evolution algorithm (a global optimizer; [14]) is used to maximize this interpolated synthetic (log)likelihood. Additional details on the implementation of the synthetic likelihood can also be found in Methods.

Fig 2 shows both the accuracy of parameter recovery (as measured with the RMSE) and computation time for the three methods under comparison: (1) SL^Orig as in [2], the prepaid method (2) with interpolation (), and (3) without () interpolation. As can be seen in Fig 2, the prepaid estimation techniques lead to better results than the synthetic likelihood for T_obs = 1, 000, both in accuracy and speed. The SL^Orig method leads to some clear outliers (see Methods) which testifies to possible convergence problems (probably due to local minima). The prepaid method suffers much less from this problem. Most striking is the speed up of the prepaid method: The version of the prepaid estimation is finished before a single iteration of the 30,000 iterations in the synthetic likelihood method has been completed—100,000 times faster. In addition, it is demonstrated that the coverages of the prepaid method confidence intervals are very close or exactly equal to the nominal value (we look at 95% bootstrap-based confidence intervals). SVM interpolation is mainly helpful for large T_obs, where one expects a higher accuracy of the estimates and the grid is too coarse. The analyses with large T_obs could only be completed in a reasonable time using the prepaid method (See Methods for more detailed information).

Download:

Fig 2. The RMSE versus the time needed for the estimation of the three parameters of the Ricker model (see Eq 1).

The RMSE and time are based on 100 test data sets with T_obs = 1000. The three colors represent the three parameters (blue for r, red for σ and yellow for ϕ). Solid lines represent the SL^Orig approach, dashed lines the approach (using only nearest neighbors), and dotted lines the approach (using interpolation). The stars and the dots represent the time needed for the and the estimation, respectively. The estimates for SL^Orig are posterior means, based on the second half of the finished MCMC iterations. The time of the prepaid method shown in this picture does not include the creation of the prepaid grid, but only the time needed for any researcher to estimate the parameters once a prepaid grid is available.

https://doi.org/10.1371/journal.pcbi.1007181.g002

In the application above, the tacitly assumed prior on the parameter space is uniform. In addition, there is only one data set for which a single triplet of parameters (r, σ, ϕ) needs to be estimated. In Methods, we show how both limitations can be relaxed. First, it is explained how different priors for the Ricker model can be implemented. Second, it is discussed what can be done if there are two data sets (i.e., conditions) for which it holds that r₁ = r₂ and σ₁ = σ₂ but ϕ₁ and ϕ₂ are not related.

Finally, we also tested our estimation process on the population dynamics of the Chilo partellus, extracted from Fig 1 in Taneja and Leuschner [15, 16]. Here we found that r = 1.10 (95% confidence interval 1.06–1.34), σ = 0.43 (95% confidence interval 0.30–0.54) and ϕ = 140.60 (95% confidence interval = 43.94–208.19). We found similar results using the synthetic likelihood method (see Methods), but our estimation was 4000 times faster.

Example 2: A stochastic model of community dynamics

A second example we use to illustrate the prepaid inference method is a trait model of community dynamics [17] used to model the dispersion of species. For this model (see also Methods section), there are four parameters to be estimated: I, A, h, and σ. As with the first application, there is no analytical expression for the likelihood [17].

As an established benchmark procedure for this trait model, we apply the widely used Approximate Bayesian Computation (ABC) method [18, 19, 20, 21] as implemented in the Easy ABC package and denoted here as (PM stands for posterior means, which will be used as point estimates) [22]. As priors, we use uniform distributions on bounded intervals for log(I), log(A), h and log(σ) (see Methods for the exact specifications), but this can be easily changed as explained for the first example.

To allow for a direct comparison with the ABC method (), and to illustrate the versatility of the prepaid method, we have also implemented three Bayesian versions of the prepaid method. The first, , creates a posterior proportional to the prepaid synthetic likelihood. The second method, , saves not only the mean and covariance matrix of the summary statistics for every parameter in the prepaid grid, but also a large set of uncompressed summary statistics. Using these statistics we are able to approximate an ABC approach. The third, , again interpolates between the grid points to achieve a higher accuracy.

All methods result in accuracies of the same order of magnitude as can be seen in Table 1. The main difference is again the speed of the methods: is about 23,000 times faster than traditional ABC. For small sample sizes, all ABC based methods achieve good coverage. However, for large sample sizes, cannot be used anymore (because of the unduly long computation time). For the prepaid versions and large samples, it is necessary to use SVM interpolation between the grid points to get accurate results.

Download:

Table 1. The RMSE of the estimates of the test set of the trait model.

T_obs refers to the number of observations (i.e., vector with species frequencies) and Ω is the number of prepaid points.

https://doi.org/10.1371/journal.pcbi.1007181.t001

Example 3: The Leaky Competing Accumulator for choice response times

In a third example, we apply our method to stochastic accumulation models for elementary decision making. In this paradigm, a person has to choose, as quickly and accurately as possible, the correct response given a stimulus (e.g., is a collection of points moving to the left or to the right). Task difficulty is manipulated by applying different levels of stimulus ambiguity.

A popular neurally inspired model of decision making is the Leaky Competing Accumulator (LCA [23]). For two response options, two noisy evidence accumulators (stochastic differential equations, see Methods section) race each other until one of them reaches the required amount of evidence for the corresponding option to be chosen. The time that is required to reach that option’s threshold is interpreted as the associated choice response time. For different levels of stimulus difficulty, the model produces different levels of accuracy and choice response time distributions. The evidence accumulation process leading up to these choices and response times is assumed to be indicative of the activation levels of neural populations involved in the decision making.

As in the first two examples, there is no analytical likelihood available that can be used to estimate the parameters of the LCA. Moreover, the LCA is an extremely difficult model to estimate. To the best of our knowledge, only [24] systematically investigated the recovery of the LCA parameters, but for a slightly different model (with three choice options) and with a method that is impractically slow for very large sample sizes, making it difficult to show near-asymptotic recovery properties with.

For an experiment with four stimulus difficulty levels, the LCA model has nine parameters. However, after a reparametrization of the model (but without a reduction in complexity), it is possible to reduce the prepaid space to four dimensions (see Methods) and conditionally estimate the remaining subset of the parameters with a less computationally intensive method. Three variants of the prepaid method have been implemented: taking the nearest neighboring parameter set (based on a symmetrized χ² distance between distributions) on the prepaid grid (), averaging over the grids nearest neighboring parameter sets of 100 non-parametric bootstrap samples (), using SVM interpolation for every bootstrap estimate (). A nearest neighbor or bootstrap averaged estimate completes in about a second on a Dell Precision T3600 (4 cores at 3.60GHz), an SVM interpolated estimate requires a couple of minutes extra.

Fig 3 displays the mean absolute error (MAE) of the estimates for four of the nine parameters as a function of sample size, separately for three estimation methods. The results for the other parameters are similar and can be consulted in the Methods section. It can be seen that with increasing sample size, MAE decreases. The SVM method pays off especially for larger samples. Fig 4 shows detailed recovery scatter plots for a subset of the parameters for 1,200 observed trials, which is the typical size of decision experiments. To get better recovery, larger sample sizes have to be considered (see Methods section). In general, recovery is much better than what has been reported in [24]. The coverage of the method, based on non-parametric bootstrapping, is satisfactory for all sample sizes, provided SVM interpolated estimates are used for T_obs > 100000. In addition, we do not find evidence for a fundamental identification issue with the two option LCA, as has been stated in [24].

Download:

Fig 3.

The mean absolute error of the estimates of four central parameters of the LCA (common input v, leakage γ, mutual inhibition κ, evidence threshold a) as a function of sample size (abscissa) and for three different methods: (1) choosing the nearest neighbor grid point in the space of summary statistics (, triangles); (2) using the average of a set of nearest neighbor grid points based on bootstrap samples (, open circles) and (3) using SVM interpolation between the 100 nearest neighbors (, crosses).

https://doi.org/10.1371/journal.pcbi.1007181.g003

Download:

Fig 4. Parameter recovery for the LCA model with 1200 observations (300 in each of the four difficulty conditions); the true value on the abscissa and estimated value on the ordinate.

The same parameters as in Fig 3 are shown. The method used to produce these estimates is the averaged bootstrap approach (, see Methods for details).

https://doi.org/10.1371/journal.pcbi.1007181.g004

Discussion

In three examples, we have demonstrated the efficacy and versatility of the prepaid method. The prepaid method is at least as accurate as current methods, but many times faster (23,000 to 100,000-fold speed up). Besides the improvements at the level of speed and accuracy, the prepaid method has a number of other distinct advantages. First, the prepaid method can be used for a very large number of observations, contrary to the synthetic likelihood or ABC methods. The use of very large simulated data sets allows a practical investigation of large-sample properties of the estimator, which is a problem for the synthetic likelihood and ABC. Second, because of the enormous speed improvement and having data sets available across the whole parameter space, the prepaid method allows for fast yet extensive testing of recovery of simulated data across this space—the recovery of every single parameter set can be evaluated. Such a practice leads to detailed internal quality control of the used estimation algorithm.

Although the idea behind the prepaid method is fairly simple, we want to anticipate a few misconceptions that might arise. First, as has been demonstrated in the context of the Ricker model (the first example), the prepaid method can easily deal with different priors and with equality constraints on parameters, without the need to recreate the underlying prepaid grid. Second, the observed data based on which the model parameters have to be estimated can be of any size, again without the need to recreate the prepaid grid for each and every sample size.

In the first two examples the synthetic likelihood [2] is used, but its exact effect on likelihood based model selection techniques, such as information criteria, is not known. For users interested in model selection, we propose cross-validation as its implementation is straight forward. The main draw-back of this resampling method, its computational burden, is mitigated by the use of the prepaid method.

Ideally, the prepaid databases and the corresponding estimation algorithms will be constructed and made available by a team of experts for the model at hand. Subsequently, a cloud based service can be set up to offer high quality model estimations to a broad public of researchers. As an example, we created such a service for the Ricker model in Eq 1: http://www.prepaidestimation.org/, where we allow the user to estimate the parameters of the Ricker model for personal data as well as 4 example data sets including one real life data set [15, 16]. By using such a cloud based service, researchers that need their data analyzed with computationally challenging models, can avoid many of the pitfalls they would otherwise encounter venturing out on their own. This practice will also lead to increased reproducibility of computational results.

As the need for reproducibility and transparency is (fortunately) increasingly recognized by the broader scientific community, critical model users will want to see proof of robust estimation across the entire parameter space, and be able to test this themselves. The current standard of simply sharing the code of a procedure, still grants developers of complex models/methods a layer of protection from public scrutiny, because the level of knowledge and infrastructure required to check the work is considerable and not many are called to take up the challenge. The prepaid method, however, allows any user with a basic grasp of statistics to check the consistency of the model and method, using data they have simulated themselves. In the future, we expect a natural evolution towards a situation where stakeholders in certain models (the developers and/or heavy users) will provide an estimation service or outsource this endeavor to a third party. The infrastructure required for hosting such a service is orders of magnitude lighter than what is required for the calculation of the database itself or a thorough simulation study for that matter. We are currently hosting the Ricker model on a very modest system (medium level desktop).

A first possible objection to the prepaid method is the considerable initial simulation cost (for the examples discussed, prepaid simulations took up to a couple of days on a 20-core processor). However, this overhead cost will dissipate entirely as increasingly more estimates are sourced from the same prepaid database. Moreover, the initial prepaid cost can be easily distributed across multiple interested parties. Further, because the database can be used for internal quality control, additional simulation studies investigating the recovery of parameters are made redundant. Indeed, whenever a new model and associated parameter estimation method are proposed, a recovery study is needed to study how well the parameters of the model can be estimated using the method. When such a simulation study is set up in a rigorous way, the prepaid grid will have been (partially or completely) constructed. For the first and the second example, the time to create the prepaid grid was of the same order as that of the parameter recovery study included for the estimation techniques the prepaid grid was compared with. Note however that the parameter recovery study of the traditional techniques was only partial, as data sets with more observations, for which the parameter estimation would take an excessively long time using only traditional methods, were excluded. If those would be included, a parameter recovery study would be at least 10 times slower than the creation of the prepaid grid. The fact that a parameter recovery study takes at least as much time as the creation of the prepaid grid makes sense. A recovery study should test the estimation of parameters in the whole realm of possible data sets. The prepaid grid exactly covers this realm.

The argumentation above shows that a parameter recovery study and a prepaid grid are very related. In fact, Jabot, saw the necessity of reusing ABC simulations to reduce computation time in his recovery study for the model of the second example [17]. More broadly, we are convinced that other researchers also have used similar tricks to avoid redundant simulation within their own research context. For example, a reviewer of this manuscript noted that s/he uses a prepaid grid (although not named so) when trying models in which the parameters change across trials. The main difference with prepaid estimation is that we propose to reuse these simulations to facilitate future estimations.

A second possible objection is that the prepaid grid, unsurprisingly, does not escape the curse of dimensionality: The grid size grows exponentially with the number of parameters. The prepaid method is most effective for highly nonlinear models with substantively meaningful parameters, as they appear in various computational modeling fields. For these models, all simulation based estimation techniques struggle with the curse of dimensionality. For the prepaid method, this limitation can be alleviated in a number of ways. First, the use of interpolation techniques allows for a substantial reduction of the number of prepaid points (by a factor of five for the same accuracy in the trait model example; see Methods section). Second, as is shown in the LCA example, it is possible to only partially apply the prepaid method and combine it with traditional estimation techniques. In this way, the less challenging parameters can be estimated conditionally on a prepaid grid of the more intricately connected ones. Third, as shown by tackling three challenging examples, current storage and/or memory technology can accommodate realistically sized prepaid databases.

A last possible objection is the risk, that once the prepaid grid is created for a certain model, researchers will be biased towards using this particular model. They may prefer the relatively easy prepaid estimation of this model over the use of other models without a prepaid grid. We hope however that also the creation of the prepaid grid is manageable enough for any model to prevent such scenarios.

A possible improvement of the prepaid method lies in a smarter construction of the prepaid grid. First, there is a straightforward theoretical angle: spreading the grid points out according to Jeffrey’s prior rather than a naïve parameter based prior, would lead to a more evenly distributed estimation accuracy, and therefore a smaller database size will suffice for a given minimum accuracy. Additionally, the database could be improved based on the actual queries of users. If the simulation grid proves a bit thin around the requested area (not a lot of unique grid points), more grid points can be added there. This way more detail is added where it matters.

Finally, the prepaid method also offers exciting opportunities for future research. First, another typical case where the same model has to be estimated multiple times, arises in a multilevel context (where several individual analyses are regularized by a set of hyperparameters defined on the group). Although extremely useful, multilevel analyses typically come with an additional computational burden. Because the synthetic likelihood, as any likelihood, can be extended to a multilevel context, the prepaid method should be too. Further research is needed to develop this idea.

Second, the prepaid philosophy can also be used to choose a good set of summary statistics, which are necessary for simulation based estimation techniques. During the creation of the prepaid grid many summary statistics can be saved, with no additional simulation cost. The effectiveness of combinations of summary statistics are then easily tested in parameter recovery studies as the prepaid estimation is so quick.

It is our strong belief that this method will massively democratize the use of many computationally expensive models, which are now reserved for people with access to specific high-end hardware (e.g., GPUs, HPC). Apart from such democratization, this approach could significantly impact the current work flow of scientific modeling, in which every part of the estimation is carried out locally by an individual researcher.

Methods

A toy example: Estimating the mean of a normal

For a very simple setting, we want to study the performance of the prepaid methods analytically.

Assume y_i ∼ N(μ, s²) (i = 1, …, T_obs) with the mean μ unknown (and to be estimated and the standard deviation s known (so number of parameters K = 1). The observed mean is denoted as . We will explore two situations. In the first situation, will be our summary statistic s^obs (hence number of summary statistics R = 1) to estimate μ ( is also a sufficient statistic for μ). In the second situation, we will study what happens if is chosen to be the summary statistic.

Situation 1: .

As a prepaid grid, we take N_r evenly spaced μ-values with spacing or gap size Δ = μ_j+1 − μ_j (see Fig 1, left figure of panel A; in our case the parameter space is one dimensional). For each value μ_j, T_sim values of y are simulated and the sample average is computed (i.e., ) (see middle figure of panel A in Fig 1). Typically, T_sim = 1000 or larger. Hence, every value of μ_j is paired with a particular : .

Given an observed , the N nearest neighbors of simulated statistics are selected: , (see panel B1 of Fig 1), such that . Typically, N = 100. In principle, the selected μs depend on , but for simplicity we suppress this dependence in the notation.

Because of the linearity of the problem, we can safely assume that if T_sim is large enough, the N selected μ values are all consecutive or nearly consecutive (because of noise in the prepaid simulation of , it can happen that the N selected μ values are not consecutive). We denote the average of these N μ-values as M_μ. Ordering all values from smallest to largest (denoting the jth value as μ_[j] and assuming they are exactly consecutive, M_μ can be expressed as): where we have defined μ_[1] as

In addition (assuming that all values are exactly consecutive), their variance V_μ is given by

Hence, their standard deviation is and thus independent of .

Using the N nearest neighbour pairs, we assume as a linear interpolator (see panel B2 of Fig 1) in this example a linear regression model that links the simulated statistics to the true underlying μ: , with . Obviously, β₀ = 0 and β₁ = 1.

Given , N selected prepaid points and the fitted linear regression model, we know from linear regression theory that: where 0 and 1 are the true β₀ and β₁ and

The distribution is assumed to hold for repeated simulations of the replicated statistics in the prepaid grid.

Because we work with linear regression, the optimization problem is simple. In this case, the optimal value of μ for a given can be found by inverting the regression line:

In this simple example, the method of predicted moments from Panel B3 in Fig 1 yields an exact solution for the estimated mean, given the observed sample average.

Next, we can study the properties of . We begin by calculating the conditional mean and conditional variance . Hence, we treat the observed data (or sample average) as given and fixed. These expectations are taken over different simulations of ’s in the prepaid grid. Before giving the expressions, it is useful to note that

Now, using the approximations given in [25] for ratios of random variables, we find that: and

Invoking the double expectation theorem to arrive at the unconditional expectations, we have: (2) where , that is, the difference between the expected value of the mean of the selected nearest neighbors μ’s and the true μ. Likewise, we can derive the marginal variance . We will assume that the variance in is equal to . In addition, we assume that and correlate perfectly, such that . For this particular example, these assumptions make sense. Then we can derive that: (3)

From Eq 2, we learn that if there is no systematic deviation in the selection of μ-grid points, the prepaid estimator is unbiased. In the other case, the bias decreases with T_sim but is proportional to s². In Eq 3, the leading term of the variance is , which is the same as in classical estimation theory. For the other terms, they all have T_sim (or a power of it) in the denominator. Because T_sim is usually quite large, these terms tend to be in general of lesser importance. However, some terms also have both N (the number of selected nearest neighbor grid points) and Δ (the gap size) in the denominator. It is worthwhile to note that increasing the resolution (i.e., decreasing Δ), while keeping N constant, will increase the additional terms and thus add to the error. The reason for this is that the interpolation is defined on a too small grid, leading to uncertainty in the estimated regression. This effect is illustrated in the left panel of Fig 5 in which the root mean square error (RMSE) is shown for the estimation of μ for different values of N and Δ. The plot is constructed by means of a simulation study, but confirms our analytical results.

Download:

Fig 5. RMSE (based on a simulation study) of the toy example estimation as function of the gap size (Δ) and number of nearest neighbors selected to carry out the interpolation (N).

The left panel is called situation 1 in which and the right panel is situation 2 (). For the second situation, the trade-off between Δ and N is clearly visible.

https://doi.org/10.1371/journal.pcbi.1007181.g005

Situation 2: .

In the second situation, we will again estimate μ (the unknown mean of a unit variance normal), but in this case is used as a statistic. Thus, the relation between the simulated statistics and μ is quadratic (and thus nonlinear). Again we use a local linear approximation. Clearly, this approximation will only be approximately valid if we do not choose the area of approximation too large. However, unlike in the first situation, we do expect an additional effect of the approximation error.

No analytical derivations were made for this case, but we conducted a similar simulation study as in situation 1. The results (in terms of RMSE) are shown in the right panel of Fig 5. As can be seen, there is a clear optimality trade-off visible between Δ and N. This can be explained as follows: Fix N and then consider the gap size Δ. If Δ is too small, we get a similar phenomenon as in the left panel, that is a large RMSE. However, if we take Δ too large, then the approximation error will dominate (because the linear interpolation misfits the quadratic relation). The optimal point will be different for different N.

This toy example demonstrates the sound theoretical foundations of the prepaid method in well-behaved situations. However, the question is how well the method performs for real life examples.

Application 1: The Ricker model

The basic model equations of the Ricker model is given in Eq 1.

Synthetic likelihood estimation.

For the synthetic likelihood estimation (SL^Orig), we made use of the synlik package [26]. The synthetic likelihood l_s for a data set with summary statistics s^obs and a certain parameter vector θ = (r, σ, ϕ) is given by (4) where and are the estimated mean and covariance of the summary statistics when Eq 1 is simulated multiple times with parameter θ.

The statistics used by the synthetic likelihood function were the average population size, the number of zeros, the autocovariances up to lag 5, the coefficients of the quadratic linear autoregression of and the coefficients of the cubic regression of the ordered differences y_t − y_t−1 on the observed values.

For each data set we used the synthetic likelihood Markov chain Monte Carlo (MCMC) method with 30000 iterations, a burn in of 3 time steps and 500 simulations to compute each and [26]. We used the following prior: (5)

The synlik package generates the MCMC chain on a logarithmic scale, we estimated the parameters as the exponential of the posterior mean. To ensure convergence, only the last half of the chain is used (the last 15000 iterations).

Creation of the prepaid grid.

For the prepaid estimation, we used the same summary statistics as for the traditional synthetic likelihood, except for two differences. First, the coefficients of the cubic regression of the ordered differences y_t − y_t−1 on the observed values could not be used, because the observed values are not available when creating the prepaid grid. Second, we changed the number of zeros to the percentage of zeros to make this statistic independent of T_obs (as this may change depending on the observation).

We filled the prepaid grid with 100000 parameter sets using the priors of Eq 5. To cover this grid as evenly as possible (and avoiding too large gaps), the uniform distribution was approximated using Halton sequences [27, 28]. For each parameter set in the prepaid grid, we simulated a time series of length 10⁷ and used the summary statistics of this long time series as .

Each time series was then split into series of length T_prepaid = 100, 1000 and 10000 which were used to compute the covariance for the statistics computed on data of these lengths. This means, for example, that we had 100000 series of length 100 to compute the covariance matrix for a certain parameter set for time series of length 100. If we need to estimate parameters of a time series with T_obs not equal to one of the T_prepaid lengths, we use the covariance matrix created with time series of length T_prepaid which is closest to T_obs in logarithmic scale and adapt the covariance matrix into (6)

The creation of the prepaid grid took approximately one day on a 3.4GHz 20-core processor.

To allow the estimation for a larger range of parameters for the online estimation at http://www.prepaidestimation.org/ we created a new and bigger prepaid grid using the following priors: (7)

We filled to prepaid grid with 100000 parameter sets and used this prior for the real life data set on the Chilo partellus.

Prepaid estimation.

Four variants of prepaid estimation were implemented for this example. All use the negative synthetic likelihood as distance (d(s^sim, s^obs) as defined in the main text and Fig 1). First, we do a nearest neighbor estimation , without using any interpolation between the grid points of the prepaid data set. We compute the synthetic likelihood of all the prepaid parameters for the summary statistics of the test data set. The parameter vector with the highest likelihood, the so-called nearest neighbor may already be a good estimation. For a low number of time points T_obs, it is to be expected that the error on the parameter estimate is much larger than the gaps in the prepaid grid, and in such a case, the estimation approach suffices.

Second, a more accurate estimation can be acquired by interpolating between the parameter values in the prepaid grid (). Therefore, we learn the relation between the parameters and the summary statistics: . However, we only learn this relation in the region of interest, that is the 100 nearest neighbors according to the synthetic likelihood. For each summary statistic, we create, on the fly, a separate least squares support vector machine (LS-SVM) [12] using the 100 nearest neighbors. This machine learning technique is chosen as it is a fast non-linear method which generalizes well. We limit the predictions to the possible range of the summary statistics (e.g., to prevent a percentage of zeros, one of the statistics, larger than 1).

We then use the differential evolution global optimizer [14] to find the maximum of: (8) where is the covariance matrix of the statistics of the nearest neighbor as defined in Eq 6. The superscript “PP” is used to denote that we use the prepaid version of synthetic likelihood, and not the traditional version as used by [2] (see Eq 4). The optimization process is constrained and we use the minima and maxima for each parameter of the 100 nearest neighbors as effective bounds.

The approach makes use of a non-linear black box interpolator. However, we may also consider using a much faster linear regression (see also the toy example in Section). Therefore, we will also compare the (and ) approach to a third option where we predict the summary statistics using a linear regression (called the approach).

Third, we can easily implement a prior for the likelihood in Eq 4. This leads to a posterior given by (9)

The parameters will be estimated as the maximum a posteriori (MAP), as comparison to maximum likelihood estimation which is a maximum a posteriori with a uniform prior. Here we will apply this extension to the nearest neighbor estimation: .

Lastly we will show that our prepaid method can also be used to cover an experimental set-up. In such a set-up, we want to estimate the same model over several experimental conditions. For example, we may be interested in the effect of light intensity on the population dynamics of a certain type of bacteria. In such an example we would vary the light intensity over several conditions and estimate the population dynamics again for each condition.

If, for this experimental set-up, the conditions c are independent, the likelihood of the whole experiment is (10) where l_s,c(θ_c) is the synthetic likelihood for condition c. This is equivalent to estimating each parameter set θ_c individually for each condition c poses no problem for the previously proposed prepaid method.

In many experimental set-ups, the conditions will however not be independent. In the case of our example, we may only be interested in the effect of light intensity on the scaling parameter ϕ, and expect the other parameters r and σ to be constant across conditions. Such a dependence between conditions can be mimicked using priors. In case of the experiment example, with two conditions, we propose the following prior: (11) where is the standard normal distribution and and are the averages of respectively r and σ across conditions ( and ). Using such a prior we can force r₁ and σ₁ to be similar to r₂ and σ₂ respectively. The smaller the tuning parameter σ_prior, the more all constrained parameters (r and σ) will be forced to be equal. If σ_prior is too large the estimation will not take into account the interdependence between the conditions. So at first, it seems that σ_prior needs to be as small as possible. However, if σ_prior is too small we run into trouble with the sparsity of the prepaid grid. In the limit, where σ_prior goes to zero, the estimation process will choose a parameter where r₁ = r₂ and σ₁ = σ₂ will hold exactly. Due to the nature of the prepaid grid, this will lead to the undesired result where exactly one prepaid point is chosen for both conditions, meaning that also ϕ₁ = ϕ₂. Luckily, σ_prior can be easily tuned. Once the prepaid grid is created, we can estimate many test parameters using the the experimental set-up in combination with a certain tuning parameter. Subsequently, the tuning parameter which leads to the best estimates of these test parameters is chosen.

In practice, when σ_prior is tuned, we will first create a pool of eligible parameters for each condition individually using the nearest neighbor approach . In a second step we fill refine these pools by using the prior of Eq 11 and choose the best estimate for each condition. In a last step we replace r₁ and r₂ by and σ₁ and σ₂ by to ensure that the constraints of the experimental set up are exactly satisfied.

More generally, for an experiment with several conditions where we want parameter θ to be constant over the conditions we get the following prior: (12)

Test set.

As a test set we first used 100 random parameters created with the prior of Eq 5. To avoid problems with the borders we deleted parameters that where within 1% range of the bounds. We simulated data sets for T_obs = {10², 5⋅10², 10³, 10⁴, 10⁵}. For each data set we estimated parameters using the nearest neighbor () and the approach. For T_obs = 10⁵, we also estimated the parameters using the approach. Due to time constraints, we only estimated parameters for the data with T_obs ≤ 10³ using the traditional synthetic likelihood approach.

Next we also created test data sets from different priors for T_obs = 10². Prior P₁ from Eq 5 can also be written as (13) where Beta is a beta distribution with parameters α = 1 and β = 1. Similarly, we created a test set from prior P₂ (14) and prior P₃ (15)

We will test if performs best when the correct prior is used in the estimation process. Last we also created a test set for T_obs = 10² for an experimental set up with two conditions where r and σ are equal over the conditions.

In the subsequent sections, we will evaluate the methods on the following criteria: accuracy, speed and coverage.

Results accuracy.

To start off, we look at the recoveries for T_obs = 10³ for all 100 simulated data sets and the three methods (SL^Orig, and ). Scatter plots are shown in Fig 6. It can seen that the synthetic likelihood estimation leads to some clear outliers. One possible reason for the absence of outliers in the prepaid estimation is the fact that prepaid estimation from the start examines the whole grid and therefore has less problems with getting stuck in local optima.

Download:

Fig 6. Estimated versus true parameters of the Ricker model of 100 data sets with T_obs = 1000.

The SL^Orig estimation has some problems with outliers.

https://doi.org/10.1371/journal.pcbi.1007181.g006

More generally, we plotted the accuracy of each of the methods as a function of time series length T_obs in Fig 7. The left panel shows the root mean square error (RMSE), while the right panel shows the median absolute error (MAE). We decided to look at the MAE because the few outliers for SL^Orig (which were shown Fig 6) may inflate the RMSE of the synthetic likelihood disproportionally, which happens to a certain extent. However, very similar conclusions can be drawn for both performance measures. In general, accuracy increases when T_obs increases (i.e., both RMSE and MAE decreases). For RMSE, our SVM prepaid method clearly outperforms the traditional synthetic likelihood method SL^Orig for every T_obs and every parameter. For T_obs = {5⋅10², 10³}, also the prepaid approach leads for every parameter to a lower RMSE compared to the synthetic likelihood. For all T_obs, the prepaid leads to a higher accuracy compared to the prepaid and this difference becomes larger for a larger T_obs. For MAE, the prepaid method and the original synthetic likelihood SL^Orig show a very similar accuracy (for T_obs ≤ 10³). Both outperform the prepaid.

Download:

Fig 7. The accuracy of all estimation methods versus the number of time points T_obs.

The left panel shows the mean squared error, while the right panel shows the median absolute error. The three colors represent the three parameters. Blue lines refer to the parameter r, red lines to the parameter σ and yellow lines to the parameter ϕ. The solid line represents the original synthetic likelihood approach SL^Orig (stopping at T_obs = 10³), the dashed line the prepaid approach and the dotted line the prepaid approach.

https://doi.org/10.1371/journal.pcbi.1007181.g007

The largest attainable accuracy for the prepaid approach is limited by the spacing of the prepaid grid. If we had created an equally spaced grid of T_obs = 10⁵ points using the prior in Eq 5, we would have the following gaps in each of the three parameter dimensions: (16)

We do not have an equally spaced grid, but it is expected that the quasi Monte Carlo distribution of points creates expected gaps close to the ones in Eq 16. Therefore, it is no coincidence that the best possible RMSE using the prepaid approach has the same order of magnitude as the gap size Δ, as can be seen in Table 2 for the case of T_obs = 10⁵. However, Table 2 also show that the prepaid approach leads to a much lower RMSE. The difference between the and the prepaid approach for T_obs = 10⁵ is further visualized in Fig 8.

Download:

Fig 8. The estimation of the three parameters of the Ricker model of 100 data sets with T_obs = 10⁵.

The estimation clearly outperforms the estimation.

https://doi.org/10.1371/journal.pcbi.1007181.g008

Download:

Table 2. RMSE for the estimation of the parameters of the Ricker model for T = 10⁵ using the

,

and

prepaid methods.

https://doi.org/10.1371/journal.pcbi.1007181.t002

The results in Table 2 also show the need for a non-linear interpolator for the prepaid method. The RMSE of a linear regression interpolator () is much larger than that of the SVM prepaid.

In sum, we can conclude that the prepaid estimation methods lead to better, or at least similar, results as the traditional synthetic likelihood.

Results speed.

The largest improvement of the prepaid method over synthetic likelihood is in computational speed: The prepaid method is many times faster than synthetic likelihood. Consider Fig 2 in the main text where it is shown that the prepaid method is finished before a single iteration of the 30000 iterations are done by the SL^Orig method. While the and the prepaid methods are finished in respectively 0.044 and 3.7 seconds, independent of the time series length T_obs, the SL^Orig method grows slower with an order of magnitude of T_obs. In each SL^Orig iteration one needs to simulate multiple time series with length T_obs. The larger T_obs, the slower the estimation. While the synthetic likelihood needs approximately one and a half hour to estimate the parameters for a time series with length T_obs = 10³. The prepaid estimation still finishes in 0.044 s, which is more than 10⁵ times faster. The speed up factors are presented in Table 3 and as can be seen from Fig 7, there is not loss of accuracy. The speed up would reach millions, if we had the time to run the synthetic likelihood method for longer time series.

Download:

Table 3. Average time in seconds needed for the SL^Orig estimation for multiple T_obs and the speed up for the

and

methods.

The time for T_obs = 10⁴ and T_obs = 10⁵ was not measured, so these values are estimated and between brackets. (Fig 7 shows the corresponding accuracies).

https://doi.org/10.1371/journal.pcbi.1007181.t003

Results coverage.

Next, we look at the coverage rates of the 95% confidence intervals as obtained with the bootstrap in combination with the prepaid method. To estimate a 95% confidence interval of the estimates for the prepaid method, a parametric bootstrap with B = 1000 bootstrap samples was used.

For the prepaid version the estimate for the observed data set was obtained using the approach and the bootstrap estimates were commonly obtained using the prepaid method applied to the bootstrap data sets. However, if in the first 100 bootstraps only half of the nearest neighbors where unique points, the bootstrap distribution could be considered questionable. This behavior is to be expected for larger sample sizes T_obs, because the true bootstrap distribution is very peaked so that every bootstrap sample will have the same nearest neighbor grid point. When this occurs, we would estimate the parameters of each bootstrap using differential evolution, using the SVM created by the original 100 nearest neighbors.

Alternatively, for the synthetic likelihood approach (using MCMC) we computed the 95% confidence interval by calculating the 0.025 and 0.975 quantiles of the last half of the posterior samples.

The coverage results for the test set of 100 parameters are shown for three different values of T_obs in Table 4. It can be seen that for both methods, the coverage is close to the nominal level of 95%, but the coverage of the prepaid method is slightly better.

Download:

Table 4. The effective coverages of the test set for different T_obs.

https://doi.org/10.1371/journal.pcbi.1007181.t004

Results prior.

In this paragraph we show how we can benefit from using the correct prior. We estimate the parameters of the three testsets for T_obs = 100, created with uniform prior P₁ from Eq 9 and beta distribution priors P₂ and P₃ from Eqs 14 and 15. We estimated all three data sets using maximum a posteriori estimation using all three priors. The results are shown in Table 5. Using the correct prior leads, as expected, to the best results.

Download:

Table 5. RMSE of

estimation of test sets with T_obs = 100 created with priors P₁, P₂ and P₃ and estimated by using priors P₁, P₂ and P₃.

For each test set and parameter the best result is shown in bold.

https://doi.org/10.1371/journal.pcbi.1007181.t005

Parameter constraints across conditions.

We estimated the parameters for a two condition experimental set up with equal r and σ, with and without the prior from Eq 11 (parameter σ_prior was tuned on 100 similar simulated data sets). The results are shown in Table 6. Using the prior from Eq 11, which implements the parameter constraints of the experimental set up, leads, as expected, to better results for each parameter. Even for ϕ, which is absent in the prior, we find better results.

Download:

Table 6. RMSE for Ricker model data where T_obs = 100 for an experimental set up with two conditions where r and σ are equal over the conditions.

Parameters are estimated by using with a flat prior (same as )and with a prior from Eq 11.

https://doi.org/10.1371/journal.pcbi.1007181.t006

Results real life data set.

The results for the estimation of the population dynamics of the Chilo partellus [16, 15], using the prior from Eq 7 can be found in Table 7. For the prepaid, we estimated the parameters using the methods online at http://www.prepaidestimation.org/. All estimations are similar and have overlapping confidence intervals. The prepaid estimation is however significantly faster.

Download:

Table 7. Population dynamics of the Chilo partellus [16, 15].

We show the estimates, the 95% confidence intervals and computation time of the prepaid and synthetic likelihood estimation techniques.

https://doi.org/10.1371/journal.pcbi.1007181.t007

Application 2: A stochastic model of community dynamics

A second model we will apply our prepaid modeling technique to, is a stochastic dispersal-limited trait-based model of community dynamics [17]. The data that will be modeled, are the abundances of species (hence a vector of frequencies, in which each component is a different species). Each species in the local environment is assumed to have a competitive value dependent on its trait u, given by the filtering function (17)

Here A is the maximal competitive advantage, h is the optimal trait value in the local environment and σ describes the width of the filtering function. At each time step, one individual from the local community dies. It is then replaced with a probability by a random descendant from the local pool. Here, J is the size of the local community and I is the fourth parameter to estimate, related to the amount of immigration from the regional pool into the local community. The probability that this descendant comes from a certain individual in the local community, is proportional to the competitiveness of this individual. With a probability of , the dead individual is replaced by an immigrant from the regional pool. The distribution of traits u of the individuals in the regional pool is assumed to be uniform over u. It is noteworthy that Jabot saw the necessity of reusing ABC simulations to reduce computation time in his recovery study [17].

The model was simulated using the C++ code from the Easy ABC package [22] where a regional pool of S = 1000 species was defined evenly spaced on the trait axis (i.e., the resolution) and J = 500 was the size of the local community.

ABC estimation.

We compare our prepaid method estimation with the Easy ABC package (ABC^Orig) [29, 22]. Because we work in a Bayesian framework, we first have to specify priors. As in Jabot et al. we use the following priors [22]: (18)

In this application, the parameter vector θ is defined as follows: θ = (log(I), log(A), h, log(σ)). To get the ABC algorithm to work, we compute four summary statistics: the richness of the community (number of living species), Shannon’s index which measures the entropy of the community, and the mean and the skewness of the trait distribution of the community.

The ABC algorithm we use applies a sequential parameter sampling scheme [30]. The sequence of tolerance bounds is given by ρ = {8, 5, 3, 1, 0.5, 0.2, 0.1} and the algorithm proceeds to the next tolerance after 200 simulations which lead to summary statistics within the bounds. The last 200 simulations within the bounds represent the posterior, and the estimate of the parameter is given by the posterior mean.