To be Bayesian or to Bootstrap: what is the risk?
(Malcolm Haddon is to be found at the Marine Research Laboratories, Tasmanian Aquaculture and Fisheries Institute, University of Tasmania, Nubeena Crescent, Taroona, TAS 7053. His email address is Malcolm.Haddon@utas.edu.au)
A discussion is presented on the advantages and disadvantages of both Bayesian methods and Bootstrapping methods of stock assessment and Harvest Strategy Evaluation. Neither is the perfect solution to the problems encountered in stock assessment. From the literature one could be forgiven for believing that a stock assessment not conducted using Bayesian methods is somehow inferior. This is not the case and the particular circumstances of each fishery should be considered when deciding what methods should be used. It could also be argued that both methods should be applied to determine whether they produce contrasting results, which hopefully would be enlightening about the system under study.
When investigating the most likely outcomes of different management or harvest regimes one attempts to answer questions about risk. Typically, one may ask: 'If we manage a fishery in a given way (e.g. a particular Total Allowable Catch (TAC), or number of days fishing) how likely or probable is it that a particular performance measure will be achieved (e.g. stock biomass in five years' time will be greater than some reference biomass)?' One can be more specific and ask: 'What management strategy will lead to there being an X % chance that stock size will have increased relative to some reference biomass over a Y year period (where X and Y are your favourite numbers for this sort of question)?'. The details of the management strategy and performance measure will differ depending on the fishery, but the essence of the problem is that the answer is not deterministic. The questions are about the risk of following a defined management strategy, with the risk being defined in terms of the performance measure used. To answer the questions we conduct risk assessments.
The analyses cannot be deterministic because we can only obtain uncertain information about the system being assessed. Not only will there be variation in the data, but the parameters of the model used to describe the system are probably not the constants that the models tend to assume. The first source of uncertainty is termed 'observation error' while the second is generally termed 'process error'. As we cannot be certain that the specification of the model will be sufficient to capture the important dynamics of the fishery, there is also uncertainty over the structure of the model used in the assessment. Many assessment models deal only with the observation error and effectively ignore both the process error and model uncertainty. Even so, the practice of using multiple models of varying complexity when assessing fish stocks is becoming more common.
Through the 1990s, a debate developed on how best to account for uncertainty in stock assessments for fishery resources and how that uncertainty could be included in risk assessments. The problem definitions and even the meanings of various terms used were refined and consistency achieved in the literature. Important contributions to this debate, across a range of ideas, were produced by investigators such as Francis (1992), Punt and Hilborn (1997), and Chen and Fournier (1999). A major part of the debate was over the use of Frequentist methods versus Bayesian methods. The intensity and fervour of the arguments were remarkable, but the odd thing is that, judging from the fisheries modelling literature, the Bayesian camp appears to have won the debate. Nowadays, in fisheries where sufficient information exists to seriously consider management strategy evaluation, resource managers tend to equate stock assessment, and the associated harvest strategy evaluation, with the use of Bayesian methods. This is, however, a mistake: Bayesian methods are not necessary to characterise the uncertainty in a stock assessment and then conduct a risk assessment. There are advantages and disadvantages to both of the analytical approaches available for fisheries stock assessment; that is, Bayesian and Frequentist methods. In this document we will restrict our discussion of Frequentist techniques to bootstrapping methods.
Bayesian and bootstrapping methods deal with available data in different ways when attempts are made to characterise the uncertainty in an analysis. Superficially, both methods are similar because they both develop arrays of different parameter values and both can be used to characterise the uncertainty inherent in any model fitting exercise (stock assessment), generating the parameter estimates and model outcomes needed in an assessment. In addition, both can be used as the basis for developing the projections that are fundamental to harvest strategy evaluation. However, the parameter estimates and model outcomes are produced in very different ways and these differences are behind the controversy between the two methods.
Bootstrapping is a method that simulates multiple data sets equivalent to the original (Fig. 1). The assessment model, when fitted anew to each of these bootstrap samples, produces a new optimum model, complete with new model parameter estimates and model outcomes. By collating a multitude of these separate parameter estimates, percentile confidence intervals can be produced. The bootstrap samples are generated by combining the expected values for the variable being modelled, taken from the optimally fitting model, with bootstrap samples of the residuals between the observed data and the values predicted by the optimum model (Efron and Tibshirani, 1993; Haddon, 2001). A bootstrap sample is simply a random selection of the same number of observations from a set of observations, with the selection being made with replacement (hence some observations can occur more than once and others can be entirely omitted). The emphasis is on the data rather than the parameter estimation.

Figure 1. The raw catch rates of the Cape hake fishery (Merluccius capensis and M. paradoxus) off northern Namibia (listed in Polacheck et al., 1993, table 1) are shown by the thick line. Six bootstrap samples are illustrated by the fine lines; these are produced by combining the expected catch rate from the optimally fitting model with bootstrap samples of the residuals between the observed data and the values predicted by the optimum model.
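The residual bootstrap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the catch-rate values are made up (they are not the Cape hake series), the 'assessment model' is replaced by a straight line fitted to log catch rates so that the residuals are additive, and the function names are hypothetical.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_bootstrap(x, y, n_boot=1000, seed=1):
    """Refit the model to data sets built from expected values plus resampled residuals."""
    rng = random.Random(seed)
    intercept, slope = fit_line(x, y)
    expected = [intercept + slope * xi for xi in x]
    residuals = [yi - ei for yi, ei in zip(y, expected)]
    boot_slopes = []
    for _ in range(n_boot):
        # Sample the residuals with replacement: some recur, others are omitted.
        resampled = rng.choices(residuals, k=len(residuals))
        y_star = [ei + ri for ei, ri in zip(expected, resampled)]
        boot_slopes.append(fit_line(x, y_star)[1])
    return slope, sorted(boot_slopes)

def percentile_interval(sorted_values, level=0.95):
    """Percentile confidence interval from sorted bootstrap estimates."""
    n = len(sorted_values)
    lower = sorted_values[int((1 - level) / 2 * n)]
    upper = sorted_values[int((1 + level) / 2 * n) - 1]
    return lower, upper

# Hypothetical log catch rates over ten years (illustrative values only).
years = list(range(10))
log_cpue = [0.52, 0.41, 0.47, 0.30, 0.26, 0.31, 0.12, 0.15, 0.02, -0.05]

slope, boot_slopes = residual_bootstrap(years, log_cpue)
low, high = percentile_interval(boot_slopes)
print(f"slope = {slope:.4f}, 95% percentile interval = ({low:.4f}, {high:.4f})")
```

In a real assessment the call to fit_line would be replaced by a full refit of the assessment model, which is why each bootstrap replicate is expensive.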
One advantage of the bootstrapping approach that follows from this emphasis on the data is that if the parameter estimation is in any way biased then bootstrapping can provide an estimate of that bias (Efron and Tibshirani, 1993). This does not arise with the Bayesian approach.
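A small sketch may make the bias estimate concrete. Following the usual bootstrap definition, the estimated bias is the mean of the bootstrap estimates minus the original estimate. The example below assumes a deliberately biased estimator (the variance computed with a divisor of n) and made-up observations; neither comes from a stock assessment, and the function names are hypothetical.

```python
import random

def plugin_variance(values):
    """Variance with divisor n, which is known to be biased downwards."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def bootstrap_bias(values, estimator, n_boot=2000, seed=7):
    """Bootstrap bias estimate: mean of bootstrap estimates minus the original estimate."""
    rng = random.Random(seed)
    original = estimator(values)
    boot = [estimator(rng.choices(values, k=len(values))) for _ in range(n_boot)]
    return sum(boot) / n_boot - original

# Hypothetical observations (illustrative values only).
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.9, 3.1]

bias = bootstrap_bias(sample, plugin_variance)
corrected = plugin_variance(sample) - bias
print(f"estimate = {plugin_variance(sample):.4f}, bias = {bias:.4f}, corrected = {corrected:.4f}")
```

For this estimator the bootstrap bias should be close to minus the estimate divided by n, so the corrected value moves towards the familiar n - 1 divisor.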
The Bayesian approach determines the relative quality of fit produced by different combinations of parameter estimates (along with the prior probabilities of each parameter set) (Fig. 2). This relative fit is described by the posterior probability distribution for the parameter set. Because it does not involve fitting the model at each iteration, the Bayesian approach sounds like it would be a good deal more rapid than the bootstrap process. However, the processes used to select parameter combinations that are likely to be of acceptable fit (Sampling Importance Resampling, SIR; Markov Chain Monte Carlo, MCMC) also take a very long time. The algorithms involved can be relatively simple (Gelman et al., 1995; McAllister and Ianelli, 1997) but each is attempting to define the extent and shape of the posterior distribution of the model parameters.

Figure 2. Four independent MCMCs approaching the posterior distribution of two parameters from a simple stock assessment model (cf. Fig 11.2, Gelman et al., 1995). All of the starting points are out on the periphery but converge to an area that, if the MCMC procedure had been prolonged, would define the posterior distribution of both model parameters.
Obtaining an adequate description of a complex posterior for many parameters, and determining the model outcomes from those estimates, can take millions of iterations of parameter selection. In this way the emphasis in Bayesian analysis is on the parameter values.
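To illustrate how an MCMC explores a posterior, here is a minimal random-walk Metropolis sketch run as four chains from dispersed starting points. It is an assumption-laden toy, not an assessment: the data are made up, the model is a straight line through log catch rates with years centred on their mean, the observation standard deviation SIGMA is assumed known, the priors are flat, and the step sizes and burn-in length are arbitrary choices.

```python
import math
import random

# Hypothetical log catch rates by year (made-up values, not the Cape hake series).
years = list(range(10))
log_cpue = [0.52, 0.41, 0.47, 0.30, 0.26, 0.31, 0.12, 0.15, 0.02, -0.05]
mid_year = sum(years) / len(years)
SIGMA = 0.05  # assumed known observation standard deviation

def log_posterior(level, slope):
    """Log posterior up to a constant; flat priors, so it equals the log-likelihood."""
    sse = sum((y - (level + slope * (t - mid_year))) ** 2
              for t, y in zip(years, log_cpue))
    return -sse / (2 * SIGMA ** 2)

def metropolis(start, n_iter=5000, steps=(0.02, 0.007), seed=0):
    """Random-walk Metropolis sampler for the two parameters (level, slope)."""
    rng = random.Random(seed)
    level, slope = start
    current = log_posterior(level, slope)
    chain = []
    for _ in range(n_iter):
        new_level = level + rng.gauss(0.0, steps[0])
        new_slope = slope + rng.gauss(0.0, steps[1])
        proposed = log_posterior(new_level, new_slope)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposed - current)):
            level, slope, current = new_level, new_slope, proposed
        chain.append((level, slope))
    return chain

# Four chains started on the periphery of the plausible parameter space.
starts = [(-0.5, 0.2), (1.0, 0.2), (-0.5, -0.3), (1.0, -0.3)]
chains = [metropolis(start, seed=i) for i, start in enumerate(starts)]

burn_in = 1000  # discard the initial approach to the posterior
kept = [draw for chain in chains for draw in chain[burn_in:]]
mean_level = sum(d[0] for d in kept) / len(kept)
mean_slope = sum(d[1] for d in kept) / len(kept)
print(f"posterior means: level = {mean_level:.3f}, slope = {mean_slope:.4f}")
```

Note that no model fitting occurs inside the loop; each iteration only evaluates the quality of fit for a proposed parameter set, which is the sense in which the emphasis is on the parameter values.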
Both bootstrapping and Bayesian posteriors can be used to determine the precision with which parameter estimates (and model outcomes) are obtained. Projections used in risk assessments can use the parameter sets that define the Bayesian posteriors, or those obtained from bootstrap analyses, as a starting point.
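The way such parameter sets feed a risk assessment can be sketched as follows. The sketch assumes a Schaefer surplus-production model for the projection, and the parameter sets are simply drawn from log-normal distributions as a stand-in for the output of a bootstrap or a Bayesian analysis; all numerical values, and the choice of 40% of K as the reference biomass, are hypothetical.

```python
import math
import random

def project_biomass(b0, r, k, tac, years):
    """Project a Schaefer surplus-production model forward under a constant catch."""
    biomass = b0
    for _ in range(years):
        biomass = max(biomass + r * biomass * (1.0 - biomass / k) - tac, 0.0)
    return biomass

def probability_above_reference(param_sets, tac, years, ref_fraction):
    """Fraction of parameter sets whose projected biomass exceeds ref_fraction * K."""
    hits = sum(1 for b0, r, k in param_sets
               if project_biomass(b0, r, k, tac, years) > ref_fraction * k)
    return hits / len(param_sets)

# Stand-in for bootstrap or posterior draws of (current biomass, r, K).
# The distributions and their centres are assumptions made for illustration.
rng = random.Random(42)
param_sets = [(rng.lognormvariate(math.log(400.0), 0.2),
               rng.lognormvariate(math.log(0.3), 0.2),
               rng.lognormvariate(math.log(1000.0), 0.1))
              for _ in range(2000)]

# Probability that biomass after five years exceeds 40% of K, for a range of TACs.
risk_table = {tac: probability_above_reference(param_sets, tac, 5, 0.4)
              for tac in (40, 60, 80, 100)}
for tac, prob in risk_table.items():
    print(f"TAC = {tac:3d}: P(B5 > 0.4K) = {prob:.2f}")
```

Reading the table in the other direction answers the second form of the question: the largest TAC whose probability is still at least X % is the one that meets the stated risk criterion.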
Both analyses are obviously highly dependent upon the original data used. Bootstrapping attempts to characterise uncertainty by a consideration of the relationship between the data and the optimal model fit; that is, it uses the residuals to generate alternative data sets that might have been. Bayesian analyses characterise uncertainty by investigating how the quality of fit to the sample data alters as the parameter set selected is altered. Bootstrapping approaches the problem by modifying the data and determining the implications, while Bayesian analysis modifies the parameter sets and determines the implications.
A potential problem with the Bayesian approach is illustrated by Press et al. (1989: 549) who were attempting to capture the notion of likelihood: 'It is not meaningful to ask the question, ‘What is the probability that a particular set of fitted parameters a1 … aM is correct?’ The reason is that there is no statistical universe of models from which the parameters are drawn. There is just one model, the correct one, and a statistical universe of data sets that are drawn from it!'
Press et al. (1989) are saying that the data set actually obtained is only a sample from the world and that, under different circumstances, a slightly different set of data could have been obtained. However, there is only one optimal model underlying the processes being modelled, from which the data were gathered. The observed data vary about the expected values in a manner that relates to whatever statistical distribution is used in the model (e.g. normal, log-normal). The parameter values do not vary in any known way or in relation to any known statistical distribution; they are not random variables, they are simply unknown quantities.
These ideas capture some of the flavour of the philosophical disagreement between Frequentists and Bayesians. As stated by Dennis (1996: 1098), 'Bayesians, you see, are not allowed to look at their residuals. It violates the likelihood principle to judge an outcome by how extreme it is under a model. To a Bayesian, there are no bad models, just bad beliefs.' However, in practice, which method works best?
One source of argument about the use of Bayesian methods is the dependence of Bayesian analyses on the prior distributions that are attributed to each parameter being considered. There are a number of problems with priors and their generation. At their most extreme, priors can be generated that include the opinions of informed individuals (expert opinion). Such informative priors can influence the outcomes of analyses.
When discussing the justification of the origin of priors, Punt and Hilborn (1997: 43) stated: 'We therefore strongly recommend that whenever a Bayesian assessment is conducted, considerable care should be taken to document fully the basis for the various prior distributions. … Care should be taken when selecting the functional form for a prior because poor choices can lead to incorrect inferences. We have also noticed a tendency to underestimate uncertainty, and hence to specify unrealistically informative priors – this tendency should be explicitly acknowledged and avoided.'
The use of informative priors has been so controversial that Walters and Ludwig (1994) recommended that non-informative priors be used as a default in Bayesian stock assessments. Unfortunately, however, there is a problem with trying to generate non-informative priors (Box and Tiao, 1973): they are sensitive to the particular measurement scale used (Punt and Hilborn, 1997). That is, a prior that is uniform on a linear scale will not appear uniform on a log scale (Fig. 3).

Figure 3. The same data plotted on a natural logarithmic scale (upper panel) and a linear scale (lower panel). The uniform distribution in the bottom panel appears distorted when viewed in logarithmic space. Note the effect on the vertical scales of the two graphs (after Fig 3.28 in Haddon, 2001).
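The scale dependence can be checked numerically. If a parameter has a uniform prior between a and b, the implied density of its natural logarithm, phi, is exp(phi) / (b - a), which rises steeply rather than being flat. The sketch below computes the prior probability falling in equal-width bins on the log scale; the bounds of 1 and 1000 are arbitrary values chosen for illustration.

```python
import math

def uniform_mass_in_log_bins(lower, upper, n_bins):
    """Probability of a Uniform(lower, upper) prior in equal-width bins on the ln scale."""
    log_lower, log_upper = math.log(lower), math.log(upper)
    width = (log_upper - log_lower) / n_bins
    edges = [log_lower + i * width for i in range(n_bins + 1)]
    return [(math.exp(edges[i + 1]) - math.exp(edges[i])) / (upper - lower)
            for i in range(n_bins)]

# Three log-scale bins covering roughly 1-10, 10-100 and 100-1000.
masses = uniform_mass_in_log_bins(1.0, 1000.0, 3)
for i, mass in enumerate(masses):
    print(f"log-scale bin {i + 1}: prior probability = {mass:.4f}")
```

A prior that looks non-informative on the linear scale therefore places most of its weight in the top log-scale bin, so it is far from non-informative about the order of magnitude of the parameter.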
One strangely useful aspect of using uninformative priors is that when the available data are uninformative with respect to a particular parameter the posterior will be equivalent to the imposed prior.
The two approaches handle unknown and poorly estimated parameters differently. A classical problem in fishery modelling is how to handle natural mortality. It is often assumed constant across ages and through time, but attributing an exact value to the natural mortality rate is equivalent to claiming one knows this value without uncertainty (obviously a poor assumption). How Bayesian analyses handle such situations is often argued to be one of their greatest strengths. A Bayesian analysis would allocate a plausible prior distribution to such a parameter and the implications of this would be integrated over the different values during the generation of the posterior probability distribution of the remaining parameters. Thus, the uncertainty relating to such awkward parameters is dealt with in an elegant and simple manner. In effect, this is equivalent to including an element of process error in the analysis. However, this strategy can give the impression of a greater understanding of a situation than really exists. Integrating over such parameters is a fine idea except for the problem of selecting a suitable prior. Including process error in an analysis would be extremely valuable except that, once again, it implies more knowledge than we have to hand. While reasonable arguments can be made for generating the required plausible priors, it would still involve including informative priors in what is invariably a sensitive area of any assessment model. If there is no information about how, for example, natural mortality varies (the absence of such information is standard) then including such priors can give overly confident conclusions.
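The mechanics of integrating over an awkward parameter can be shown with a toy calculation on a grid. The sketch assumes that the data inform only total mortality Z (a hypothetical estimate of 0.5 with standard error 0.05), that fishing mortality is F = Z - M with a flat prior on F, and that natural mortality M is either fixed at 0.2 or given a normal prior centred on 0.2 with standard deviation 0.05. Every one of these numbers is invented for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

Z_HAT, Z_SE = 0.5, 0.05  # hypothetical estimate of total mortality and its standard error
F_GRID = [i * 0.005 for i in range(0, 161)]  # fishing mortality from 0.0 to 0.8
M_GRID = [i * 0.005 for i in range(1, 101)]  # natural mortality from 0.005 to 0.5

def posterior_f(m_weights):
    """Posterior of F on a grid, summing the likelihood over M weighted by its prior."""
    post = [sum(w * normal_pdf(Z_HAT, f + m, Z_SE) for m, w in zip(M_GRID, m_weights))
            for f in F_GRID]
    total = sum(post)
    return [p / total for p in post]

def mean_and_sd(grid, probs):
    mean = sum(g * p for g, p in zip(grid, probs))
    var = sum((g - mean) ** 2 * p for g, p in zip(grid, probs))
    return mean, math.sqrt(var)

# Case 1: M treated as known exactly (all prior weight on M = 0.2).
fixed_m = [1.0 if abs(m - 0.2) < 1e-9 else 0.0 for m in M_GRID]
# Case 2: M given an (assumed) informative normal prior.
prior_m = [normal_pdf(m, 0.2, 0.05) for m in M_GRID]

mean_fixed, sd_fixed = mean_and_sd(F_GRID, posterior_f(fixed_m))
mean_prior, sd_prior = mean_and_sd(F_GRID, posterior_f(prior_m))
print(f"M fixed:      F = {mean_fixed:.3f} (sd {sd_fixed:.3f})")
print(f"M with prior: F = {mean_prior:.3f} (sd {sd_prior:.3f})")
```

The posterior for F widens when M is integrated over, but the amount of widening is dictated entirely by the standard deviation chosen for the prior on M, which is the point of concern raised above.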
On the other hand, with bootstrapping there is no general method available for dealing either with process error in combination with observation error or with difficult parameters such as natural mortality. The use of the Kalman filter to work with both process and observation errors is not yet a general solution to all assessment problems (Sullivan, 1992). The only remaining option is to conduct classic sensitivity tests, setting the parameters of concern (e.g. M) to a set of different values and determining the effect. The only advantage this has is that it explicitly identifies the uncertain parameters and permits any trend in the impact of changing their values to be determined.
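A sensitivity test of this kind amounts to repeating the analysis for each fixed value of the parameter of concern and tabulating the outcome. The sketch below is deliberately trivial: it assumes a hypothetical estimate of total mortality Z of 0.5, derives fishing mortality as F = Z - M for each trial value of M, and reports the annual exploitation rate from the Baranov catch equation as an illustrative outcome. In practice each trial value would require refitting the whole assessment model.

```python
import math

Z_HAT = 0.5  # hypothetical estimate of total mortality

def exploitation_rate(f, m):
    """Baranov catch equation: fraction of the stock caught during one year."""
    z = f + m
    return (f / z) * (1.0 - math.exp(-z))

# Repeat the calculation for each fixed trial value of natural mortality M.
sensitivity = []
for m in (0.1, 0.15, 0.2, 0.25, 0.3):
    f = Z_HAT - m
    sensitivity.append((m, f, exploitation_rate(f, m)))

for m, f, u in sensitivity:
    print(f"M = {m:.2f}: F = {f:.2f}, exploitation rate = {u:.3f}")
```

The resulting table shows the trend in the outcome as M changes, without any claim about which value of M is the more probable.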
In this brief essay it should appear obvious that my preference lies with non-Bayesian approaches. However, both Bayesian posteriors and bootstrapping are, in the end, just tools that can be used in the formal assessment of natural resources. Whether one wishes to ignore the philosophical issues and become known as a pragmatic Bayesian is not really a problem because occasions arise when the use of one approach would be preferred to the other.
As with many of these controversies it is possible that the solution lies somewhere in the middle of the alternatives. It is good practice to apply multiple models to the same situation. Speaking about surplus-production modelling versus age-structured modelling, Hilborn and Walters (1992: 329) stated: 'It is better to think of the two methods as simply different; if biomass dynamic methods provide a different answer than age-structured methods, then the scientist should try to understand why they are different and analyse the management implications of the different predictions, rather than concentrating on deciding which method is correct.'
I suggest that a suitably altered but otherwise identical statement could be made about using Bayesian and alternative methods, as very often the results are surprisingly similar. However, different results should provide some insight into what is driving the differences. It seems quite possible that uncertainty is being underestimated by both methods in many assessments and that this is more likely to come to light if both methods are applied. Of course, each of these methods can take a good deal of time, so suggesting this option may not be as sensible as it may appear at first sight. Nevertheless, rather than risk throwing out the potential benefits with the bootstrapping bathwater, I recommend that both methods be considered when conducting analyses, especially if the levels of uncertainty appear to be high.
This paper was written after the loss of the original conference paper and so may not be an accurate reflection of the contents of the talk given in 2001; a fact that explains its almost note-like brevity. I thank Dr Norm Hall for reminding me of a number of the things I spoke about at the workshop. In addition, I thank Dr Stephen Newman for his remarkably gentle reminders about deadlines.
Box, G.E.P. and Tiao, G.C. 1973. Bayesian inference in statistical analysis. Addison-Wesley, Reading, Massachusetts.
Chen, Y. and Fournier, D. 1999. Impacts of atypical data on Bayesian inference and robust Bayesian approaches in fisheries. Canadian Journal of Fisheries and Aquatic Sciences 56: 1525-1533.
Dennis, B. 1996. Should ecologists become Bayesians? Ecological Applications 6: 1095-1103.
Efron, B. and Tibshirani, R.J. 1993. An introduction to the Bootstrap. Chapman and Hall, London.
Francis, R.I.C.C. 1992. Use of risk analysis to assess fishery management strategies: a case study using orange roughy (Hoplostethus atlanticus) on the Chatham Rise, New Zealand. Canadian Journal of Fisheries and Aquatic Sciences 49: 922-930.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. 1995. Bayesian data analysis. Chapman and Hall, London.
Haddon, M. 2001. Modelling and quantitative methods in fisheries. Chapman and Hall/CRC, Boca Raton.
Hilborn, R. and Walters, C.J. 1992. Quantitative fisheries stock assessment: choice, dynamics, and uncertainty. Chapman and Hall, London.
McAllister, M.K. and Ianelli, J.N. 1997. Bayesian stock assessment using catch-age data and the sampling-importance resampling algorithm. Canadian Journal of Fisheries and Aquatic Sciences 51: 2673-2687.
Polacheck, T., Hilborn, R. and Punt, A.E. 1993. Fitting surplus production models: comparing methods and measuring uncertainty. Canadian Journal of Fisheries and Aquatic Sciences 50: 2597-2607.
Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T. 1989. Numerical recipes in Pascal: the art of scientific computing. Cambridge University Press, London.
Punt, A.E. and Hilborn, R. 1997. Fisheries stock assessment and decision analysis: the Bayesian approach. Reviews in Fish Biology and Fisheries 7: 35-63.
Sullivan, P.J. 1992. A Kalman filter approach to catch-at-length analysis. Biometrics 48: 237-257.
Walters, C.J. and Ludwig, D. 1994. Calculation of Bayes posterior probability distributions for key population parameters. Canadian Journal of Fisheries and Aquatic Sciences 51: 713-722.