A while ago, we wrote a paper on correcting retransformation bias in regression analysis [Stow et al., 2006], using a Bayesian regression model implemented in WinBUGS to obtain the posterior predictive distribution of the response at given predictor values. The retransformation bias was avoided by retransforming the MCMC samples from the posterior predictive distribution before calculating the predictive means on the original scale. This approach works, but the added burden of using WinBUGS is a real obstacle for many.
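To see the bias concretely, here is a minimal sketch in R; log.pred is a hypothetical vector standing in for MCMC samples from the posterior predictive distribution of log(y), not output from the paper:

log.pred <- rnorm(1000, mean = 2, sd = 0.5)  # hypothetical stand-in for posterior predictive samples of log(y)
mean(exp(log.pred))  # retransform each sample first, then average
exp(mean(log.pred))  # average first, then retransform: biased low

The first estimate targets the mean of y on the original scale; the second targets only exp(E[log y]) and understates it.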
In Gelman and Hill [2007], the same Bayesian posterior sampling was done using the sampling distributions of the regression model parameters.
For a simple linear regression problem $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$, classical statistics provides sampling distributions of $\sigma^2$ and the regression coefficients $\beta_0$ and $\beta_1$. The sampling distribution of $\sigma^2$ is a scaled inverse chi-square distribution:

$$\sigma^2 \sim \mathrm{Inv}\text{-}\chi^2\left(n - p,\ \hat{\sigma}^2\right), \qquad \text{equivalently} \qquad \frac{(n - p)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p},$$

where $n$ is the sample size, $p$ is the number of coefficients, and $\hat{\sigma}$ is the estimated residual standard deviation. Conditional on $\sigma^2$, the joint distribution of $(\beta_0, \beta_1)$ is multivariate normal:

$$(\beta_0, \beta_1)^T \mid \sigma^2 \sim N\!\left((\hat{\beta}_0, \hat{\beta}_1)^T,\ \sigma^2 V_\beta\right),$$

where $V_\beta = (X^T X)^{-1}$ is the unscaled covariance matrix and $(\hat{\beta}_0, \hat{\beta}_1)$ are the estimated model coefficients. These results can be used to derive a Monte Carlo simulation algorithm for generating samples of the model coefficients from their joint posterior distribution.
1. Extract the necessary information from the fitted linear model object lm.obj (see the sketch of lm.sims after this block).
2. Draw n.sim posterior samples of sigma and the coefficients (also in the sketch below).
3. Put the above two components into a function (e.g., lm.sims(lm.obj, n.sim)) so that we can draw random samples from the joint posterior distribution of the model coefficients, which can then be used to draw posterior predictive samples:
coef.post <- lm.sims(lm.obj = my.lm, n.sim = 1000)
# assuming lm.sims returns a matrix with columns (beta0, beta1, sigma)
y.tilde <- rnorm(1000, coef.post[, 1] + coef.post[, 2] * x.tilde, coef.post[, 3])
where x.tilde is the predictor value at which the predictive distribution y.tilde is evaluated.
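For completeness, here is a minimal sketch of what lm.sims might look like, assuming it returns an n.sim by 3 matrix with columns (beta0, beta1, sigma); the function body is my reconstruction from the sampling distributions above, not code from the original sources:

lm.sims <- function(lm.obj, n.sim) {
    # step 1: extract what the sampling distributions need from the fitted model
    lm.sum    <- summary(lm.obj)
    beta.hat  <- coef(lm.obj)          # estimated coefficients
    V.beta    <- lm.sum$cov.unscaled   # unscaled covariance matrix (X^T X)^{-1}
    sigma.hat <- lm.sum$sigma          # estimated residual standard deviation
    df        <- lm.obj$df.residual    # n - p
    # step 2: draw sigma from its scaled inverse chi-square distribution, then
    # draw the coefficients from the conditional multivariate normal
    sigma <- sigma.hat * sqrt(df / rchisq(n.sim, df))
    beta  <- t(sapply(sigma, function(s) MASS::mvrnorm(1, beta.hat, s^2 * V.beta)))
    cbind(beta, sigma)   # columns: beta0, beta1, sigma
}

The sim() function in the arm package accompanying Gelman and Hill [2007] implements the same idea.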
In
other words, we simplified the Bayesian operation into two lines of R code in
step 3. In Section 9.2.2.1 of Qian [2010], I compared the
simple Monte Carlo simulation approach to the Bayesian regression (using MCMC).
Drawing random samples of σ in step 2 is definitely a Bayesian step, as the classical sampling distribution is defined for $\hat{\sigma}^2$. In classical statistics, $\hat{\sigma}^2$ is a random variable (and $\sigma^2$ an unknown constant); in Bayesian statistics, $\hat{\sigma}^2$ is fixed once the data are observed and $\sigma^2$ is the random quantity. In any case, $\hat{\sigma}^2$ summarizes the information about σ in the data.
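The two readings can be seen side by side in the pivot used above (a standard result, stated here for clarity):

$$\frac{(n - p)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-p}$$

Read classically, this describes how $\hat{\sigma}^2$ varies across repeated samples with $\sigma^2$ held fixed; read with a flat prior on $\log\sigma$, the same statement gives the posterior of $\sigma^2$ with $\hat{\sigma}^2$ held fixed, which is exactly the scaled inverse chi-square draw in step 2.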
The connection between a Bayesian approach and the corresponding classical approach is the shared likelihood function. If we use “non-informative” flat priors, the Bayesian posterior distribution is proportional to the likelihood function. Consequently, if we can find quick solutions from the classical repertoire of models, we should use them. The
line that separates a Bayesian from a classical approach is the definition of
probability. The interpretation of probability can be subjective (e.g., the chance that a candidate wins an election) or objective (e.g., the chance of a head in a coin toss). In most scientific endeavors, we are more likely to use the former interpretation (e.g., the likelihood that a model is true).
Whether we call ourselves Bayesian or not should not be based on whether we use MCMC (WinBUGS, OpenBUGS, JAGS, Stan), but rather on how we interpret the results.
For most standard statistical modeling problems, we should use standard
classical model results. MCMC gives us a tool for complicated computations. The
use of MCMC is not equivalent to being Bayesian.
References
A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York, 2007.
S.S. Qian. Environmental and Ecological Statistics with R. Chapman and Hall/CRC Press, 2010.
C.A. Stow, K.H. Reckhow, and S.S. Qian. A Bayesian approach to retransformation bias in transformed regression. Ecology, 87(6):1472–1477, 2006.