Monday, September 20, 2010

Bootstrapping for threshold standard error

In my 2003 Ecological Modelling paper on ecological thresholds, I used bootstrapping to estimate the standard error of a change-point variable.  The change point was estimated as the binary break point that produced the greatest reduction in deviance.  Ian McKeague of FSU (now at Columbia) pointed me to Bühlmann and Yu (2002), which suggested that bootstrapping is inappropriate for change-point problems.  I was convinced that the bootstrap confidence interval was wrong, but was never able to explain the reason convincingly until recently.

A change-point problem limits the potential break points to the gaps (intervals between distinct values) in the given data set.  These potential break points do not change from bootstrap sample to bootstrap sample; at most, some gaps become wider.  If we could sample repeatedly from the population, however, the number of potential break points would be infinite.  As a result, the bootstrap-estimated break point has an artificially smaller sampling variance than that of the true sampling distribution of the break point.  This reduced sampling variation is likely the cause of the much smaller standard error, or narrower confidence interval, obtained from bootstrapping.  In Banerjee and McKeague (2007), a simulation shows that a 95% confidence interval based on bootstrapping covers the underlying true change point far less than 95% of the time.
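The gap argument can be illustrated with a small simulation.  This is a hypothetical sketch, not the analysis from the paper: a step function with one break, with the break point estimated as the split that minimizes the residual sum of squares (the deviance under normal errors).  Comparing the spread of break points across fresh samples with the spread across bootstrap resamples of one fixed sample shows the restriction at work: bootstrap break points can only fall in the gaps of the original sample.

```python
import numpy as np

rng = np.random.default_rng(1)

def break_point(x, y):
    """Best binary split: the midpoint of the x-gap giving the largest
    reduction in deviance (SSE) under a two-mean step model."""
    xs = np.sort(np.unique(x))
    candidates = (xs[:-1] + xs[1:]) / 2          # midpoints of the gaps
    sse = [((y[x <= c] - y[x <= c].mean())**2).sum() +
           ((y[x > c] - y[x > c].mean())**2).sum() for c in candidates]
    return candidates[int(np.argmin(sse))]

def simulate(n=50, cp=0.6):
    x = rng.uniform(0, 1, n)
    y = np.where(x <= cp, 0.0, 1.0) + rng.normal(0, 0.5, n)
    return x, y

# True sampling distribution: break points from many fresh samples
true_bp = [break_point(*simulate()) for _ in range(200)]

# Bootstrap distribution: resample one fixed data set
x0, y0 = simulate()
boot_bp = []
for _ in range(200):
    i = rng.integers(0, len(x0), len(x0))
    boot_bp.append(break_point(x0[i], y0[i]))

print("SD over fresh samples:", np.std(true_bp))
print("Bootstrap SD         :", np.std(boot_bp))
# In this setup the bootstrap SD is typically the smaller of the two.
```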

Causal inference based on observational data

I have been reading about causal inference using observational data for almost three years.  Three years ago, I read a paper on the effectiveness of agricultural best management practices in reducing nutrient loading from fields.  The data were observational, and the estimated effects are affected by confounding factors.  Using that data set, I learned how to use multilevel models and made a few homework assignments for my class.  The question in that data set is clear, and the approach of using propensity score matching is obvious.  I compared propensity score matching and multilevel modeling for estimating the effect of conservation practices on nutrient loading from agricultural fields.  The results were not surprising -- on average, any conservation practice would reduce nitrogen/phosphorus loading by 1/3 to 2/3.  I have since read Rosenbaum and Rubin (1983) and other classic literature.

To summarize the main idea: the problem of causal inference of a treatment effect is always a problem of the counterfactual.  That is, we can never apply both treatment and control to the same subject to assess the real treatment effect on that subject.  When treatment and control are applied to two different subjects, the difference in response can be due to confounding factors.  The statistical solution to this problem is Fisher's randomization: subjects are randomly assigned to control and treatment, effectively forming two groups that are otherwise similar, one receiving treatment and the other control.  The difference in response can then be confidently attributed to the treatment.  With observational data, treatment and control are not applied randomly.  Consequently, no causal inference should be made without careful adjustment.  Propensity score matching is one of many methods used for causal inference based on observational data.
The basic idea is that the data can be subset such that the treatment and control groups are similar in terms of confounding factors.  The similarity is achieved through the propensity score, the likelihood of a subject receiving treatment given its levels of the confounding factors.  Operationally, a propensity score is estimated as the probability of receiving treatment from a logistic regression using all available covariates.  Each observation in the treatment (or control) group is matched with an observation in the other group with a similar propensity score.  Not all observations will be included in the final data set.  After matching, we essentially have a treatment group and a control group that are similar with respect to confounding factors.  The difference in response is then likely due to the treatment.
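The matching step can be sketched in a few lines.  This is a minimal illustration with simulated data (all names and numbers are made up, and the logistic fit is a bare-bones IRLS rather than a library routine):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
conf = rng.normal(size=(n, 2))                  # two confounders
logit = 0.8 * conf[:, 0] - 0.5 * conf[:, 1]
treat = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))
# outcome depends on confounders AND a true treatment effect of -2
y = 3 * conf[:, 0] + conf[:, 1] - 2 * treat + rng.normal(size=n)

def logistic_fit(X, z, iters=25):
    """IRLS for logistic regression; returns fitted probabilities."""
    X1 = np.column_stack([np.ones(len(X)), X])
    b = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X1 @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X1.T * W @ X1, X1.T @ (z - p))
    return 1 / (1 + np.exp(-X1 @ b))

ps = logistic_fit(conf, treat.astype(float))    # estimated propensity scores

# 1:1 nearest-neighbor matching on the propensity score
t_idx = np.where(treat)[0]
c_idx = np.where(~treat)[0]
matches = [c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))] for i in t_idx]

naive = y[treat].mean() - y[~treat].mean()
matched = (y[t_idx] - y[np.array(matches)]).mean()
print("naive difference:", round(naive, 2))
print("matched estimate:", round(matched, 2))
```

Because treatment here is driven by the same confounders that drive the outcome, the naive group difference is biased; the matched estimate is typically much closer to the true effect of -2.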

The central idea of the propensity score approach is matching and subsetting to balance out the effects of confounding factors.  In environmental studies, however, the treatment is often a continuous variable, as in a dose-response curve -- for example, the effect of nutrients on lake eutrophication.  I recently read Imai and van Dyk (2004) and Hirano and Imbens (2004) and learned the generalized propensity score approach for continuous and other forms of treatment.  The generalized propensity score is a natural extension of the binary-treatment version.  The propensity score of a binary treatment variable is the probability of an observation receiving treatment, i.e., the probability mass function of the Bernoulli distribution of treatment.  The generalized propensity score is the probability density of the treatment conditional on confounding factors.  One can estimate this density by fitting a regression model of the treatment on the confounding factors.  The resulting regression model implies a conditional normal distribution of the treatment variable, from which the propensity scores can be calculated as the normal density of the model residuals (~N(0, sigma^2)).  Imai and van Dyk (2004) suggested that the resulting propensity scores be used to divide the observations into J groups of similar propensity scores; within each group a regression model is fit for the treatment effect, and the average causal effect is a weighted mean of the J effects.  Hirano and Imbens (2004) suggested a two-step process: (1) the response is regressed on polynomials of the treatment and the propensity score, and (2) the causal effect at a given treatment value is estimated as the expected response at that value, averaged over all observations.
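A rough sketch of the two-step recipe, with simulated data (the variable names and the data-generating model are my own for illustration, not from Hirano and Imbens):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
conf = rng.normal(size=(n, 2))
# continuous "treatment" (e.g., nutrient input) driven by the confounders
t = 1.0 + 0.7 * conf[:, 0] + 0.4 * conf[:, 1] + rng.normal(0, 1, n)
y = 2 * t + conf[:, 0] + rng.normal(0, 1, n)    # true dose slope = 2

# GPS = normal density of residuals from regressing t on the confounders
X = np.column_stack([np.ones(n), conf])
beta, *_ = np.linalg.lstsq(X, t, rcond=None)
resid = t - X @ beta
sigma = resid.std()
gps = np.exp(-resid**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Step 1: regress y on polynomials of the treatment and the GPS
D = np.column_stack([np.ones(n), t, t**2, gps, gps**2, t * gps])
alpha, *_ = np.linalg.lstsq(D, y, rcond=None)

# Step 2: dose-response at a fixed dose t0, averaged over all
# observations, with each observation's GPS re-evaluated at t0
def dose_response(t0):
    r0 = t0 - X @ beta
    g0 = np.exp(-r0**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    D0 = np.column_stack([np.ones(n), np.full(n, t0), np.full(n, t0**2),
                          g0, g0**2, t0 * g0])
    return (D0 @ alpha).mean()

print("E[y | dose=1]:", round(dose_response(1.0), 2))
print("E[y | dose=2]:", round(dose_response(2.0), 2))
# The difference between the two should be close to the true slope of 2.
```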

In any case, the propensity score approach still aims at achieving balance of the confounding factors.  When observations are divided into groups based on the estimated propensity scores (Imai and van Dyk, 2004), the distributions of the confounding factors (the covariates used for estimating the propensity scores) are more or less balanced within each group.
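This balance claim can be checked directly.  A toy check (binary treatment for simplicity, all numbers simulated): the confounder differs sharply between treated and control overall, but far less within groups of similar propensity score.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 2000
x = rng.normal(size=n)                          # one confounder
p = 1 / (1 + np.exp(-1.2 * x))                  # propensity score (known here)
treat = rng.uniform(size=n) < p

sd = x.std()
overall = (x[treat].mean() - x[~treat].mean()) / sd
print("overall standardized difference:", round(overall, 2))

# group observations into quintiles of the propensity score
strata = np.digitize(p, np.quantile(p, [0.2, 0.4, 0.6, 0.8]))
within = []
for s in range(5):
    m = strata == s
    if treat[m].any() and (~treat[m]).any():
        within.append((x[m & treat].mean() - x[m & ~treat].mean()) / sd)
print("within-group differences:", np.round(within, 2))
```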

A very odd application of the generalized propensity score was published recently in Ecological Applications (Yuan 2010).  The paper misinterprets the definition of the propensity score: instead of the probability density of an observation's treatment, Yuan (2010) used the predicted mean treatment value as the propensity score.  The resulting subgrouping further segregates the distributions of the confounding factors, leading to a potentially more biased estimate of the causal effect.

Log or not log

May 19, 2018

In 2014 I taught a special topics class on statistical i...