Song's Blog: October 2009

Wednesday, October 14, 2009

The cult of statistical significance

A few weeks ago, a friend asked me if I knew the book "The Cult of Statistical Significance". I did not. I replied with the following, and today he told me that he liked my summary of statistical inference:

Every year when I teach the confidence interval and the t-test in the third and fourth weeks of the fall semester, I come up with some new ideas about how to introduce these confusing concepts. This year, I told the class that if I were to rewrite all statistics textbooks, I would not include hypothesis testing, because all the information in a test is reflected in the confidence interval. The width of a confidence interval also tells us whether the data provided useful information. If we use the confidence interval (and interpret it in the Bayesian way), we understand the problem of statistical inference better. This line of thinking is inspired by the display() function in R (from Gelman's arm package), which summarizes the coefficients of a linear model only by their estimated values and standard errors; no p-values are provided. Last year, I told my class to forget about the p-values and focus on the standard errors, because we can quickly come up with an approximate 95% confidence interval for each coefficient (est +/- 2se). Many students told me that without p-values they think about the real meaning of the estimated coefficients, whereas with p-values (using the standard R function summary()) they think about significance.
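The est +/- 2se habit is easy to try. Below is a minimal Python sketch, assuming a simple linear regression and made-up data; fit_simple_lm and approx_ci are hypothetical helper names, and the printed table only imitates the spirit of arm's display() (estimates and standard errors, no p-values), not its actual output format.

```python
import math
from statistics import fmean


def fit_simple_lm(x, y):
    """Least-squares fit of y = a + b*x (hypothetical helper).

    Returns {"(Intercept)": (est, se), "x": (est, se)}: estimates and
    standard errors only, no p-values.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x, y with at least 3 points")
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance on n - 2 degrees of freedom.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)
    se_b = math.sqrt(sigma2 / sxx)
    se_a = math.sqrt(sigma2 * (1 / n + xbar ** 2 / sxx))
    return {"(Intercept)": (a, se_a), "x": (b, se_b)}


def approx_ci(est, se):
    """Rule-of-thumb 95% interval: est +/- 2 se (hypothetical helper)."""
    return est - 2 * se, est + 2 * se


if __name__ == "__main__":
    # Hypothetical data, made up for illustration.
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.0]
    print(f"{'':12}{'coef.est':>10}{'coef.se':>10}   approx. 95% CI")
    for name, (est, se) in fit_simple_lm(x, y).items():
        lo, hi = approx_ci(est, se)
        print(f"{name:12}{est:10.2f}{se:10.2f}   [{lo:.2f}, {hi:.2f}]")
```

With only five points, an exact t-based interval (3 degrees of freedom) would be noticeably wider than est +/- 2se; the rule of thumb assumes a moderate sample size.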

Obviously, the development of hypothesis testing was an important contribution to statistics. The way Fisher used hypothesis tests is more in line with how scientists evaluate their theories. A small p-value is evidence against the hypothesis, but not support for any specific alternative. So, if we follow Fisher's approach, we start to think about a new (specific) theory after seeing a small p-value. If we want to learn the speed of light by conducting a new experiment, we start from the existing estimate (mu0). Fisher would compare the new data to the existing estimate and conclude whether mu0 might be an underestimate, an overestimate, or about right. If the data show that the speed is not the same as mu0, we may want to revise the estimate. The data would say that the true mean likely lies inside the confidence interval (when testing H0: mu = mu0 at level alpha, we would not reject the null if mu0 is inside the (1 - alpha) CI).

The Neyman-Pearson paradigm of hypothesis testing is aimed at decision making, not scientific inference. For them, statistical significance is of real significance, because it dictates the action we take. For example, the Neyman-Pearson approach would be very useful for a state agency deciding whether to grant a permit for a wastewater treatment plant. A test of H0: BOD <= 30 versus H1: BOD > 30 (or the reverse) is all we need; we are not really interested in the true BOD concentration in the discharge. So, the misuse of significance is often a result of misunderstanding what a test is for. The way a typical statistics course treats this topic (mixing the two approaches into a unified test procedure) is a disservice to all: it reduces a scientific problem to a yes/no dichotomy and prevents creative thinking.
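To make the contrast between the two uses of a test concrete, here is a minimal Python sketch under stated assumptions: all data are made up, fisher_summary and np_decision are hypothetical names, and a normal (z) approximation stands in for the t distribution to keep it to the standard library. The first function reports what Fisher would look at (an estimate, a p-value against mu0, and a confidence interval), and the usage lines check the duality between a two-sided test at level 0.05 and the 95% interval. The second returns only an action, which is all the permit decision needs; treating "BOD <= 30" as the null is an assumption, since the choice could be reversed.

```python
import math
from statistics import NormalDist, fmean, stdev

Z = NormalDist()  # standard normal; a large-sample stand-in for the t distribution


def mean_and_se(data):
    """Sample mean and its standard error."""
    return fmean(data), stdev(data) / math.sqrt(len(data))


def fisher_summary(data, mu0, level=0.95):
    """Fisher-style report: how far is mu0 from what the data say?

    Returns (estimate, se, two-sided p-value against mu0, confidence interval).
    """
    est, se = mean_and_se(data)
    z = (est - mu0) / se
    p = 2 * (1 - Z.cdf(abs(z)))
    crit = Z.inv_cdf(1 - (1 - level) / 2)
    return est, se, p, (est - crit * se, est + crit * se)


def np_decision(data, limit, alpha=0.05):
    """Neyman-Pearson-style rule for H0: mean <= limit vs H1: mean > limit.

    Returns an action only; the estimate itself is not part of the output.
    """
    est, se = mean_and_se(data)
    reject_h0 = est > limit + Z.inv_cdf(1 - alpha) * se
    return "deny permit" if reject_h0 else "grant permit"


if __name__ == "__main__":
    # Hypothetical measurements, made up for illustration.
    meas = [10.0, 12.0, 11.0, 13.0, 9.0]
    est, se, p, (lo, hi) = fisher_summary(meas, mu0=11.0)
    print(f"estimate {est:.2f}, se {se:.3f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    # Duality: p < 0.05 exactly when mu0 falls outside the 95% CI.
    for mu0 in (8.0, 9.5, 10.0, 12.0, 12.5, 14.0):
        p_mu0 = fisher_summary(meas, mu0)[2]
        print(f"mu0={mu0:5.1f}  p={p_mu0:.3f}  outside CI: {not lo <= mu0 <= hi}")
    print(np_decision([28, 29, 31, 30, 27], limit=30))  # grant permit
    print(np_decision([35, 36, 34, 35, 35], limit=30))  # deny permit
```

Both functions use the same arithmetic; what differs is what they hand back, a summary to think about versus a yes/no action.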

Tuesday, October 13, 2009

The logical incoherence of modern statistical practice

The great Stuart Hurlbert came to Duke to give a talk on the logical incoherence of modern statistical practice. My immediate thought after the talk was that the title was all wrong. It should have been: the logical incoherence of ecological applications of statistics. The most obvious "incoherence" in modern statistics, the violation of the likelihood principle by many of its concepts and practices, was not mentioned at all. In fact, he could have ended the talk in five minutes by citing the first paragraph of Berger and Wolpert (1988):

Among all prescriptions for statistical behavior, the Likelihood Principle (LP) stands out as the simplest and yet most far reaching. It essentially states that all evidence, which is obtained from an experiment, about an unknown quantity θ, is contained in the likelihood function of θ for the given data. The implications of this are profound, since most non-Bayesian approaches to statistics and indeed most standard statistical measures of evidence (such as coverage probability, error probabilities, significance level, frequentist risk, etc.) are then contraindicated.
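A classic coin-tossing example makes the quote concrete. Suppose (hypothetically) we observe 9 heads and 3 tails and test H0: θ = 1/2 against θ > 1/2. If the number of tosses had been fixed at 12 in advance, the count of heads is binomial; if instead we had tossed until the third tail, it is negative binomial. The two likelihoods differ only by a constant factor, so by the LP they carry the same evidence about θ, yet the one-sided p-values depend on the stopping rule and do not agree. The Python sketch below computes both with exact fractions; the data and function names are made up for illustration.

```python
from fractions import Fraction
from math import comb

HEADS, TAILS = 9, 3       # hypothetical observed data
HALF = Fraction(1, 2)     # theta under H0: theta = 1/2


def lik_binomial(theta):
    """Likelihood if the number of tosses was fixed at HEADS + TAILS."""
    n = HEADS + TAILS
    return comb(n, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def lik_negbinomial(theta):
    """Likelihood if tossing stopped at the TAILS-th tail."""
    return comb(HEADS + TAILS - 1, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def p_binomial():
    """One-sided p-value P(X >= HEADS), X ~ Binomial(HEADS + TAILS, 1/2)."""
    n = HEADS + TAILS
    return sum(comb(n, k) * HALF**n for k in range(HEADS, n + 1))


def p_negbinomial():
    """One-sided p-value P(Y >= HEADS) under theta = 1/2, where Y counts heads
    before the TAILS-th tail: P(Y = k) = C(k + TAILS - 1, k) theta^k (1 - theta)^TAILS."""
    below = sum(comb(k + TAILS - 1, k) * HALF ** (k + TAILS) for k in range(HEADS))
    return 1 - below


if __name__ == "__main__":
    for theta in (Fraction(1, 4), HALF, Fraction(3, 4)):
        # The ratio does not depend on theta: 220 / 55 = 4.
        print(theta, lik_binomial(theta) / lik_negbinomial(theta))
    print("binomial p-value:         ", p_binomial(), float(p_binomial()))
    print("negative binomial p-value:", p_negbinomial(), float(p_negbinomial()))
```

The binomial p-value is 299/4096 (about 0.073) and the negative binomial one is 67/2048 (about 0.033), so with a fixed alpha = 0.05 the same data would be called "significant" under one stopping rule and not under the other.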

Here is a run-down of the "sins" of statistics discussed by Hurlbert:

1. a fixed type I error probability (alpha)
2. the use of the term "significant"
3. the concept of type II error (he holds that we should never accept the null hypothesis)
4. one-tailed hypothesis tests
5. multiple comparisons, and
6. repeated measures ANOVA.
