A few weeks ago, a friend asked me whether I knew the book "The Cult of Statistical Significance". I did not. I replied with the following, and today he told me that he liked my summary of statistical inference:
Every year, when I teach confidence intervals and the t-test in the third and fourth weeks of the fall semester, I come up with new ideas about how to introduce these confusing concepts. This year, I told the class that if I were to rewrite all statistics textbooks, I would leave out hypothesis testing, because all the information in a test is reflected in the confidence interval. The width of a confidence interval also tells us whether the data provide useful information. If we used the confidence interval (and interpreted it in the Bayesian way), we would understand the problem of statistical inference better. This line of thinking is inspired by the "display()" function in R (from Gelman's arm package), which summarizes a linear model's coefficients only in terms of their estimated values and standard errors; no p-values are provided. Last year, I told my class to forget about p-values and focus on the standard error, because we can quickly come up with an approximate 95% confidence interval for each coefficient (est +/- 2se). Many students told me that without p-values they think about the real meaning of the estimated coefficients, and with p-values (using the standard R function summary()) they think about significance.
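The est +/- 2se rule above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the arm package's display(): the helper names simple_ols and approx_ci and the data values are made up for this example, and the fit is restricted to a single predictor so the standard-error formulas stay short.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = a + b*x.

    Returns a dict mapping coefficient name to (estimate, standard error).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual standard deviation with n - 2 degrees of freedom.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))
    se_b = sigma / math.sqrt(sxx)
    se_a = sigma * math.sqrt(1.0 / n + xbar ** 2 / sxx)
    return {"intercept": (a, se_a), "slope": (b, se_b)}

def approx_ci(est, se):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    return (est - 2 * se, est + 2 * se)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# Report only estimates, standard errors, and intervals; no p-values.
for name, (est, se) in simple_ols(x, y).items():
    lo, hi = approx_ci(est, se)
    print(f"{name:>9}: est = {est:.3f}, se = {se:.3f}, approx 95% CI = ({lo:.3f}, {hi:.3f})")
```

The output gives each coefficient with its uncertainty, so the reader looks at the size of the slope and the width of its interval rather than at a significance star.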
Obviously, the development of hypothesis testing is an important contribution to statistics. The way Fisher used hypothesis tests is more in line with how a scientist evaluates a theory. A small p-value is evidence against the hypothesis being tested, but not support for any specific alternative. So, if we follow Fisher's approach, we start to think about a new (specific) theory after seeing a small p-value. If we are interested in learning the speed of light by conducting a new experiment, we start from the existing estimate (mu0). Fisher would compare the new data to the existing estimate and conclude whether mu0 might be an under- or overestimate, or about right. If the data show that the speed is not the same as mu0, we may want to revise the estimate.
The data would say that the true mean likely lies inside the confidence interval (when testing H0: mu=mu0, we would not reject the null if mu0 is inside the CI). The Neyman-Pearson paradigm of hypothesis testing is aimed at decision making, not scientific inference. For Neyman and Pearson, statistical significance is of real significance, because it dictates the action we take. For example, the Neyman-Pearson approach would be very useful for a state agency deciding whether to grant a permit for a wastewater treatment plant. A test of H0: BOD<=30 versus Ha: BOD>30 (or the reverse) is all we need. We are not really interested in the true BOD concentration in the discharge. So, the misuse of significance is often the result of misunderstanding a test. The way a typical statistics course treats this topic (mixing the two approaches into a unified test procedure) is a disservice to all. It simplifies a scientific problem into a yes/no dichotomy and prevents creative thinking.
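The link between the interval and the test of H0: mu=mu0 mentioned above can be checked numerically. The sketch below is in Python with the standard library only, and it rests on assumptions of its own: it uses a normal (z) approximation instead of the t distribution, the function name z_interval_and_test is hypothetical, and the measurements are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def z_interval_and_test(data, mu0, level=0.95):
    """Return (confidence interval, two-sided p-value for H0: mu = mu0).

    Uses a normal approximation; a t-based version would use a slightly
    larger multiplier for small samples.
    """
    n = len(data)
    est = mean(data)
    se = stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    ci = (est - z * se, est + z * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(est - mu0) / se))
    return ci, p_value

# Invented measurements, for illustration only.
measurements = [29.1, 31.4, 30.2, 28.7, 32.0, 30.9, 29.8, 31.1, 30.5, 29.6]

for mu0 in (30.0, 32.0):
    ci, p = z_interval_and_test(measurements, mu0)
    inside = ci[0] <= mu0 <= ci[1]
    print(f"mu0 = {mu0}: CI = ({ci[0]:.2f}, {ci[1]:.2f}), "
          f"mu0 inside CI: {inside}, reject at 0.05: {p < 0.05}")
```

The interval does not depend on mu0, so one interval answers the test for every candidate mu0 at once: values inside it are not rejected at the matching level, and values outside it are.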