Thursday, June 5, 2008

Why teaching statistics is difficult

Many students find statistics difficult. They often complain that my class has no structure and that they don't know how to apply the material learned in class to their homework. I had the same experience. The only class in which I got a C in college was probability and statistics. I did not like that course because I could not find in it a system like the one in mathematics. We were taught a collection of techniques, and the homework assignments were mostly irrelevant to any real application.

In graduate school, I liked the introductory Bayesian inference course very much because I could do the mathematics. All my energy went into deriving various posterior distributions. It gave me a sense of accomplishment. But after the introductory Bayesian course, I started to wonder why we had to do all the math just to find a posterior distribution that is not very different from the results of simple models in classical statistics. When I started teaching statistics to graduate students in environmental and ecological sciences, I wanted to teach in a different way. I wanted to show that statistics is different from mathematics.

Thinking in statistics is different from thinking in mathematics. Mathematics is deductive reasoning, and statistics is inductive reasoning. Deductive reasoning starts from a set of premises and uses a set of rules or logic to move from point A to point B. Inductive reasoning is the opposite. We observe data and try to figure out the process that generated the data. Deduction is "easy"; induction is difficult. We all know what will happen if we leave an ice cube on a table at room temperature. But if we see a puddle of water on the same table, tracing it back to its source is difficult. If we did not see how the water came to be on the table, we can never know for sure. The same situation applies to science. In science, we observe data and try to understand the cause behind the data.
We can propose different hypotheses, but no matter how simple the problem is, we can never be sure that the theory is correct. This problem of induction, first posed by David Hume in 1739, has yet to find a solution. Statistics is the tool for inductive reasoning. We observe data and try to estimate the parameter or model. When we observe a sample and calculate the sample mean \bar{x}, we don't claim that we know the population mean. We don't even know whether the sample mean is close to the true population mean. To quantify the uncertainty, we calculate the confidence interval. But the confidence interval is a mathematical concept based on long-run frequencies. Its interpretation is counterintuitive. As a result, statistical reasoning is difficult. There is no rule to follow that will ensure a correct answer. But introducing this line of thinking in class is foolhardy, because we are so used to deductive reasoning. Most of our training in science is in the analytical skills necessary for deduction.
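
A small simulation may help make the long-run frequency idea behind the confidence interval concrete. The sketch below is only an illustration, not something taken from the class: the population mean and standard deviation, the sample size of 30, the number of repetitions, and the function name coverage are all assumptions chosen for the example. It repeatedly draws a sample from a normal population whose mean we pretend to know, builds a 95% interval around each sample mean \bar{x}, and counts how often the interval contains the true mean.

```python
import random
import statistics

# Assumed values for illustration only; none of them come from real data.
TRUE_MEAN = 10.0
TRUE_SD = 2.0
N = 30          # sample size
T_CRIT = 2.045  # 97.5th percentile of the t distribution with N - 1 = 29 df


def coverage(n_reps=5000, seed=1):
    """Return the fraction of 95% intervals that contain TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # One "study": a random sample of size N from the population.
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        x_bar = statistics.mean(sample)
        # Standard error of the mean, using the sample standard deviation.
        se = statistics.stdev(sample) / N ** 0.5
        lower = x_bar - T_CRIT * se
        upper = x_bar + T_CRIT * se
        if lower <= TRUE_MEAN <= upper:
            hits += 1
    return hits / n_reps


print(f"Share of intervals covering the true mean: {coverage():.3f}")
```

The share should land near 0.95, but that number describes the procedure over many repeated samples. Any single interval either contains the population mean or it does not, and in a real study we cannot tell which, which is a large part of why the interpretation feels counterintuitive.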

May 19, 2018

Log or not log, that is the question

In 2014 I taught a special topics class on statistical i...