*Ecology*, the short comment by Burnham and Anderson stood out as the most ridiculous one. Not only is their framing of 20th- versus 21st-century statistics absurd (what about Bayesian, 18th-century statistics?), but their use of the falsification principle to claim that hypothesis testing is bogus also seems odd. They claimed that because null hypothesis testing cannot test or falsify the alternative hypothesis, hypothesis testing "seems almost a scandal." In August 2016, I read this post by Mayo, which addressed the scandal part of Burnham and Anderson's comment. She said: "I am (almost) scandalized by this easily falsifiable allegation!"

# Song's Blog

Thoughts on environmental and ecological applications of statistics

## Friday, February 10, 2017

## Tuesday, June 28, 2016

### Environmental and Ecological Statistics with R (2nd edition)

The second edition of EESwithR is coming in fall 2016. I added one new chapter to the book, and it is posted as an example chapter on GitHub, along with R code and data sets. The other main change is the replacement of the term "statistically significant" with something like "statistically different from 0." The second edition also includes a large number of exercises, many of which have been used as homework assignments in my classes over the last ten years. I am working on a solution pamphlet, as well as additional problems. One unique feature of these exercises is that almost none of them has a unique solution. There are always multiple interpretations of a problem. When grading homework, I look for the student's thought process. I welcome suggestions and recommendations on additional exercise problems.

## Monday, June 13, 2016

### Hypothesis testing and the Raven Paradox

I was going over my old reading logs the other day and saw my notes on the Raven paradox (a.k.a. Hempel's paradox). The statement that "all ravens are black" is apparently straightforward and entirely true. The logical contrapositive "everything that is not black is not a raven" is also obviously true and uncontroversial. In mathematics, proof by contrapositive is a legitimate inference method. That is, you can show "If not B then not A" to support "If A then B." The raven paradox is apparently paradoxical because it suggests that observing a white shoe (I. J. Good) is evidence supporting the claim that all ravens are black. I. J. Good proposed a Bayesian explanation (or solution) of the paradox. The weight of evidence provided by seeing a white shoe (or any non-black object that is not a raven) is positive, but small if the number of ravens is small compared to the number of all non-black objects. But how is the paradox relevant to statistical hypothesis testing?
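
To make the "positive but small" claim concrete, here is a minimal sketch under a toy model. It is not Good's original formulation: it assumes a world with a known number of non-black non-ravens, compares H ("all ravens are black") with a hypothetical alternative in which a fixed number of ravens are not black, and supposes we pick one non-black object at random and find that it is not a raven. The function name and the counts are illustrative assumptions.

```python
import math

def weight_of_evidence(n_nonblack_nonravens: int, n_nonblack_ravens: int) -> float:
    """Log Bayes factor for H ("all ravens are black") versus a hypothetical
    alternative with n_nonblack_ravens non-black ravens, after drawing one
    non-black object at random and finding that it is not a raven."""
    # Under H every non-black object is a non-raven, so the observation is certain.
    p_under_h = 1.0
    # Under the alternative, the pool of non-black objects also contains ravens.
    p_under_alt = n_nonblack_nonravens / (n_nonblack_nonravens + n_nonblack_ravens)
    return math.log(p_under_h / p_under_alt)

# Assumed counts of non-black non-ravens (white shoes and the like).
for n in (10, 1_000, 1_000_000):
    print(n, weight_of_evidence(n, n_nonblack_ravens=1))
```

In this toy model the weight of evidence is log(1 + k/n), where k is the number of non-black ravens under the alternative and n is the number of non-black non-ravens. It is always positive, and it shrinks toward zero as the non-black non-ravens come to outnumber the ravens.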

Statistical Hypothesis Inference and Testing is relevant to the Raven paradox because we show support for our theory (the alternative hypothesis) by showing that a non-black object is not a raven (the null hypothesis). If we are interested in showing that a treatment has an effect, we start by setting the null hypothesis that the treatment has no effect. Using statistics, we show that the data do not support the null hypothesis; hence the logic of the contrapositive leads to the conclusion that the treatment is effective. I have no problem with this thought process, as long as we are only interested in a yes/no answer about the effectiveness of the treatment. The size of the effect is of no interest. But if we are interested in quantifying the treatment effect, hypothesis testing is almost never appropriate. When we are interested in quantifying the effect, we are interested in a specific alternative. For example, when discussing the effectiveness of agricultural conservation practices in reducing nutrient loss, we want to know the magnitude of the effect, not whether or not the effect exists. Showing that the effect is not zero gives some support to the claim that the effect is X, but not much. This is why we often advise our students that statistical significance is not always practically useful, especially when the null hypothesis itself is irrelevant to the hypothesis of interest (the alternative hypothesis).

A "threshold" model known as TITAN is a perfect example of the Raven paradox.

The basic building block of TITAN is a series of permutation tests. Although TITAN's authors never clearly stated the null and alternative hypotheses, it is not difficult to derive them from the basic characteristics of a permutation test. The hypothesis of interest (the alternative) is that changes in a taxon's abundance along an environmental gradient can be approximated by a threshold model (specifically, a step function model). The null hypothesis is that the taxon's abundance is constant along the same gradient. We can rephrase the alternative hypothesis as: the pattern of change of a taxon's abundance is a threshold model. The null hypothesis is that the pattern of change is flat. When we reject the null, we say that the pattern of change is not flat. The rejection can be seen as evidence supporting the alternative, but the weight of evidence is small if the number of non-flat and non-threshold patterns of change is large.
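
The point can be illustrated with a small simulation. The sketch below is not TITAN's algorithm. It is a simplified stand-in that uses the largest difference in mean abundance across any split point as a step-function statistic and shuffles the abundances to build the permutation distribution. The simulated data, the statistic, and all names are assumptions made for illustration. The abundance declines linearly along the gradient, so there is no threshold anywhere in the data.

```python
import random

def best_step_stat(y_sorted, min_side=10):
    """Largest absolute difference in mean abundance between the two sides
    of any split point, for abundances already ordered along the gradient."""
    n = len(y_sorted)
    total = sum(y_sorted)
    left_sum = sum(y_sorted[:min_side - 1])
    best = 0.0
    for i in range(min_side, n - min_side + 1):
        left_sum += y_sorted[i - 1]  # left side now holds the first i values
        diff = abs(left_sum / i - (total - left_sum) / (n - i))
        best = max(best, diff)
    return best

def permutation_p_value(y_sorted, n_perm=499, seed=0):
    """Permutation p-value for the null that abundance is flat along the gradient."""
    rng = random.Random(seed)
    observed = best_step_stat(y_sorted)
    shuffled = list(y_sorted)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # shuffling breaks any link to the gradient
        if best_step_stat(shuffled) >= observed:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)

# Simulated taxon whose abundance declines smoothly (linearly), with no threshold.
rng = random.Random(42)
n = 100
gradient = [i / (n - 1) for i in range(n)]
abundance = [10 - 8 * x + rng.gauss(0, 1) for x in gradient]

p_linear = permutation_p_value(abundance)
print(f"p-value against the flat null: {p_linear:.3f}")
```

The test rejects the flat null decisively, yet the data were generated without any threshold. Rejecting "flat" only tells us the pattern is not flat. It cannot distinguish a step function from a linear decline or from any other non-flat pattern.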

## Thursday, February 4, 2016

### The Everglades wetland's phosphorus retention capacity

In 1997, Curt Richardson and I published a paper on using a piecewise linear regression model for estimating the phosphorus retention capacity in the Everglades. At the time, fitting a piecewise linear model was not a simple task. As I was up to date on Bayesian computation, I used the Gibbs sampler. It was an interesting exercise to derive the full set of conditional probability distribution functions. The process was tedious but not hard. When applied to the Everglades data, we concluded that the Everglades' phosphorus retention capacity is about 1 gram of phosphorus per year per square meter (the median is 1.15), with a 90% credible interval of (0.61, 1.47) (Table 2 in Qian and Richardson, 1997). The posterior distribution of the retention capacity is skewed to the left. In subsequent papers, Curt Richardson named the result "the 1 gram rule". The South Florida Water Management District (SFWMD) never believed our work and often claimed that the retention rate would be much higher.
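
For readers unfamiliar with piecewise linear models, the following is a minimal sketch of fitting one. It is not the Gibbs sampler used in the paper. It swaps in ordinary least squares with a grid search over candidate change points, and it assumes a "hockey stick" form (flat, then linear) that may differ from the published model. The data are simulated, and the function and variable names are hypothetical.

```python
import random

def fit_hockey_stick(x, y, candidates):
    """Least-squares fit of y = b0 + b1 * max(x - c, 0), profiling over
    candidate change points c. Returns (sse, c, b0, b1) for the best c."""
    best = None
    n = len(x)
    y_mean = sum(y) / n
    for c in candidates:
        z = [max(xi - c, 0.0) for xi in x]
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0.0:
            continue  # all points lie on the flat segment; slope not identifiable
        b1 = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y)) / szz
        b0 = y_mean - b1 * z_mean
        sse = sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, c, b0, b1)
    return best

# Simulated data with a true change point at 1.5 (arbitrary units).
rng = random.Random(1)
x = [i / 10 for i in range(41)]  # 0.0, 0.1, ..., 4.0
y = [2.0 + 3.0 * max(xi - 1.5, 0.0) + rng.gauss(0, 0.2) for xi in x]

sse, c_hat, b0_hat, b1_hat = fit_hockey_stick(
    x, y, candidates=[i / 10 for i in range(5, 36)]
)
print(f"change point {c_hat:.1f}, intercept {b0_hat:.2f}, slope {b1_hat:.2f}")
```

A grid search of this kind returns only a point estimate of the change point. A Bayesian treatment has the advantage of yielding a full posterior distribution for the change point, which is where a credible interval like the one reported above comes from.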

Since then, SFWMD has constructed several Stormwater Treatment Areas (STAs) -- wetlands for removing phosphorus -- and has been monitoring their performance. The latest results (Chen et al., 2015) showed that the retention capacity of these STAs is 1.1 +/- 0.5 grams per square meter per year.

I was satisfied that finally SFWMD agreed with my finding, even if the agreement took them nearly 20 years (and hundreds of millions of dollars).

Chen, H., Ivanoff, D., and Pietro, K. (2015) Long-term phosphorus removal in the Everglades stormwater treatment areas of South Florida in the United States. *Ecological Engineering*, 29:158-168.

Qian, S.S. and Richardson, C.J. (1997) Estimating the long-term phosphorus accretion rate in the Everglades: a Bayesian approach with risk assessment. *Water Resources Research*, 33(7): 1681-1688.

## Friday, December 18, 2015

### Explaining Science to the Media

The memory of the 2014 water crisis is still fresh in the minds of many people in the Toledo, OH area. Anything related to harmful algal blooms in Lake Erie will make it into the news one way or another. Yesterday, I was asked to explain the research published in this paper to two local TV channels. In my mind, the goal of the work was to use a better statistical method to reduce measurement uncertainty. To introduce the work without touching the term Bayesian statistics, I talked about how decisions under uncertainty often result in a lack of confidence in the final choice. The lack of confidence is often the reason for less effective communication between the decision maker and the public. In this case, uncertainty made explaining the "Do Not Drink" order very difficult. The difficulty, in turn, led to a lack of communication between the city and the public, resulting in public anxiety and later second-guessing about the order. When implemented, our method can result in a more confident decision, thereby helping the city to better communicate with the public. The final cuts from both TV stations presented the problem with a single question: did the City of Toledo make the right call in issuing the "Do Not Drink" order? They effectively conveyed my message without mentioning words like "risk communication" and the reasoning behind my answer to the question. Our training in scientific writing does not help us in explaining science to the public. I have a lot to learn from reporters. By the way, the newscast seems to show that I know how to operate the ELISA test. This was, in fact, the first time I touched an ELISA kit.

## Tuesday, November 24, 2015

### Uncertainty in measured microcystin concentrations using ELISA

We published a paper on the uncertainty in measured microcystin concentrations using the commonly used method known as ELISA. Microcystins are a group of toxins associated with blooms of cyanobacteria. One high concentration value detected in a drinking water sample in Toledo in 2014 resulted in a "do not drink" advisory that affected about half a million people in the Toledo area. In the paper we discussed the high level of uncertainty associated with the estimated concentrations and provided a Bayesian Hierarchical Modeling (BHM) approach for reducing the uncertainty. I have uploaded the data used in that paper to GitHub.

## Thursday, October 29, 2015

### Results or Methods

In a recent paper, my co-author and I discussed the use of statistical causal analysis (propensity score matching) for analyzing observational data. The example is to estimate the effect of water and soil conservation practices on controlling nutrient loss from farm fields. The data were observational, a collection of measured P and N losses from various field-level studies. My interest in the subject started in 2009 after I completed this book, when I was thinking about whether the topic of causal analysis should be included in the future. When a colleague shared a dataset collected by USDA, I decided to study causal analysis and use it in my class.

The effect of conservation practices on nutrient loss has been a topic of agricultural studies for a long time. The method used in these studies is almost invariably modeling, but the models were never properly calibrated, and their basic input is the nutrient yield. In many cases, researchers simply assume a fixed rate of reduction in nutrient yield and plug the rate into the model to calculate the total reduction. I have not seen a study that properly documents the effect of conservation practices using a randomized experiment. After discussing with colleagues, I realized that a randomized experiment is practically impossible. However, the decision to implement conservation practices is often based on whether a field is prone to soil and water loss. As a result, if we combine data from many studies and compare the nutrient loss from fields with and without conservation practices, we often find that fields with conservation practices have larger nutrient losses. But we are often comparing fields with row crops (with conservation practices) to pastures (without conservation practices). In one published paper, the authors were puzzled by the result of such a comparison.

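
The puzzling comparison is easy to reproduce in a simulation. The sketch below uses entirely hypothetical numbers and is not the analysis in the paper. Land use (row crop versus pasture) is the only confounder, row-crop fields are assumed to be far more likely to receive conservation practices, and the true effect of the practices is a reduction in nutrient loss. With a single binary confounder the propensity score takes only two values, so matching on it reduces to comparing treated and untreated fields within each land-use class.

```python
import random

rng = random.Random(7)
TRUE_EFFECT = -2.0  # assumed: practices reduce nutrient loss by 2 units

# Each simulated field is (row_crop, treated, loss); all values are hypothetical.
fields = []
for _ in range(20000):
    row_crop = rng.random() < 0.5          # confounder: land use
    p_treat = 0.8 if row_crop else 0.1     # erosion-prone fields get practices
    treated = rng.random() < p_treat
    baseline = 10.0 if row_crop else 3.0   # row crops lose more nutrients
    loss = baseline + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, 1)
    fields.append((row_crop, treated, loss))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def diff_in_means(rows):
    """Mean loss of treated fields minus mean loss of untreated fields."""
    treated_mean = mean(loss for _, treated, loss in rows if treated)
    control_mean = mean(loss for _, treated, loss in rows if not treated)
    return treated_mean - control_mean

# Naive comparison: pool all fields and ignore land use.
naive = diff_in_means(fields)

# Adjusted comparison: compare within land use, then weight by stratum size.
adjusted = 0.0
for land_use in (True, False):
    stratum = [f for f in fields if f[0] == land_use]
    adjusted += diff_in_means(stratum) * len(stratum) / len(fields)

print(f"naive difference:    {naive:+.2f}")
print(f"adjusted difference: {adjusted:+.2f}")
```

The naive difference has the wrong sign because treated fields are mostly row crops, which lose more nutrients to begin with. Comparing like with like recovers an estimate close to the assumed effect. Real data have many confounders, which is where a modeled propensity score becomes useful.
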
I worked on a dataset consisting of measurements from about 160 papers with field-scale measurements of nutrient loss, fertilizer application rate and methods, and other routinely measured variables (crops, best management practices, conservation practices). Using the dataset as an example, I taught statistical causal analysis in 2011, 2013, and 2014 in my graduate-level statistics classes. I wrote the paper based on how students responded to the materials. I found that the concept of a confounding factor is often new to students, which is not surprising as most students don't have a good conceptual understanding of statistics. I included a long background subsection in the Introduction.

From a statistical perspective, the methods section on the propensity score matching method can be described in general terms. The results, in my mind, include not only the estimated effects, but also the additional factors to be controlled. In other words, the derived statistical model and its interpretation should be part of the results section. One reviewer and the associate editor insisted that we move the description of the model to the methods section. They recommended rejection because of the "organization" problem.

I explained in my response that the model is a result of the causal analysis process. By describing it as a result, I emphasize the process of finding the appropriate model (hence the model is part of the results). By presenting the model in the methods section, we give the impression that causal analysis is a simple application of another statistical procedure. The editor eventually decided on a compromise. We presented more in the methods section, and I added more in the results on the process of deriving the appropriate model.

In my opinion, the usual structure of Introduction -- Methods -- Results -- Discussion (IMRD) is not always effective. My experience has shown that the structure can be a hindrance to presenting the process of doing research. In two recent papers I published in Environmental Science and Technology, I deviated from IMRD to include background information. When conducting research, we often have to change our approaches or abandon our initial hypothesis. After all, research is a learning process. The IMRD structure encourages us to avoid discussing the process. This approach can be hazardous in presenting statistical modeling.
