Friday, December 18, 2015

Explaining Science to the Media

The memory of the 2014 water crisis is still fresh in the minds of many people in the Toledo, OH area. Anything related to harmful algal blooms in Lake Erie will make it into the news one way or another. Yesterday, I was asked to explain the research published in this paper to two local TV channels. In my mind, the goal of the work was to use a better statistical method to reduce measurement uncertainty. To introduce the work without using the term Bayesian statistics, I talked about how decisions made under uncertainty often result in a lack of confidence in the final choice. That lack of confidence is often the reason for less effective communication between the decision maker and the public. In this case, uncertainty made explaining the "Do Not Drink" order very difficult. The difficulty, in turn, led to a lack of communication between the city and the public, resulting in public anxiety and second-guessing about the order later. When implemented, our method can result in a more confident decision, thereby helping the city to better communicate with the public. The final cuts from both TV stations presented the problem as a single question: did the City of Toledo make the right call in issuing the "Do Not Drink" order? They effectively conveyed my message without mentioning words like "risk communication" or the reasoning behind my answer to the question. Our training in scientific writing does not help us in explaining science to the public. I have a lot to learn from reporters. By the way, the newscast seems to show that I know how to operate the ELISA test. This was, in fact, the first time I touched an ELISA kit.

Tuesday, November 24, 2015

Uncertainty in measured microcystin concentrations using ELISA

We published a paper on the uncertainty in measured microcystin concentrations using the commonly used method known as ELISA. Microcystins are a group of toxins associated with blooms of cyanobacteria. One high concentration value detected in a drinking water sample in Toledo in 2014 resulted in a "do not drink" advisory that affected about half a million people in the Toledo area. In the paper we discussed the high level of uncertainty associated with the estimated concentrations and provided a Bayesian Hierarchical Modeling (BHM) approach for reducing the uncertainty. I have uploaded the data used in that paper to GitHub.

Thursday, October 29, 2015

Results or Methods

In a recent paper, my co-author and I discussed the use of statistical causal analysis (propensity score matching) for analyzing observational data. The example is to estimate the effect of water and soil conservation practices on controlling nutrient loss from farm fields. The data were observational, a collection of measured P and N losses from various field-level studies. My interest in the subject started in 2009 after I completed this book, when I was thinking about whether the topic of causal analysis should be included in the future. When a colleague shared a dataset collected by USDA, I decided to study causal analysis and use it in my class. The effect of conservation practices on nutrient loss has been a topic of agricultural studies for a long time. The method used in these studies, however, is invariably modeling. These models were never properly calibrated, and their basic input is the nutrient yield. In many cases, researchers simply assume a fixed rate of reduction in nutrient yield and plug the rate into the model to calculate the total reduction. I have not seen a study that properly documents the effect of conservation practices using a randomized experiment. After discussing this with colleagues, I realized that a randomized experiment is practically impossible. Moreover, the decision to implement conservation practices is often based on whether a field is prone to soil and water loss. As a result, if we combine data from many studies and compare the nutrient loss from fields with and without conservation practices, we often find that fields with conservation practices have larger nutrient losses. But we are often comparing fields with row crops (with conservation practices) to pastures (without conservation practices). In one published paper, the authors were puzzled by the result of such a comparison.
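The puzzle described above can be reproduced with a small simulation. The following is only a minimal sketch, not the analysis from the paper: the data are simulated, every number is invented for illustration, and the names `simulate_fields`, `fit_propensity`, and `matched_effect` are hypothetical. It assumes a single confounder (how prone a field is to loss) that raises both the chance of receiving a conservation practice and the nutrient loss itself, and it uses one-to-one nearest-neighbor matching with replacement, which is just one of several ways to do propensity score matching.

```python
import numpy as np


def simulate_fields(n=4000, true_effect=-2.0, seed=0):
    """Simulate hypothetical fields. All numbers are invented for illustration."""
    rng = np.random.default_rng(seed)
    # Confounder: how prone a field is to soil and water loss.
    proneness = rng.normal(size=n)
    # Prone fields are more likely to receive a conservation practice.
    p_treat = 1.0 / (1.0 + np.exp(-1.5 * proneness))
    treated = rng.random(n) < p_treat
    # Loss rises with proneness; the practice lowers it by |true_effect|.
    loss = 10.0 + 3.0 * proneness + true_effect * treated + rng.normal(size=n)
    return proneness, treated, loss


def fit_propensity(x, treated, n_iter=25):
    """Estimate P(treated | x) with logistic regression fit by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    y = treated.astype(float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))


def matched_effect(score, treated, loss):
    """Effect on the treated: match each treated field to the control field
    with the closest propensity score (controls may be reused)."""
    t_idx = np.flatnonzero(treated)
    c_idx = np.flatnonzero(~treated)
    dist = np.abs(score[t_idx][:, None] - score[c_idx][None, :])
    nearest = c_idx[dist.argmin(axis=1)]
    return float(np.mean(loss[t_idx] - loss[nearest]))


proneness, treated, loss = simulate_fields()
# Naive comparison: simple difference in mean loss, ignoring the confounder.
naive = float(loss[treated].mean() - loss[~treated].mean())
score = fit_propensity(proneness, treated)
matched = matched_effect(score, treated, loss)
print(f"naive difference:   {naive:+.2f}")
print(f"matched difference: {matched:+.2f} (simulated true effect: -2.00)")
```

In this simulated setting the naive difference is positive, which would suggest that conservation practices increase nutrient loss, while the matched comparison has the opposite sign because it compares fields that were about equally likely to receive a practice. A real analysis would involve many covariates and a check of how well the matched groups are balanced.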
I worked on a dataset consisting of measurements from about 160 papers with field-scale measurements on nutrient loss, fertilizer application rate and methods, and other routinely measured variables (crops, best management practices, conservation practices). Using the dataset as an example, I taught statistical causal analysis in 2011, 2013, and 2014 in my graduate-level statistics classes. I wrote the paper based on how students responded to the materials. I found that the concept of a confounding factor is often new to students, which is not surprising as most students don't have a good conceptual understanding of statistics. I therefore included a long background subsection in the Introduction. From a statistical perspective, the methods section on the propensity score matching method can be described in general terms. The results, in my mind, include not only the estimated effects, but also the additional factors to be controlled. In other words, the derived statistical model and its interpretation should be part of the results section. One reviewer and the associate editor insisted that we move the description of the model to the methods section. They recommended rejection because of the "organization" problem. I explained in my response that the model is a result of the causal analysis process. By describing it as a result, I emphasize the process of finding the appropriate model (hence the model is part of the results). By presenting the model in the methods section, we give the impression that causal analysis is a simple application of another statistical procedure. The editor eventually decided on a compromise. We presented more in the methods section, and I added more in the results on the process of deriving the appropriate model. In my opinion, the usual structure of Introduction -- Methods -- Results -- Discussion (IMRD) is not always effective. My experience showed that the structure can be a hindrance to presenting the process of doing research.
In two recent papers I published in Environmental Science and Technology, I deviated from IMRD to include background information. When conducting research, we often have to change our approaches or abandon our initial hypothesis. After all, research is a learning process. The IMRD structure encourages us to avoid the discussion of the process. This approach can be hazardous in presenting statistical modeling.

Thursday, May 14, 2015

Some Simple Statistics in Clean Water Act Compliance Assessment


Log or not log

LOGorNOTLOG.html Log or not log, that is the question May 19, 2018 In 2014 I taught a special topics class on statistical i...