Monday, June 13, 2016

Hypothesis testing and the Raven Paradox

I was going over my old reading logs the other day and saw my notes on the Raven paradox (a.k.a. Hempel's paradox). The statement that "all ravens are black" is apparently straightforward and entirely true. The logical contrapositive, "everything that is not black is not a raven," is also obviously true and uncontroversial. In mathematics, proof by contrapositive is a legitimate inference method. That is, you can show "If not B then not A" to support "If A then B." The raven paradox appears paradoxical because it suggests that observing a white shoe (I. J. Good) is evidence supporting the claim that all ravens are black. I. J. Good proposed a Bayesian explanation (or solution) of the paradox. The weight of evidence provided by seeing a white shoe (or any non-black object that is not a raven) is positive, but small if the number of ravens is small compared to the number of non-black objects. But how is the paradox relevant to statistical hypothesis testing?
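
To put rough numbers on the Bayesian explanation, here is a minimal sketch under a deliberately simplified model. The model and all counts are my own assumptions for illustration, not Good's formulation: H1 says no raven is non-black, H2 says a fixed number of ravens are non-black, and the weight of evidence is the log likelihood ratio of H1 over H2.

```python
import math

def woe_nonblack_nonraven(n_nonblack_other, n_nonblack_ravens_h2):
    """Weight of evidence (nats) for H1 when a randomly drawn non-black
    object turns out not to be a raven (the 'white shoe')."""
    p_h1 = 1.0  # under H1 every non-black object is a non-raven
    p_h2 = n_nonblack_other / (n_nonblack_other + n_nonblack_ravens_h2)
    return math.log(p_h1 / p_h2)

def woe_black_raven(n_ravens, n_nonblack_ravens_h2):
    """Weight of evidence (nats) for H1 when a randomly drawn raven
    turns out to be black."""
    p_h1 = 1.0  # under H1 every raven is black
    p_h2 = (n_ravens - n_nonblack_ravens_h2) / n_ravens
    return math.log(p_h1 / p_h2)

# Hypothetical counts, chosen only for illustration.
n_ravens = 1_000
n_nonblack_ravens_h2 = 100      # non-black ravens if H2 were true
n_nonblack_other = 1_000_000    # shoes, swans, paper, ...

woe_shoe = woe_nonblack_nonraven(n_nonblack_other, n_nonblack_ravens_h2)
woe_raven = woe_black_raven(n_ravens, n_nonblack_ravens_h2)
print(f"white shoe:  {woe_shoe:.6f} nats")
print(f"black raven: {woe_raven:.6f} nats")
```

In this toy model the white shoe contributes log(1 + k/M), where k is the number of non-black ravens under H2 and M is the number of other non-black objects. That is positive, but it shrinks toward zero as M grows relative to k.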

Statistical Hypothesis Inference and Testing is relevant to discussing the Raven paradox because we support our theory (the alternative hypothesis) in the same indirect way: by showing that a non-black object is not a raven (that is, by rejecting the null hypothesis). If we are interested in showing that a treatment has an effect, we start by setting the null hypothesis to be that the treatment has no effect. Using statistics, we show that the data do not support the null hypothesis; hence the logic of the contrapositive leads to the conclusion that the treatment is effective. I have no problem with this thought process, as long as we are only interested in a yes/no answer about the effectiveness of the treatment. In that case, how effective the treatment is does not matter. But if we are interested in quantifying the treatment effect, hypothesis testing is almost never appropriate. When we are interested in quantifying the effect, we are interested in a specific alternative. For example, when discussing the effectiveness of agricultural conservation practices in reducing nutrient loss, we want to know the magnitude of the effect, not whether or not the effect exists. Showing that the effect is not zero gives some support to the claim that the effect is X, but not much. This is why we often advise our students that statistical significance is not always practically useful, especially when the null hypothesis itself is irrelevant to the hypothesis of interest (the alternative hypothesis).
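
The gap between "the effect is not zero" and "the effect is X" can be illustrated with a small simulation. This is only a sketch on made-up data: the sample size, the true effect of 0.05 standard deviations, and the normal approximation are all assumptions I chose for the example.

```python
import math
import random
import statistics

rng = random.Random(42)
n = 50_000            # hypothetical sample size per group
true_effect = 0.05    # hypothetical, tiny relative to a unit standard deviation

control = [rng.gauss(0.0, 1.0) for _ in range(n)]
treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]

# Point estimate of the effect and its standard error.
estimate = statistics.fmean(treated) - statistics.fmean(control)
se = math.sqrt(statistics.variance(treated) / n
               + statistics.variance(control) / n)

# Two-sided p-value for the null "effect = 0" (normal approximation).
z = estimate / se
p_value = math.erfc(abs(z) / math.sqrt(2))

# Approximate 95% interval: this is what speaks to magnitude.
ci = (estimate - 1.96 * se, estimate + 1.96 * se)

print(f"p-value against 'no effect': {p_value:.2e}")
print(f"estimated effect: {estimate:.3f}, 95% interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a sample this large, the test rejects the null decisively, yet the rejection by itself would be just as compatible with an effect of 0.05 as with an effect of 5. Only the estimate and its interval tell us which.
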

A "threshold" model known as TITAN is a perfect example of the Raven paradox. The basic building block of TITAN is a series of permutation tests. Although TITAN's authors never clearly stated the null and alternative hypotheses, it is not difficult to derive them from the basic characteristics of a permutation test. The hypothesis of interest (the alternative) is that changes in a taxon's abundance along an environmental gradient can be approximated by a threshold model (specifically, a step function model). The null hypothesis is that the taxon's abundance is constant along the same gradient. We can rephrase the alternative hypothesis as: the pattern of change of a taxon's abundance is a threshold model. The null hypothesis is that the pattern of change is flat. When we reject the null, we say that the pattern of change is not flat. The rejection can be seen as evidence supporting the alternative, but the weight of evidence is small if the number of patterns of change that are neither flat nor a threshold is large.
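
The point can be sketched with a generic permutation test. To be clear, this is not TITAN's algorithm: the statistic below (the best reduction in sum of squares from a single step) and the simulated data are my own assumptions, picked only to show that a step-based test against a flat null also rejects when the true pattern is a straight line.

```python
import random

def step_fit_statistic(y_sorted_by_x):
    """Largest reduction in sum of squares from splitting the
    gradient-ordered responses into two groups with separate means."""
    n = len(y_sorted_by_x)
    total = sum(y_sorted_by_x)
    best = 0.0
    left_sum = 0.0
    for i in range(1, n):  # split between positions i-1 and i
        left_sum += y_sorted_by_x[i - 1]
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        gain = i * (n - i) / n * (left_mean - right_mean) ** 2
        best = max(best, gain)
    return best

def permutation_p_value(y_sorted_by_x, n_perm=499, seed=1):
    """Permutation p-value against the null that the response is
    unrelated to its position along the gradient (a flat pattern)."""
    rng = random.Random(seed)
    observed = step_fit_statistic(y_sorted_by_x)
    shuffled = list(y_sorted_by_x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if step_fit_statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Simulated abundance that changes linearly, with no threshold at all.
rng = random.Random(0)
n = 60
x = [i / (n - 1) for i in range(n)]  # gradient, already in increasing order
y_linear = [2.0 * xi + rng.gauss(0.0, 0.3) for xi in x]

p_linear = permutation_p_value(y_linear)
print(f"p-value against the flat null, linear data: {p_linear:.3f}")
```

A small p-value here says only that the pattern is not flat. The data were generated from a straight line, so the rejection cannot be read as support for a step function in particular.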

Log or not log

Log or not log, that is the question May 19, 2018 In 2014 I taught a special topics class on statistical i...