Wednesday, March 2, 2011

The PCB in Fish Example: Multiple Regression -- Interaction

When fitting the multiple regression model with yr and len.c as the predictors, an important assumption is that the effect of year (the slope of year) is not affected by the size of the fish and that the effect of fish size (the slope of length) is the same throughout the study period. This is the additive-effect assumption imposed on a multiple regression model. Is this assumption reasonable? Madenjian et al. [1998] reported that small lake trout ([;<40;] cm) eat small alewives (Alosa pseudoharengus, which have an average PCB concentration of 0.2 mg/kg), intermediate-size lake trout (40 to 60 cm) eat alewives and rainbow smelt (Osmerus mordax, whose PCB concentrations ranged from 0.2 to 0.45 mg/kg), and large lake trout ([;>60;] cm) eat large alewives (with an average PCB concentration of 0.6 mg/kg).

On the one hand, because larger fish tend to consume food with higher concentrations of PCB, the PCB concentration in large fish should decline more slowly over time than the concentration in small fish. On the other hand, because PCB was banned in the 1970s, the natural reduction of PCB through microbiological metabolism resulted in an overall reduction of PCB concentration in the environment and in fish. We therefore expect the PCB-length relationship to change over time. In other words, the slope of year in the multiple regression model is expected to change with the size of a fish, and the slope of length is expected to change over time. To model this “interaction” effect, we add a third predictor, the product of yr and len.c, to the model:

#### R code ####
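library(arm)  # provides display()
# len.c is the centered fish length; I(year-1974) is yr, years since 1974
lake.lm4 <- lm(log(pcb) ~ I(year-1974)*len.c, data=laketrout)
display(lake.lm4, 4)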
 
#### R output ####
lm(formula = log(pcb) ~ I(year - 1974)*len.c, data = laketrout)
                     coef.est coef.se
(Intercept)           1.8967   0.0465
I(year - 1974)       -0.0873   0.0036
len.c                 0.0510   0.0038
len.c:I(year - 1974)  0.0008   0.0003
---
n = 631, k = 4
residual sd = 0.5520, R-Squared = 0.67
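
The `*` in the model formula is what brings in the product term. As a minimal sketch (using a small made-up data frame named `toy`, not the lake trout data), the design matrix can be inspected to confirm that `a*b` expands to `a + b + a:b` and that the last column is simply the product of the two predictors:

```r
# Made-up stand-in data: column names mirror the post, values are simulated.
set.seed(1)
toy <- data.frame(year  = sample(1974:2004, 20, replace = TRUE),
                  len.c = round(rnorm(20, mean = 0, sd = 10), 1))

# Term labels produced by the * operator
term_labels <- attr(terms(~ I(year - 1974) * len.c), "term.labels")
term_labels

# Design matrix: intercept, yr, len.c, and their product
X <- model.matrix(~ I(year - 1974) * len.c, data = toy)
colnames(X)

# The fourth column equals (year - 1974) * len.c, row by row
prod_ok <- all.equal(unname(X[, 4]), (toy$year - 1974) * toy$len.c)
prod_ok
```
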

When the interaction term [;len.c:I(year - 1974);] is included, the model is expressed as:

[;\log(PCB) = 1.89 - 0.087yr + 0.051Len.c + 0.00085yr \cdot Len.c + \varepsilon \qquad (1);]

Because of the product term, the model is no longer additive, although it is still linear in its coefficients. The slopes of centered length (len.c) and year (yr) are no longer constant. We can rearrange the model to understand the interaction effect. First, the interaction term is grouped with yr:

[;\log(PCB) = 1.89 + (-0.087 + 0.00085Len.c)yr + 0.051Len.c + \varepsilon;]

That is, the effect (or slope) of [;yr;] is now a function of [;Len.c;]. The slope shown (-0.087) is the slope when [;Len.c = 0;], that is, the year effect for an average-sized fish. When the fish is 10 cm above average, the yr effect is -0.087 + 0.00085 × 10 = -0.0785. In other words, not only does a larger fish have a higher PCB concentration on average, but PCB in a larger fish also tends to dissipate at a lower rate. This interpretation holds only when we compare same-sized fish over time. Reading a small slope on the log scale as an approximate proportional change, the annual rate of dissipation is about 8.7% for fish of average length (Len.c = 0) and about 7.9% for fish 10 cm above average. When examining the relationship between log(PCB) and fish length, the model can be rearranged to be:

[;\log(PCB) = 1.89+(0.051+0.00085yr)Len.c-0.087yr+\varepsilon;]

The relationship is still linear for any given year, but the slope changes over time. Initially (yr = 0, or 1974), the size effect is 0.051: each 1 cm increase in size corresponds to roughly a 5.1% increase in PCB concentration. Ten years later (1984), the slope is 0.051 + 0.00085 × 10 = 0.0595. The size effect is stronger. This is reasonable because the concentration declines more slowly in a large fish than in a small fish. Consequently, the difference in concentration between two fish of the same two sizes increases over time.
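
The two rearrangements can be written as a pair of small helper functions. This is only a sketch: the coefficients are the rounded values from equation (1), and the function names `yr_slope` and `len_slope` are invented here for illustration.

```r
# Rounded coefficients copied from equation (1)
b_yr  <- -0.087    # slope of yr when Len.c = 0
b_len <-  0.051    # slope of Len.c when yr = 0 (1974)
b_int <-  0.00085  # interaction coefficient

# Slope of yr for a fish whose centered length is len_c (cm from the mean)
yr_slope <- function(len_c) b_yr + b_int * len_c

# Slope of Len.c in a given calendar year (yr = year - 1974)
len_slope <- function(year) b_len + b_int * (year - 1974)

yr_slope(c(0, 10))        # -0.0870 and -0.0785
len_slope(c(1974, 1984))  #  0.0510 and  0.0595
```
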

The interaction effect is small (albeit statistically significant). Can this small interaction effect be practically significant? Because the response variable is on a logarithmic scale, we need to be careful in interpreting a small effect. Using the coefficients in equation (1), the slope of yr for a small fish (6.7 cm below average, the first quartile) is -0.087 + 0.00085 × (-6.7) = -0.093, and the slope for a large fish (8.5 cm above average, the third quartile) is -0.087 + 0.00085 × 8.5 = -0.080. PCB concentration declines at a lower rate (~8% per year) for a large fish and at a higher rate (~9% per year) for a small fish. The slope of len.c increases from 0.051 in 1974 to 0.051 + 0.00085 × 30 = 0.0765 in 2004, indicating a much larger difference in PCB concentration between a large and a small fish.
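
Because the response is log(PCB), a slope b corresponds to a proportional change of exp(b) - 1 per unit of the predictor; reading b directly as a percentage is only an approximation. The sketch below recomputes the quartile slopes from the rounded coefficients in equation (1) and applies the exact conversion. The helper name `pct_change` is made up for this example, and the results can differ slightly from hand-rounded figures.

```r
# Exact percent change implied by a slope on the natural-log scale
pct_change <- function(slope) 100 * (exp(slope) - 1)

b_yr  <- -0.087    # rounded coefficients from equation (1)
b_int <-  0.00085

# Centered lengths at the first and third quartiles, as quoted in the text
len_q <- c(small = -6.7, large = 8.5)

slopes <- b_yr + b_int * len_q
round(slopes, 3)               # about -0.093 (small) and -0.080 (large)
round(pct_change(slopes), 1)   # about -8.9 and -7.7 percent per year
```

The exact rates are a little smaller in magnitude than the raw slopes, but the gap between small and large fish (roughly one percentage point per year) is the same either way.
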

References


C.P. Madenjian, R.J. Hesselberg, T.J. Desorcie, L.J. Schmidt, R.M. Stedman, L.J. Begnoche, and D.R. Passino-Reader. Estimate of net trophic transfer efficiency of PCBs to Lake Michigan lake trout from their prey. Environmental Science and Technology, 32:886–891, 1998.



