Monday, June 21, 2010

An example of statistical modeling -- part 1: model formulation

I will regularly post discussions on examples used in Environmental and Ecological Statistics with R (available at Amazon).

This post discusses the general thought process behind the PCB in fish example.  In this first part, I go over the background of the data and the process of establishing the model form.

The data are PCB concentrations in fish (lake trout) tissue, measured in fish caught in Lake Michigan from 1970 to 2000.  Each year, a number of fish were sampled from the lake, and PCB concentrations were measured in the edible portion of each fish.  The main objective of the example is to study the trend in mean PCB fish tissue concentrations in the lake.

A modeling project always has the following three steps:

1. Model formulation

2. Parameter estimation, and

3. Model evaluation

A properly formulated model can be based on relevant theory or exploratory data analysis.  In this case, we start with a general model often used in environmental engineering.  The model suggests that the rate of change of a chemical concentration is often proportional to the concentration itself (the first order reaction model).  That is, the derivative of concentration with respect to time is proportional to concentration:

[;\frac{dC}{dt} = - k C;]

Solving this differential equation, we have

[;C_t = C_0 e^{-kt};]

where [;C_t;] is the concentration at time t, [;C_0;] is the concentration at time 0.  Taking logarithmic transformation on both sides:

[;\log(C_t) = \log(C_0) - k t;]

which suggests a log-linear regression model with [;\log(C_t);] as the response variable and [;t;] as the predictor.  The intercept is [;\log(C_0);] and the slope is [;-k;].
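As a quick check of this reasoning, here is a minimal sketch in Python (standard library only). It generates noise-free concentrations from the first order model and fits a straight line to log concentration against time. The values of C0 and k and the helper fit_line are hypothetical and chosen for illustration; they are not estimates from the Lake Michigan data.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical values, not estimates from the Lake Michigan data.
C0 = 10.0   # concentration at time 0
k = 0.06    # first order rate constant, per year

years = list(range(0, 31))                      # t = 0, 1, ..., 30
conc = [C0 * math.exp(-k * t) for t in years]   # C_t = C0 * exp(-k t)
log_conc = [math.log(c) for c in conc]

intercept, slope = fit_line(years, log_conc)
k_hat = -slope                 # the slope is -k, so k is its negative
C0_hat = math.exp(intercept)   # the intercept is log(C0)
print(f"slope = {slope:.4f}, k = {k_hat:.4f}, C0 = {C0_hat:.4f}")
```

Because the simulated data follow the model exactly, the fitted slope recovers -k and the exponentiated intercept recovers C0, up to floating point error.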

This process of establishing a linear regression model form through some basic mechanistic relations is common.  A typical feature of this approach is the aggregation/simplification of mechanistic relations from fine spatiotemporal scales into the scale represented by the data.  For example, the first order model is a reasonable model for a given fish.  But there is no possibility of measuring the PCB concentration in the same fish over time.  When using concentration data from multiple fish each year, the resulting relationship is about the overall average PCB concentration in all fish in Lake Michigan.  Also, because of the simplification and aggregation, model coefficients may now vary due to other factors.  In this example, we will explore the changes of k as a function of fish size.
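The aggregation point can be illustrated with a small simulation, again a sketch in Python under stated assumptions rather than an analysis of the real data. It assumes the same number of fish is sampled every year (n_fish) and that each fish's log concentration scatters around the first order line with normal noise (sigma). All names and values are hypothetical.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

random.seed(1)                     # fixed seed so the run is repeatable
C0, k, sigma = 10.0, 0.06, 0.5     # hypothetical values
n_fish = 10                        # assumed: same sample size every year

t_all, logc_all = [], []           # one entry per fish
t_year, mean_logc = [], []         # one entry per year
for t in range(31):
    # A different set of fish is drawn each year.
    logs = [math.log(C0) - k * t + random.gauss(0.0, sigma)
            for _ in range(n_fish)]
    t_all.extend([t] * n_fish)
    logc_all.extend(logs)
    t_year.append(t)
    mean_logc.append(sum(logs) / n_fish)

a_pooled, b_pooled = fit_line(t_all, logc_all)   # all fish pooled
a_means, b_means = fit_line(t_year, mean_logc)   # yearly mean log PCB
print(f"pooled slope = {b_pooled:.4f}, yearly-mean slope = {b_means:.4f}")
```

With equal sample sizes, the line fitted to all individual fish coincides with the line fitted to the yearly means of log concentration. In that sense the regression describes the trend in the yearly average rather than the history of any single fish.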

In summary, we used the first order reaction model as the basis to establish the basic form of the relationship between PCB fish tissue concentration and time (since PCB was banned from the area).
