Song's Blog: Stan versus JAGS

Stan was born on August 30, 2012. It was built to speed up multilevel generalized linear models. Instead of improving the sampler used in BUGS, Stan used a better sampler -- the Hamiltonian Monte Carlo. The HMC converges faster. I tried Stan in the last two weeks and did a small comparison using a change point model. The model was discussed in Environmental and Ecological Statistics with R (section 6.1.2). In the book, lilacs first bloom dates from 4 sites were used to fit 4 hockey stick models, one for each site. I pooled data from the 4 sites and formed a simple multilevel model, where the 4 model parameters were assumed exchangeable. For example:

[; \beta_{0j} \sim N(\mu_0, \sigma_0) ;]

Implementing the model in BUGS (or JAGS) is fairly straight forward:

## multilevel hockey stick model
##
model{
for (i in 1:n){
y[i] ~ dnorm(y.hat[i], tau.y[stid[i]])
y.hat[i] <- beta0[stid[i]]+(beta1[stid[i]]-delta[stid[i]]*step(x[i]-phi[stid[i]]))*(x[i]-phi[stid[i]])
}
for (j in 1:n.st){
beta0[j] ~ dnorm(mu0, tau[1])
beta1[j] ~ dnorm(mu1, tau[2])
delta[j] ~ dnorm(mu2, tau[3])
phi[j] ~ dnorm(mu3,tau[4])T(L[j],U[j])
tau.y[j] <- pow(sigma.y[j], -2)
sigma.y[j] ~ dunif(0, 10)
}
mu0 ~ dnorm(0, 0.0001)
mu1 ~ dnorm(0, 0.0001)
mu2 ~ dnorm(0, 0.0001)
mu3 ~ dnorm(0, 0.0001)

for (k in 1:4){
tau[k] <- pow(sigma[k], -2)
sigma[k] ~ dunif(0, 100)
}
}

But the model converges very slowly. After a warm-up of 100,000 iterations, the next 100,000 iterations (thin=150) are still jumpy.

The Stan code looks like this:

/* multilevel hockey stick model without group predictors */

data{

int n;

int nst;

real y[n];

int stid[n]; // station ID

real x[n]; // standardized year

real minYr;

real maxYr;

}

parameters{

real beta0[nst]; // intercept at change point

real beta1[nst];

real delta[nst];

real phi[nst];

real mu0;

real mu1;

real mu2;

real mu3;

real sigmay;

real sigma[4];

}

model{

mu0 ~ normal(0, 100);

mu1 ~ normal(0, 100);

mu2 ~ normal(0, 100);

mu3 ~ normal(0, 100);

beta0 ~ normal(mu0, sigma[1]);// vector

beta1 ~ normal(mu1, sigma[2]);// vector

phi ~ normal(mu2,sigma[3]); // vector

delta ~ normal(mu3,sigma[4]); // vector

for (i in 1:n){

y[i]~normal(beta0[stid[i]]+(beta1[stid[i]] - delta[stid[i]] * int_step(x[i]-phi[stid[i]]))*(x[i]-phi[stid[i]]), sigmay);

}

Two big differences between Stan and BUGS/JAGS: one is that we need to declare data and parameters, and the second is that the order of the code matters. The first part is easy to adjust as we need to supply data and I always provide initials for all parameters. The second is somewhat hard for me as I always write BUGS code (no ordering is needed) in the order of "likelihood -- mean function -- prior" and I must reverse the order in Stan. But this is a small price to pay for a big reward in terms of speed of convergence and the possibility of doing something more.

The Stan model output is somewhat different from the hockey stick model output. For example, the estimated change point is 1975(3.5), 1983(4.9), 1974(8), and 1976(6.7) in Table 6.1 of the book, and the multilevel model estimates are 1976, 1976, 1974, and 1978 (with a more or less consistent sd of 6.6). The multilevel estimated slopes of the first line segment ([;\beta_1;]) are much closer to 0 than the same from the hockey stick model. The R package rstan is also very easy to use:

bugs.in <- function(infile=USlilac, n.chains=3){

y <- infile$FirstBloom

temp <- !is.na(y) ## Stan cannot take NA in y

infile <- infile[temp,]

y <- y[temp]

xx <- infile$Year

x <- (xx-mean(xx))/sd(xx)

stid <- as.numeric(ordered(infile$STID))

n <- length(y)

n.st <- max(stid)

L <- tapply(x, stid, min)

U <- tapply(x, stid, max)

inits <- list()

bugs.dat <- list(n=n, nst=n.st, y=y, x=x, stid=stid,

minYr=min(L), maxYr=max(U))

for (i in 1:n.chains)

inits[[i]] <- list(beta0=rnorm(n.st,130,2),

beta1=rnorm(n.st, 0, 0.1),

delta=runif(1, 0, 3),

phi=runif(n.st, max(L), min(U)),

sigmay=runif(1, 0, 3),

sigma=runif(2, 0, 1),

mu0 = rnorm(1), mu1 = rnorm(1))

parameters <- c("beta0","beta1","delta", "phi", "mu0",

"mu1","mu2","mu3","sigma", "sigmay")

return(list(para=parameters, data=bugs.dat, inits=inits,

meanYR=mean(xx), sdYR=sd(xx), n.chains=n.chains,

model="

/* multilevel hockey stick model */

data{

int n;

int nst;

real y[n];

int stid[n]; // station ID

real x[n]; // standardized year

real minYr;

real maxYr;

}

parameters{

real beta0[nst]; // intercept at change point

real beta1[nst];

real delta[nst];

real phi[nst];

real mu0;

real mu1;

real mu2;

real mu3;

real sigmay;

real sigma[4];

}

model{

mu0 ~ normal(0, 100);

mu1 ~ normal(0, 100);

mu2 ~ normal(0, 100);

mu3 ~ normal(0, 100);

beta0 ~ normal(mu0, sigma[1]);

beta1 ~ normal(mu1, sigma[2]);

phi ~ normal(mu2,sigma[3]);

delta ~ normal(mu3,sigma[4]);

for (i in 1:n){

y[i]~normal(beta0[stid[i]]+

(beta1[stid[i]] - delta[stid[i]] * int_step(x[i]-phi[stid[i]]))*(x[i]-phi[stid[i]]), sigmay);

}

))

}

keep4 <- USlilac$STID==354147 | USlilac$STID==456974 |

USlilac$STID==426357 | USlilac$STID==456624

input.to.bugs <- bugs.in(infile=USlilac[keep4,])

n.chains <- input.to.bugs$n.chains

n.iter <- 100000

n.keep <- 2000

thin <- max(c(1,floor((n.iter)*n.chains/n.keep)))

fit <- stan(model_code=input.to.bugs$model,

data=input.to.bugs$data, init=input.to.bugs$inits,

pars=input.to.bugs$para,

iter=n.iter, thin=thin, chains=n.chains)

print(fit)

quartz()

plot(fit)

My experience with Stan is a pleasant one, especially the level of help provided by the Stan team. The last question I submitted to them was at 10:30 pm and I got an answer back around 11:00 pm the same day!

Song's Blog

Wednesday, October 3, 2012

Stan versus JAGS

No comments:

Log or not log

Search This Blog