Chapter 09
Markov Chain Monte Carlo
Introduction
These are my answers and solutions to the exercises at the end of chapter 9 in Statistical Rethinking 2 by Richard McElreath. I have created these notes as a part of my ongoing involvement in the AU Bayes Study Group. Much of my inspiration for these solutions, where necessary, has been drawn from Taras Svirskyi, William Wolf, and Corrie Bartelheimer, as well as the solutions provided to instructors by Richard McElreath himself.
R
Environment
For today’s exercise, I load the following packages:
library(rethinking)
library(rstan)
library(ggplot2)
library(tidybayes)
Easy Exercises
Practice E1
Question: Which of the following is a requirement of the simple Metropolis algorithm?
- The parameters must be discrete.
- The likelihood function must be Gaussian.
- The proposal distribution must be symmetric.
Answer:
- Not a requirement. Metropolis can accommodate continuous and discrete parameters.
- Not a requirement. The likelihood can be any distribution; it does not have to be Gaussian.
- This is a requirement. The simple Metropolis algorithm relies on symmetric proposals; the more general Metropolis-Hastings algorithm lifts this restriction.
Practice E2
Question: Gibbs sampling is more efficient than the Metropolis algorithm. How does it achieve this extra efficiency? Are there any limitations to the Gibbs sampling strategy?
Answer: Gibbs sampling uses adaptive proposals, derived from conjugate pairings of priors and likelihoods, when considering which location in the posterior to sample next. This makes it more efficient because fewer proposed steps are rejected. Its limitations follow from the same trick: it requires conjugate priors so that the full conditional distributions can be derived analytically, and it still becomes inefficient in high-dimensional models with strongly correlated parameters.
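To make this concrete, here is a minimal sketch of my own (not code from the book) of Gibbs updates for a Normal model with conjugate priors; the specific prior values are arbitrary, illustrative choices:
# Gibbs sampler for y ~ Normal(mu, sigma^2) with conjugate priors:
# mu ~ Normal(0, 10^2) and sigma^2 ~ Inverse-Gamma(2, 2)
set.seed(1)
y <- rnorm(50, mean = 3, sd = 1.5)
n <- length(y)
ybar <- mean(y)
iter <- 5000
mu <- numeric(iter)
sig2 <- numeric(iter)
mu[1] <- 0
sig2[1] <- 1
for (i in 2:iter) {
  # draw mu from its full conditional given the current sigma^2
  prec <- 1 / 100 + n / sig2[i - 1]
  mu[i] <- rnorm(1, (n * ybar / sig2[i - 1]) / prec, sqrt(1 / prec))
  # draw sigma^2 from its full conditional given the new mu (Inverse-Gamma via 1/rgamma)
  sig2[i] <- 1 / rgamma(1, shape = 2 + n / 2, rate = 2 + sum((y - mu[i])^2) / 2)
}
c(mean(mu), sqrt(mean(sig2))) # should land near the true values 3 and 1.5
Because every draw comes straight from a full conditional, nothing is ever rejected - that is where the efficiency gain over Metropolis comes from.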
Practice E3
Question: Which sort of parameters can Hamiltonian Monte Carlo not handle? Can you explain why?
Answer: Discrete parameters. HMC relies on gradients of the log-posterior, which it explores using a physics simulation. Discrete parameters do not form a continuous surface, so no gradients can be computed for them.
Practice E4
Question: Explain the difference between the effective number of samples, n_eff, as calculated by Stan, and the actual number of samples.
Answer: The effective number of samples (n_eff) estimates the number of 'ideal' (i.e. uncorrelated) samples our chain is worth. Since MCMC algorithms explore the posterior as a chain of samples, each sample is usually correlated with the previous one to some extent. The actual number of samples is simply the number of draws the sampler produced, whereas n_eff tells us how much independent information those draws carry for estimating the posterior mean/distribution. n_eff is usually smaller than the actual number of samples (unless we have anti-correlated MCMC samples).
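To see the effect of autocorrelation on the effective sample size directly, here is a small illustration of my own using the coda package, comparing independent draws with a strongly autocorrelated chain of the same length:
set.seed(1)
iid <- rnorm(1000) # 1000 independent draws
ar1 <- as.numeric(arima.sim(list(ar = 0.9), n = 1000)) # 1000 strongly autocorrelated draws
coda::effectiveSize(iid) # close to 1000
coda::effectiveSize(ar1) # only a small fraction of 1000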
Practice E5
Question: Which value should Rhat approach, when a chain is sampling the posterior distribution correctly?
Answer: $\hat{R}$, or Rhat in R, reflects variance within a chain versus variance between chains. If these are the same, $\hat{R}$ will be $1.0$ - i.e. it does not matter from which chain we infer parameters and predictions. Values higher than 1.0 can indicate problems in the model; values much higher than 1 indicate serious issues.
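As a rough illustration, here is my own sketch of the classic Gelman-Rubin formula for two chains (Stan's current version additionally splits chains and rank-normalises, but the idea is the same):
rhat_simple <- function(chain1, chain2) {
  n <- length(chain1)
  W <- mean(c(var(chain1), var(chain2))) # within-chain variance
  B <- n * var(c(mean(chain1), mean(chain2))) # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W) # sqrt(pooled variance / within-chain variance)
}
set.seed(1)
rhat_simple(rnorm(1000), rnorm(1000)) # chains exploring the same target: ~1.0
rhat_simple(rnorm(1000), rnorm(1000, mean = 3)) # chains stuck in different places: well above 1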
Practice E6
Question: Sketch a good trace plot for a Markov chain, one that is effectively sampling from the posterior distribution. What is good about its shape? Then sketch a trace plot for a malfunctioning Markov chain. What about its shape indicates malfunction?
Answer:
Good trace plot
y <- rnorm(1e4, mean = 1, sd = 2)
m.E6Good <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- alpha,
    alpha ~ dnorm(0, 10),
    sigma ~ dcauchy(0, 1)
  ),
  data = list(y = y),
  cores = 2,
  chains = 2,
  start = list(
    alpha = 0,
    sigma = 1
  )
)
traceplot(m.E6Good)
These trace plots show that the chains quickly find the region with highest posterior probability and stay there.
Bad trace plot
y <- rnorm(1e4, mean = 1, sd = 2)
m.E6Bad <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- a1 + a2,
    a1 ~ dnorm(0, 10),
    a2 ~ dnorm(0, 10),
    sigma ~ dcauchy(0, 1)
  ),
  data = list(y = y),
  chains = 2,
  cores = 2,
  start = list(
    a1 = 0,
    a2 = 0,
    sigma = 1
  )
)
traceplot(m.E6Bad)
This is a problem of unidentifiable parameters: a1 and a2 can cancel each other out to arrive at the correct mu, and so we see non-stationary behaviour in the trace plots of a1 and a2 while the trace plot for sigma is doing alright.
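As a quick check of my own (not part of the exercise), we can confirm that the sum of the two parameters is identified even though the individual parameters are not:
post <- extract.samples(m.E6Bad)
dens(post$a1 + post$a2) # concentrates near the true mean of 1, unlike a1 and a2 individually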
Medium Exercises
Practice M1
Question: Re-estimate the terrain ruggedness model from the chapter, but now using a uniform prior and an exponential prior for the standard deviation, sigma. The uniform prior should be dunif(0, 10) and the exponential should be dexp(1). Do the different priors have any detectable influence on the posterior distribution?
Answer: The ruggedness model in question is m8.3 in the book (or m9.1 in ulam() specification). First, I prepare the data like I did previously.
data(rugged)
d <- rugged
d$log_gdp <- log(d$rgdppc_2000)
d <- d[complete.cases(d$rgdppc_2000), ]
d$log_gdp_std <- d$log_gdp / mean(d$log_gdp)
d$rugged_std <- d$rugged / max(d$rugged)
d$cid <- ifelse(d$cont_africa == 1, 1, 2)
dd.trim <- list(
  log_gdp_std = d$log_gdp_std,
  rugged_std = d$rugged_std,
  cid = as.integer(d$cid)
)
Let’s fit that model with the different priors:
## Exponential prior for sigma
m.M1Exp <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ),
  data = dd.trim,
  chains = 4,
  cores = 4
)
## Uniform prior for sigma
m.M1Uni <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dunif(0, 10)
  ),
  data = dd.trim,
  chains = 4,
  cores = 4
)
Now let's inspect the models, starting with a side-by-side comparison of the parameter estimates:
coeftab(m.M1Exp, m.M1Uni)
## m.M1Exp m.M1Uni
## a[1] 0.89 0.89
## a[2] 1.05 1.05
## b[1] 0.13 0.13
## b[2] -0.14 -0.14
## sigma 0.11 0.11
## nobs 170 170
These are strikingly the same. What about the individual model outputs in more detail?
precis(m.M1Exp, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 0.8870817 0.015625699 0.86196179 0.91173540 2453.919 0.9995577
## a[2] 1.0507770 0.009968219 1.03527611 1.06640703 2834.441 0.9988734
## b[1] 0.1344067 0.074307822 0.01486287 0.25218389 2786.188 0.9993677
## b[2] -0.1413442 0.054855132 -0.22964887 -0.05187494 2324.832 0.9983652
## sigma 0.1117154 0.006171670 0.10228974 0.12208002 2725.266 0.9988256
precis(m.M1Uni, depth = 2)
## mean sd 5.5% 94.5% n_eff Rhat4
## a[1] 0.8865936 0.015580736 0.86128553 0.91082334 2489.074 0.9989360
## a[2] 1.0501777 0.010010613 1.03404118 1.06614541 2152.883 1.0007549
## b[1] 0.1312147 0.074609926 0.01239339 0.24998421 2244.528 0.9993558
## b[2] -0.1420136 0.054996077 -0.22957192 -0.05372842 2023.621 0.9987402
## sigma 0.1115782 0.006224722 0.10188964 0.12166315 3600.101 0.9990594
Again, these are very similar aside from the effective number of samples (n_eff), which is somewhat higher for most parameter estimates in the model with the exponential prior on sigma (m.M1Exp), except for sigma itself, which boasts a higher n_eff in the uniform-prior model (m.M1Uni). As such, we conclude that while the different priors have some impact on n_eff, they do not change the posterior distributions. Let me visualise this:
Plot_df <- data.frame(
  Posteriors = c(
    extract.samples(m.M1Exp, n = 1e4)$sigma,
    extract.samples(m.M1Uni, n = 1e4)$sigma
  ),
  Name = rep(c("Exp", "Uni"), each = 1e4),
  Model = rep(c("m.M1Exp", "m.M1Uni"), each = 1e4)
)
ggplot(Plot_df, aes(y = Model, x = Posteriors)) +
  stat_halfeye() +
  labs(x = "Parameter Estimate", y = "Model") +
  theme_bw()
That really does look the same to me.
Practice M2
Question: The Cauchy and exponential priors from the terrain ruggedness model are very weak. They can be made more informative by reducing their scale. Compare the dcauchy and dexp priors for progressively smaller values of the scaling parameter. As these priors become stronger, how does each influence the posterior distribution?
Answer: I write a for loop here to minimise code repetition:
RepTimes <- 4 # how many prior strengths I want to try
ScalingFactor <- 10 # by what factor to make priors stronger at each step
# empty lists to store models in
Explist <- as.list(rep(NA, RepTimes))
Caulist <- as.list(rep(NA, RepTimes))
# Loop over all models
for (Mod_Iter in 0:(RepTimes - 1)) {
  # pass the scaling constants along as data so ulam() can see them
  dd.trim$ScalingFactor <- ScalingFactor
  dd.trim$Mod_Iter <- Mod_Iter
  ## Exponential prior for sigma
  m.M2Exp <- ulam(
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a[cid] + b[cid] * (rugged_std - 0.215),
      a[cid] ~ dnorm(1, 0.1),
      b[cid] ~ dnorm(0, 0.3),
      sigma ~ dexp(1 * ScalingFactor^Mod_Iter)
    ),
    data = dd.trim,
    chains = 4,
    cores = 4
  )
  Explist[[Mod_Iter + 1]] <- m.M2Exp
  ## Cauchy prior for sigma
  m.M2Cau <- ulam(
    alist(
      log_gdp_std ~ dnorm(mu, sigma),
      mu <- a[cid] + b[cid] * (rugged_std - 0.215),
      a[cid] ~ dnorm(1, 0.1),
      b[cid] ~ dnorm(0, 0.3),
      sigma ~ dcauchy(0, 1 / ScalingFactor^Mod_Iter)
    ),
    data = dd.trim,
    chains = 4,
    cores = 4
  )
  Caulist[[Mod_Iter + 1]] <- m.M2Cau
}
coeftab(Explist[[1]], Explist[[2]], Explist[[3]], Explist[[4]])
## Explist[[1]] Explist[[2]] Explist[[3]] Explist[[4]]
## a[1] 0.89 0.89 0.89 0.89
## a[2] 1.05 1.05 1.05 1.05
## b[1] 0.13 0.13 0.13 0.13
## b[2] -0.14 -0.14 -0.14 -0.15
## sigma 0.11 0.11 0.11 0.09
## nobs 170 170 170 170
coeftab(Caulist[[1]], Caulist[[2]], Caulist[[3]], Caulist[[4]])
## Caulist[[1]] Caulist[[2]] Caulist[[3]] Caulist[[4]]
## a[1] 0.89 0.89 0.89 0.89
## a[2] 1.05 1.05 1.05 1.05
## b[1] 0.14 0.13 0.13 0.13
## b[2] -0.14 -0.14 -0.14 -0.14
## sigma 0.11 0.11 0.11 0.11
## nobs 170 170 170 170
The more restrictive exponential priors decrease the estimate for sigma. The more restrictive Cauchy priors, on the other hand, seem to have no effect. Let's explore why by looking at the priors themselves:
par(mfrow = c(1, 2))
curve(dexp(x, 1),
  from = 0, to = 5, ylab = "Density", xlab = "sigma",
  col = "royalblue4"
)
curve(dexp(x, 10), from = 0, to = 5, add = TRUE)
curve(dexp(x, 100), from = 0, to = 5, add = TRUE, col = col.desat("red"))
curve(dexp(x, 1000), from = 0, to = 5, add = TRUE, col = col.desat("green"))
mtext("Exponential Prior")
legend("topright",
  col = c("royalblue4", "black", col.desat("red"), col.desat("green")),
  lty = 1, legend = c("Exp(1)", "Exp(10)", "Exp(100)", "Exp(1000)"), bty = "n"
)
# multiply by 2 to show the half-Cauchy density, since sigma is positive
curve(2 * dcauchy(x, 0, 1),
  from = 0, to = 5, ylab = "Density", xlab = "sigma",
  col = "royalblue4"
)
curve(2 * dcauchy(x, 0, 0.1), from = 0, to = 5, add = TRUE, col = "black")
curve(2 * dcauchy(x, 0, 0.01), from = 0, to = 5, add = TRUE, col = col.desat("red"))
curve(2 * dcauchy(x, 0, 0.001), from = 0, to = 5, add = TRUE, col = col.desat("green"))
mtext("Cauchy Prior")
legend("topright",
  col = c("royalblue4", "black", col.desat("red"), col.desat("green")),
  lty = 1, legend = c("Cauchy(0, 1)", "Cauchy(0, 0.1)", "Cauchy(0, 0.01)", "Cauchy(0, 0.001)"), bty = "n"
)
The Cauchy distributions retain thick tails even at small scales, while the exponential distributions quickly concentrate near zero. This is why a concentrated Cauchy prior allows the posterior more flexibility than a concentrated exponential prior.
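To put numbers on those tails (a quick check of my own), compare the prior mass above sigma = 0.5 under the most concentrated prior of each family plotted above:
1 - pexp(0.5, rate = 1000) # Exp(1000): zero to machine precision
2 * (1 - pcauchy(0.5, 0, 0.001)) # half-Cauchy(0, 0.001): still roughly 0.1% of the mass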
Practice M3
Question: Re-estimate one of the Stan models from the chapter, but at different numbers of warmup iterations. Be sure to use the same number of sampling iterations in each case. Compare the n_eff values.
Answer: The ruggedness model has served us well so far, so I continue with it. Here, I build the model once and then re-run it with a fixed number of sampling iterations and fixed starting values while varying the warmup:
start <- list(a = c(1, 1), b = c(0, 0), sigma = 1) # fixed start values for comparability of runs
m.M3 <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ),
  data = dd.trim,
  start = start,
  chains = 2, cores = 2,
  iter = 100
)
warm_list <- c(5, 10, 100, 500, 1000) # define warmup values to run through
n_eff <- matrix(NA, nrow = length(warm_list), ncol = 5) # matrix to hold n_eff results
for (i in 1:length(warm_list)) { # loop over warm_list and collect n_eff
  w <- warm_list[i]
  m_temp <- ulam(m.M3, chains = 2, cores = 2, iter = 1000 + w, warmup = w, start = start)
  n_eff[i, ] <- precis(m_temp, 2)$n_eff
}
colnames(n_eff) <- rownames(precis(m_temp, 2))
rownames(n_eff) <- warm_list
n_eff # columns show parameters, rows show warmup settings
## a[1] a[2] b[1] b[2] sigma
## 5 2.314186 1.587251 2.713325 1.270369 1.776862
## 10 2243.084776 2157.086156 737.957589 1010.214712 953.010860
## 100 1725.334719 2294.576251 878.481785 1177.016946 1122.495229
## 500 2999.738299 3282.963810 2292.173710 2737.037252 2200.949134
## 1000 2485.029304 3406.341675 2372.274092 2772.175825 2607.552453
As we can see, past just 10 warmup iterations, n_eff does not change dramatically in terms of how useful our samples are, although a few hundred warmup iterations do buy some additional efficiency. In this case, we could be quite happy with a warmup of as few as 10 iterations.
Hard Exercises
Practice H1
Question: Run the model below and then inspect the posterior distribution and explain what it is accomplishing.
mp <- map2stan(
  alist(
    a ~ dnorm(0, 1),
    b ~ dcauchy(0, 1)
  ),
  data = list(y = 1),
  start = list(a = 0, b = 0),
  iter = 1e4,
  chains = 2, cores = 2,
  warmup = 100,
  WAIC = FALSE
)
Compare the samples for the parameters a and b. Can you explain the different trace plots, using what you know about the Cauchy distribution?
Answer: First of all, let’s inspect the posterior:
precis(mp)
## mean sd 5.5% 94.5% n_eff Rhat4
## a 0.0003388167 0.9988213 -1.601441 1.590561 12762.761 1.000120
## b -0.1918181852 13.8995715 -5.379742 5.346423 3892.011 1.000517
Oof. Those uncertainties don't look good at all! So what does the model even do? It simply samples a from a normal distribution with mean 0 and standard deviation 1, and b from a Cauchy distribution; since no likelihood links the parameters to the data, we are sampling straight from the priors. Let's look at the trace plot for this:
plot(mp, n_cols = 1, col = "royalblue4")
As we can see, there are quite some outliers in the sampling of the Cauchy distribution (b). Why is that? Because the Cauchy distribution has very heavy tails, making it comparatively likely to jump to a value that is far out there in terms of posterior probability. Note that this also decreases n_eff. lp in the above is the log-posterior.
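The heavy tails are also why the Cauchy has no defined mean. As a small demonstration of my own: the running mean of Cauchy draws never settles, because occasional extreme draws keep yanking it around:
set.seed(7)
b_draws <- rcauchy(1e4, location = 0, scale = 1)
plot(cumsum(b_draws) / seq_along(b_draws),
  type = "l", xlab = "draw", ylab = "running mean"
)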
Now let's see how the samples we drew measure up against the underlying distributions of a and b, respectively:
post <- extract.samples(mp)
par(mfrow = c(1, 2))
dens(post$a)
curve(dnorm(x, 0, 1), from = -4, to = 4, add = TRUE, lty = 2)
legend("topright", lty = c(1, 2), legend = c("Sample", "Exact density"), bty = "n")
mtext("Normal")
dens(post$b, col = "royalblue4", xlim = c(-10, 10))
curve(dcauchy(x, 0, 1),
  from = -10, to = 10, add = TRUE, lty = 2,
  col = "royalblue4"
)
mtext("Cauchy")
As we can see, the normal distribution has been reconstructed well. The Cauchy distribution hasn't.
Practice H2
Question: Recall the divorce rate example from Chapter 5. Repeat that analysis, using ulam() this time, fitting models m5.1, m5.2, and m5.3. Use compare() to compare the models on the basis of WAIC or PSIS. Explain the results.
Answer: First, I need to load the data and prepare it for ulam():
data(WaffleDivorce)
d <- WaffleDivorce
d$D <- standardize(d$Divorce)
d$M <- standardize(d$Marriage)
d$A <- standardize(d$MedianAgeMarriage)
d_trim <- list(D = d$D, M = d$M, A = d$A)
Now I fit the models with ulam():
m5.1_stan <- ulam(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bA * A,
    a ~ dnorm(0, 0.2),
    bA ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = d_trim,
  chains = 4, cores = 4,
  log_lik = TRUE # this is needed to get the terms for calculating PSIS or WAIC
)
m5.2_stan <- ulam(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bM * M,
    a ~ dnorm(0, 0.2),
    bM ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = d_trim,
  chains = 4, cores = 4,
  log_lik = TRUE
)
m5.3_stan <- ulam(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bA * A + bM * M,
    a ~ dnorm(0, 0.2),
    bA ~ dnorm(0, 0.5),
    bM ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = d_trim,
  chains = 4, cores = 4,
  log_lik = TRUE
)
Now we compare the models:
compare(m5.1_stan, m5.2_stan, m5.3_stan, func = PSIS)
## PSIS SE dPSIS dSE pPSIS weight
## m5.1_stan 125.7210 12.708327 0.000000 NA 3.630705 0.7253039155
## m5.3_stan 127.6690 12.852350 1.947996 0.6705316 4.773054 0.2738533387
## m5.2_stan 139.2364 9.936093 13.515361 9.1363047 2.923975 0.0008427459
compare(m5.1_stan, m5.2_stan, m5.3_stan, func = WAIC)
## WAIC SE dWAIC dSE pWAIC weight
## m5.1_stan 125.7778 12.641919 0.000000 NA 3.659072 0.6960655494
## m5.3_stan 127.4407 12.591741 1.662916 0.6770545 4.658881 0.3030766321
## m5.2_stan 139.1754 9.813604 13.397613 9.2109285 2.893468 0.0008578185
WAIC tells a similar story to PSIS: the model containing only age (m5.1_stan) wins, and the model with both predictors (m5.3_stan) does almost as well. Their respective PSIS and WAIC values are nearly identical, and between them these two models receive virtually all of the model weight. Let's call them equal in performance and investigate why:
precis(m5.3_stan)
## mean sd 5.5% 94.5% n_eff Rhat4
## a -0.0001904293 0.10140928 -0.1591984 0.1619373 1877.251 1.0002239
## bA -0.6023698429 0.16025804 -0.8510854 -0.3467602 1085.019 1.0007578
## bM -0.0550634908 0.16034205 -0.3109204 0.2015101 1187.780 0.9998155
## sigma 0.8275838910 0.08826874 0.7028130 0.9779113 1474.265 1.0028212
While m5.3_stan contains the marriage predictor, it is very unsure of its influence - the interval for bM comfortably straddles zero. In practical terms, this means that m5.1_stan and m5.3_stan make basically the same predictions.
Practice H3
Question: Sometimes changing a prior for one parameter has unanticipated effects on other parameters. This is because when a parameter is highly correlated with another parameter in the posterior, the prior influences both parameters. Here’s an example to work and think through.
Go back to the leg length example in Chapter 5. Here is the code again, which simulates height and leg lengths for 100 imagined individuals:
N <- 100 # number of individuals
height <- rnorm(N, 10, 2) # sim total height of each
leg_prop <- runif(N, 0.4, 0.5) # leg as proportion of height
leg_left <- leg_prop * height + rnorm(N, 0, 0.02) # sim left leg as proportion + error
leg_right <- leg_prop * height + rnorm(N, 0, 0.02) # sim right leg as proportion + error
d <- data.frame(height, leg_left, leg_right) # combine into data frame
And below is the model you fit before, resulting in a highly correlated posterior for the two beta parameters. This time, fit the model using ulam():
m5.8s <- ulam(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + bl * leg_left + br * leg_right,
    a ~ dnorm(10, 100),
    bl ~ dnorm(2, 10),
    br ~ dnorm(2, 10),
    sigma ~ dexp(1)
  ),
  data = d,
  chains = 4,
  cores = 4,
  start = list(
    a = 10,
    bl = 0,
    br = 0.1,
    sigma = 1
  )
)
Compare the posterior distribution produced by the code above to the posterior distribution produced when you change the prior for br so that it is strictly positive:
m5.8s2 <- ulam(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + bl * leg_left + br * leg_right,
    a ~ dnorm(10, 100),
    bl ~ dnorm(2, 10),
    br ~ dnorm(2, 10),
    sigma ~ dexp(1)
  ),
  data = d,
  chains = 4,
  cores = 4,
  constraints = list(br = "lower=0"),
  start = list(
    a = 10,
    bl = 0,
    br = 0.1,
    sigma = 1
  )
)
Note the constraints list. What this does is constrain the prior distribution of br so that it has positive probability only above zero. In other words, that prior ensures that the posterior distribution for br will have no probability mass below zero.
Compare the two posterior distributions for m5.8s and m5.8s2. What has changed in the posterior distribution of both beta parameters? Can you explain the change induced by the change in prior?
Answer: It's probably easiest to just look at the posterior distributions of the beta parameters through the pairs() function:
pairs(m5.8s, main = "Model 1")
pairs(m5.8s2, main = "Model 2")
As we can see, the beta distributions have shifted drastically between the two models. Interestingly, bl and br were perfectly symmetric in m5.8s, but are skewed in m5.8s2. Given how the height of a person is approximated in both models (a + bl*leg_left + br*leg_right), and given that both legs carry almost identical information, the posteriors of bl and br are necessarily negatively correlated - the same predicted height can come from a large bl and a small br, a small bl and a large br, or two middling values. Thus, by constraining br to be strictly positive in m5.8s2, which skews its posterior, we have forced bl to be equally skewed in a mirror image of br.
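As a quick check of my own on the fitted models, we can confirm that the sum of the two coefficients is what the data identify, and that it barely changes between the two models:
post1 <- extract.samples(m5.8s)
post2 <- extract.samples(m5.8s2)
dens(post1$bl + post1$br) # the sum is tightly identified
dens(post2$bl + post2$br, add = TRUE, col = "royalblue4") # and nearly identical in m5.8s2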
Practice H4
Question: For the two models fit in the previous problem, use WAIC or PSIS to compare the effective numbers of parameters for each model. You will need to use log_lik=TRUE to instruct ulam() to compute the terms that both WAIC and PSIS need. Which model has more effective parameters? Why?
Answer: Let’s run the models:
m.H4_1 <- ulam(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + bl * leg_left + br * leg_right,
    a ~ dnorm(10, 100),
    bl ~ dnorm(2, 10),
    br ~ dnorm(2, 10),
    sigma ~ dexp(1)
  ),
  data = d,
  chains = 4,
  cores = 4,
  start = list(
    a = 10,
    bl = 0,
    br = 0.1,
    sigma = 1
  ),
  log_lik = TRUE
)
m.H4_2 <- ulam(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + bl * leg_left + br * leg_right,
    a ~ dnorm(10, 100),
    bl ~ dnorm(2, 10),
    br ~ dnorm(2, 10),
    sigma ~ dexp(1)
  ),
  data = d,
  chains = 4,
  cores = 4,
  constraints = list(br = "lower=0"),
  start = list(
    a = 10,
    bl = 0,
    br = 0.1,
    sigma = 1
  ),
  log_lik = TRUE
)
Now we compare them with WAIC:
compare(m.H4_1, m.H4_2)
## WAIC SE dWAIC dSE pWAIC weight
## m.H4_1 182.1474 10.21060 0.0000000 NA 2.961292 0.6063273
## m.H4_2 183.0112 9.88398 0.8638001 2.349502 2.382919 0.3936727
The models are pretty much tied on WAIC. However, the model with the truncated prior (m.H4_2) has fewer effective parameters (pWAIC) and is thus less flexible. This is because its prior is more informative, and the variance of the posterior distribution is smaller as a result.
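We can check that reasoning directly (a quick check of my own): the posterior spread of br should be smaller under the truncated prior:
sd(extract.samples(m.H4_1)$br) # unconstrained prior: wider posterior
sd(extract.samples(m.H4_2)$br) # truncated prior: narrower posterior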
Practice H5
Question: Modify the Metropolis algorithm code from the chapter to handle the case that the island populations have a different distribution than the island labels. This means the island’s number will not be the same as its population.
Answer: First of all, we need our 10 islands with population sizes of 1-10, but in random order:
pop_size <- sample(1:10)
Now we can use the code from the chapter almost unaltered, save for one exception - we need to use indexing to translate island position into population size:
num_weeks <- 1e5
positions <- rep(NA, num_weeks)
current <- 10
for (i in 1:num_weeks) {
positions[i] <- current # record current position
proposal <- current + sample(c(-1, 1), size = 1) # flip coin to generate proposal
# now make sure he loops around the archipelago
if (proposal < 1) proposal <- 10
if (proposal > 10) proposal <- 1
prob_move <- pop_size[proposal] / pop_size[current] # move?
current <- ifelse(runif(1) < prob_move, proposal, current)
}
To see if this works, we can plot population size against frequency of visit by the king:
f <- table(positions) # compute visit frequencies
plot(as.vector(f), pop_size,
  type = "n", # empty plot of frequency against population size
  xlab = "frequency", ylab = "population size"
)
text(x = f, y = pop_size, labels = names(f)) # add island labels at their positions
Practice H6
Question: Modify the Metropolis algorithm code from the chapter to write your own simple MCMC estimator for globe tossing data and model from Chapter 2.
Answer: We want to fit the following model:
$$w \sim \text{Binomial}(n, \theta)$$
$$\theta \sim \text{Uniform}(0, 1)$$
Our Metropolis algorithm looks like this:
set.seed(42)
# the globe tossing data
w <- 6
n <- 9
# prior on p
p_prior <- function(p) dunif(p, min = 0, max = 1)
# initializing MCMC
iter <- 1e4
p_sample <- rep(0, iter)
p_current <- 0.5 # start value
for (i in 1:iter) {
p_sample[i] <- p_current # # record current p
p_proposal <- runif(1, min = 0, max = 1) # generate proposal
# compute likelihood for current and proposal
lkhd_current <- dbinom(w, n, p_current)
lkhd_proposal <- dbinom(w, n, p_proposal)
prob_proposal <- lkhd_proposal * p_prior(p_proposal)
prob_current <- lkhd_current * p_prior(p_current)
prob_accept <- prob_proposal / prob_current
p_current <- ifelse(runif(1) < prob_accept, p_proposal, p_current)
}
Let’s visualise what happened here:
plot(p_sample, type = "l", col = "royalblue4")
Finally, let’s plot the posterior distribution:
dens(p_sample, col = "royalblue4", adj = 1)
curve(dbeta(x, w + 1, n - w + 1), from = 0, to = 1, add = T, lty = 2)
abline(v = median(p_sample))
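As a small addition of my own, we can also compute the acceptance rate of this sampler - the fraction of iterations in which the chain actually moved (with an independent uniform proposal, many steps get rejected):
mean(p_sample[-1] != p_sample[-iter]) # fraction of accepted proposals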
Session Info
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidybayes_2.3.1 rethinking_2.13 rstan_2.21.2 ggplot2_3.3.6 StanHeaders_2.21.0-7
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 mvtnorm_1.1-1 lattice_0.20-41 tidyr_1.1.3 prettyunits_1.1.1 ps_1.6.0 assertthat_0.2.1 digest_0.6.27 utf8_1.2.1
## [10] V8_3.4.1 plyr_1.8.6 R6_2.5.0 backports_1.2.1 stats4_4.0.5 evaluate_0.14 coda_0.19-4 highr_0.9 blogdown_1.3
## [19] pillar_1.6.0 rlang_0.4.11 curl_4.3.2 callr_3.7.0 jquerylib_0.1.4 R.utils_2.10.1 R.oo_1.24.0 rmarkdown_2.7 styler_1.4.1
## [28] labeling_0.4.2 stringr_1.4.0 loo_2.4.1 munsell_0.5.0 compiler_4.0.5 xfun_0.22 pkgconfig_2.0.3 pkgbuild_1.2.0 shape_1.4.5
## [37] htmltools_0.5.1.1 tidyselect_1.1.0 tibble_3.1.1 gridExtra_2.3 bookdown_0.22 arrayhelpers_1.1-0 codetools_0.2-18 matrixStats_0.61.0 fansi_0.4.2
## [46] crayon_1.4.1 dplyr_1.0.5 withr_2.4.2 MASS_7.3-53.1 R.methodsS3_1.8.1 distributional_0.2.2 ggdist_2.4.0 grid_4.0.5 jsonlite_1.7.2
## [55] gtable_0.3.0 lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1 scales_1.1.1 KernSmooth_2.23-18 RcppParallel_5.1.2 cli_3.0.0 stringi_1.5.3
## [64] farver_2.1.0 bslib_0.2.4 ellipsis_0.3.2 generics_0.1.0 vctrs_0.3.7 rematch2_2.1.2 forcats_0.5.1 tools_4.0.5 svUnit_1.0.6
## [73] R.cache_0.14.0 glue_1.4.2 purrr_0.3.4 processx_3.5.1 yaml_2.2.1 inline_0.3.17 colorspace_2.0-0 knitr_1.33 sass_0.3.1