mcmcabn-advanced.Rmd
This article purposed to present advanced methods in structural MCMC modeling using mcmcabn
Let us examine a first MCMC search (1000 MCMC steps).
# loading libraries
data(asia, package='bnlearn')
library(mcmcabn)
library(abn)
library(ggplot2)
# MCMC search
out <- mcmcabn(score.cache = bsc.compute.asia,
score = "mlik",
data.dists = dist.asia,
max.parents = 2,
mcmc.scheme = c(1000,0,0),
seed = 42,
verbose = FALSE,
start.dag = "random",
prob.rev = 0.03,
prob.mbr = 0.03,
prior.choice = 1)
Here the optimal learned score and the maximum possible computed with an exact method.
#maximum score get from the MCMC sampler
max(out$scores)
#> [1] -11156
#maximum scoring network using exact search (not MCMC based)
dag <- mostprobable(score.cache = bsc.compute.asia)
#> Step1. completed max alpha_i(S) for all i and S
#> Total sets g(S) to be evaluated over: 256
fitabn(object = dag,data.df = asia, data.dists = dist.asia)$mlik
#> [1] -11151
Plot of the MCMC run:
plot(out)
Below is the summary of the MCMC run:
summary(out)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1000
#> Thinning: 0
#>
#> Maximum score: -11156
#> Empirical mean: -11201
#> Empirical standard deviation: 340.81
#> Quantiles of the posterior network score:
#> 0.025 0.25 0.5 0.75 0.975
#> BN score -11285 -11157 -11156 -11156 -11156
#>
#>
#> Global acceptance rate: 0.060939
#> Accepted Rejected
#> MBR 5 26
#> MC3 50 896
#> REV 6 18
#>
#>
#> Sample size adjusted for autocorrelation: 59.71
#>
#> Autocorrelations by lag:
#> 0 1 2 3 4 5 6 7 8
#> acf 1 0.91376 0.82764 0.72634 0.62502 0.54213 0.45876 0.36669 0.27379
#> 9 10
#> acf 0.21848 0.16229
To speed-up convergence, one could use the argument heating
from mcmcabn(). It should be a real positive number, where one is neutral. There are two possible uses of this tuning parameter:
Same example as before with heating parameter set up to 0.5
# MCMC search with heating = 0.5 for all moves
out1 <- mcmcabn(score.cache = bsc.compute.asia,
score = "mlik",
data.dists = dist.asia,
max.parents = 2,
mcmc.scheme = c(1000,0,0),
seed = 42,
verbose = FALSE,
start.dag = "random",
prob.rev = 0.03,
prob.mbr = 0.03,
prior.choice = 1,heating = 0.5)
summary(out1)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1000
#> Thinning: 0
#>
#> Maximum score: -11151
#> Empirical mean: -11225
#> Empirical standard deviation: 337.98
#> Quantiles of the posterior network score:
#> 0.025 0.25 0.5 0.75 0.975
#> BN score -11426 -11192 -11189 -11186 -11151
#>
#>
#> Global acceptance rate: 0.11289
#> Accepted Rejected
#> MBR 4 28
#> MC3 108 837
#> REV 1 23
#>
#>
#> Sample size adjusted for autocorrelation: 60.687
#>
#> Autocorrelations by lag:
#> 0 1 2 3 4 5 6 7 8
#> acf 1 0.91313 0.82631 0.72408 0.62184 0.53815 0.45426 0.36105 0.26726
#> 9 10
#> acf 0.21031 0.15272
As one can see, the acceptance rate approximately doubles. As the landscape of visited DAGs growth, the chance to find a better scoring structure improve. But this introduces a bias in the computed posterior distribution.
Another implemented heating-up scheme is by deciding the number of steps during which an exponentially increasing heating-up scheme is applied. However, being more computationally demanding, this is the recommended approach to improve convergence.
# MCMC search with a heating scheme of 200 steps
out2 <- mcmcabn(score.cache = bsc.compute.asia,
score = "mlik",
data.dists = dist.asia,
max.parents = 2,
mcmc.scheme = c(1500,0,0),
seed = 42,
verbose = FALSE,
start.dag = "random",
prob.rev = 0.03,
prob.mbr = 0.03,
prior.choice = 1,heating = 500)
summary(out2)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1500
#> Thinning: 0
#>
#> Maximum score: -11151
#> Empirical mean: -11370
#> Empirical standard deviation: 438.4
#> Quantiles of the posterior network score:
#> 0.025 0.25 0.5 0.75 0.975
#> BN score -12083 -11422 -11264 -11151 -11151
#>
#>
#> Global acceptance rate: 0.18454
#> Accepted Rejected
#> MBR 11 35
#> MC3 253 1161
#> REV 13 28
#>
#>
#> Sample size adjusted for autocorrelation: 25.65
#>
#> Autocorrelations by lag:
#> 0 1 2 3 4 5 6 7 8
#> acf 1 0.96618 0.9267 0.88431 0.8416 0.79917 0.75591 0.71205 0.66791
#> 9 10
#> acf 0.62424 0.57987
The heating parameter in function of the number of steps is:
p.data <- data.frame(x = 1:1501, heating = out2$heating)
ggplot(data=p.data, aes(x=x, y=heating)) +
geom_line(lwd=2)
Then, at least, the first 500 steps should be considered as the burn-in phase. As one can see below, the MCMC chain moves more easily within structures during this phase.
plot(out2)
The main incentive to perform structural MCMC is the belief that presenting a single best scoring DAG is rudimentary, taking into account the number of possible quasi-equally scoring DAGs at the top of the posterior distribution. The code below allows us to display this diversity.
#renaming columns of the dataset
colnames(asia) <- c("Asia", "Smoking", "Tuberculosis", "LungCancer", "Bronchitis", "Either", "XRay", "Dyspnea")
# MCMC search
out <- mcmcabn(score.cache = bsc.compute.asia,
score = "mlik",
data.dists = dist.asia,
max.parents = 2,
mcmc.scheme = c(2000,0,2000),
seed = 231,
verbose = FALSE,
start.dag = "random",
prob.rev = 0.03,
prob.mbr = 0.03,
prior.choice = 1, heating = 200)
# unique list of structure
u.list.dag <- unique.array(x = out$dags,MARGIN = 3)
# counting occurence of structures
tab <- apply(X = u.list.dag, MARGIN = 3, FUN = function(x){
sum(apply(X = out$dags,MARGIN = 3,FUN = function(y){
if(identical(x,y)){1}else{0}
}))
})
# scoring the DAGs
nb.dag <- 10
scores.dags <- vector(length = nb.dag)
num.arcs <- vector(length = nb.dag)
shd <- vector(length = nb.dag)
for(i in 1:nb.dag){
dag <- u.list.dag[,,order(tab,decreasing = TRUE)[i]]
colnames(dag) <- rownames(dag) <- names(dist.asia)
fabn <- fitabn(dag.m = (dag),data.df = asia,data.dists = dist.asia,method = "bayes")
scores.dags[i] <- fabn$mlik
num.arcs[i] <- sum(dag)
shd[i] <- compareDag(ref = u.list.dag[,,order(tab,decreasing = TRUE)[1]],u.list.dag[,,order(tab,decreasing = TRUE)[i]])$`Hamming-distance`
}
Let us plot the MCMC run
plot(out)
Let us plot the diversity plot, where the x1-axis is the number of arcs of each structure, the x2-axis is the structural Hamming distance (a proxy for how different two DAGs are), the y1-axis is the occurrence of the structures, and the y2-axis is the scores. As one can see, half of the posterior distribution is represented by one DAG.
plot(1:nb.dag, sort(tab,decreasing = TRUE)[1:nb.dag], type = 'n',ylab = "",xlab = "Number of arcs",xaxt="n",yaxt="n", ylim = c(0,2000))
axis(2,at = c(0, 1000, 2000),labels = c("0.0%","50%","100%"),col.axis = "#4393C3")
mtext("Occurence of DAGs", side=2, line=2, col="#4393C3")
rect(1:nb.dag - .4, 0, 1:nb.dag + .4, sort(tab,decreasing = TRUE)[1:nb.dag], col = '#4393C3')
par(new = TRUE)
plot(x = 1:nb.dag,y = scores.dags,col="red", type = 'b', lwd=2, axes = FALSE, xlab = "",ylab="")
axis(4, col.axis = 'red')
mtext("DAGs scores", side=4, line=1.5, col="red")
axis(1, col.axis = 'black',at = 1:nb.dag,labels = num.arcs)
axis(3, col.axis = 'orange',at = 1:nb.dag,labels = shd)
mtext("Structural Hamming distances", side=3, line=2, col="orange")