This article presents advanced methods for structural MCMC modeling with mcmcabn:

  • improving the convergence rate by heating up the MCMC chain
  • visualizing the diversity landscape of the sampled DAGs

Heating up the MCMC chain

Let us examine a first MCMC search (1000 MCMC steps, no burn-in, no thinning).


# loading libraries
data(asia, package='bnlearn')
library(mcmcabn)
library(abn)
library(ggplot2)

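This article assumes a distribution list dist.asia and a precomputed score cache bsc.compute.asia; a minimal sketch of that setup with buildScoreCache() from abn, using the renamed columns introduced later in the article:

# rename the asia columns and declare one binomial node per variable
colnames(asia) <- c("Asia", "Smoking", "Tuberculosis", "LungCancer",
                    "Bronchitis", "Either", "XRay", "Dyspnea")

dist.asia <- list(Asia = "binomial", Smoking = "binomial",
                  Tuberculosis = "binomial", LungCancer = "binomial",
                  Bronchitis = "binomial", Either = "binomial",
                  XRay = "binomial", Dyspnea = "binomial")

# precompute the node-parent score cache reused by all mcmcabn() calls
bsc.compute.asia <- buildScoreCache(data.df = asia,
                                    data.dists = dist.asia,
                                    max.parents = 2)
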
# MCMC search
out <- mcmcabn(score.cache = bsc.compute.asia,
               score = "mlik",
               data.dists = dist.asia,
               max.parents = 2,
               mcmc.scheme = c(1000, 0, 0),
               seed = 42,
               verbose = FALSE,
               start.dag = "random",
               prob.rev = 0.03,
               prob.mbr = 0.03,
               prior.choice = 1)

Below, the best score found by the MCMC sampler is compared with the maximum possible score computed with an exact method.

# maximum score obtained from the MCMC sampler
max(out$scores)
#> [1] -11156

# maximum-scoring network using exact search (not MCMC-based)
dag <- mostProbable(score.cache = bsc.compute.asia)
#> Step1. completed max alpha_i(S) for all i and S
#> Total sets g(S) to be evaluated over: 256
fitAbn(object = dag, data.df = asia, data.dists = dist.asia)$mlik
#> [1] -11151

Plot of the MCMC run:

plot(out)

Below is the summary of the MCMC run:

summary(out)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1000
#> Thinning: 0
#> 
#> Maximum score: -11156
#> Empirical mean: -11201
#> Empirical standard deviation: 340.81
#> Quantiles of the posterior network score:
#>           0.025   0.25    0.5   0.75  0.975
#> BN score -11285 -11157 -11156 -11156 -11156
#> 
#> 
#> Global acceptance rate: 0.060939
#>     Accepted Rejected
#> MBR        5       26
#> MC3       50      896
#> REV        6       18
#> 
#> 
#> Sample size adjusted for autocorrelation: 59.71
#> 
#> Autocorrelations by lag:
#>     0       1       2       3       4       5       6       7       8       9
#> acf 1 0.91376 0.82764 0.72634 0.62502 0.54213 0.45876 0.36669 0.27379 0.21848
#>          10
#> acf 0.16229

To speed up convergence, one can use the heating argument of mcmcabn(). It should be a positive real number, where one is neutral. There are two possible uses of this tuning parameter:

  1. Flat heating scheme: if smaller than one, the parameter transforms the score by raising it to this power. This increases the acceptance rate of score-worsening moves and changes nothing for score-improving moves; the smaller the value, the more likely any move is to be accepted (see the numerical sketch after this list).
  2. Heating-up scheme: if larger than one, it indicates the number of returned steps during which an exponentially increasing heating scheme is applied. After that number of steps, the heating parameter is set to one.
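
A minimal numerical sketch of the flat scheme, assuming the Metropolis acceptance ratio of a proposed move is simply raised to the power of the heating value (an assumption based on the description above, not on the package internals):

# a score-worsening move with acceptance ratio r is accepted
# with probability min(1, r^h); h < 1 makes acceptance more likely
r <- 0.25              # acceptance ratio of a score-worsening move
h <- c(1, 0.5, 0.1)    # heating values; 1 is neutral
pmin(1, r^h)
#> [1] 0.2500000 0.5000000 0.8705506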

Flat heating up

The same example as before, with the heating parameter set to 0.5:

# MCMC search with heating = 0.5 for all moves
out1 <- mcmcabn(score.cache = bsc.compute.asia,
                score = "mlik",
                data.dists = dist.asia,
                max.parents = 2,
                mcmc.scheme = c(1000, 0, 0),
                seed = 42,
                verbose = FALSE,
                start.dag = "random",
                prob.rev = 0.03,
                prob.mbr = 0.03,
                prior.choice = 1,
                heating = 0.5)

summary(out1)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1000
#> Thinning: 0
#> 
#> Maximum score: -11151
#> Empirical mean: -11225
#> Empirical standard deviation: 337.98
#> Quantiles of the posterior network score:
#>           0.025   0.25    0.5   0.75  0.975
#> BN score -11426 -11192 -11189 -11186 -11151
#> 
#> 
#> Global acceptance rate: 0.11289
#>     Accepted Rejected
#> MBR        4       28
#> MC3      108      837
#> REV        1       23
#> 
#> 
#> Sample size adjusted for autocorrelation: 60.687
#> 
#> Autocorrelations by lag:
#>     0       1       2       3       4       5       6       7       8       9
#> acf 1 0.91313 0.82631 0.72408 0.62184 0.53815 0.45426 0.36105 0.26726 0.21031
#>          10
#> acf 0.15272

As one can see, the acceptance rate approximately doubles. As the landscape of visited DAGs grows, the chance of finding a better-scoring structure improves. However, this introduces a bias in the computed posterior distribution.
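
One way to see this shift is to compare the sampled score distributions of the two runs directly; a minimal check using the scores stored in each returned object:

# posterior score samples of the neutral and heated runs
summary(out$scores)   # heating = 1 (neutral)
summary(out1$scores)  # heating = 0.5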

Heating-up scheme

The other implemented option is to choose the number of steps during which an exponentially increasing heating scheme is applied. Although more computationally demanding, this is the recommended approach for improving convergence.

# MCMC search with a heating scheme of 500 steps
out2 <- mcmcabn(score.cache = bsc.compute.asia,
                score = "mlik",
                data.dists = dist.asia,
                max.parents = 2,
                mcmc.scheme = c(1500, 0, 0),
                seed = 42,
                verbose = FALSE,
                start.dag = "random",
                prob.rev = 0.03,
                prob.mbr = 0.03,
                prior.choice = 1,
                heating = 500)

summary(out2)
#> MCMC summary:
#> Number of burn-in steps: 0
#> Number of MCMC steps: 1500
#> Thinning: 0
#> 
#> Maximum score: -11151
#> Empirical mean: -11370
#> Empirical standard deviation: 438.4
#> Quantiles of the posterior network score:
#>           0.025   0.25    0.5   0.75  0.975
#> BN score -12083 -11422 -11264 -11151 -11151
#> 
#> 
#> Global acceptance rate: 0.18454
#>     Accepted Rejected
#> MBR       11       35
#> MC3      253     1161
#> REV       13       28
#> 
#> 
#> Sample size adjusted for autocorrelation: 25.65
#> 
#> Autocorrelations by lag:
#>     0       1      2       3      4       5       6       7       8       9
#> acf 1 0.96618 0.9267 0.88431 0.8416 0.79917 0.75591 0.71205 0.66791 0.62424
#>          10
#> acf 0.57987

The heating parameter as a function of the number of steps is:


# heating values stored by the sampler (initial state plus 1500 steps)
p.data <- data.frame(x = 1:1501, heating = out2$heating)
ggplot(data = p.data, aes(x = x, y = heating)) +
  geom_line(lwd = 2)

Consequently, at least the first 500 steps should be treated as the burn-in phase. As one can see below, the MCMC chain moves between structures more easily during this phase.

plot(out2)
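
In practice, one would discard this burn-in phase before summarizing the posterior; a minimal sketch, assuming the stored score vector holds the initial score plus the 1500 sampled ones:

# drop the initial score and the 500 heated steps
post.scores <- out2$scores[-(1:501)]
summary(post.scores)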

Visualizing the DAG diversity landscape

The main incentive for performing structural MCMC is the belief that reporting a single best-scoring DAG is rudimentary, given the number of quasi-equally scoring DAGs at the top of the posterior distribution. The code below allows us to display this diversity.

# renaming columns of the dataset
colnames(asia) <- c("Asia", "Smoking", "Tuberculosis", "LungCancer",
                    "Bronchitis", "Either", "XRay", "Dyspnea")

# MCMC search
out <- mcmcabn(score.cache = bsc.compute.asia,
               score = "mlik",
               data.dists = dist.asia,
               max.parents = 2,
               mcmc.scheme = c(2000, 0, 2000),
               seed = 231,
               verbose = FALSE,
               start.dag = "random",
               prob.rev = 0.03,
               prob.mbr = 0.03,
               prior.choice = 1,
               heating = 200)

# unique list of structures
u.list.dag <- unique.array(x = out$dags, MARGIN = 3)

# counting the occurrences of each unique structure
tab <- apply(X = u.list.dag, MARGIN = 3, FUN = function(x) {
  sum(apply(X = out$dags, MARGIN = 3, FUN = function(y) {
    identical(x, y)
  }))
})
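
As a quick sanity check, the occurrence counts of the unique structures should add up to the total number of sampled DAGs:

# every sampled DAG is counted exactly once; this should return TRUE
sum(tab) == dim(out$dags)[3]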

# scoring the ten most frequent DAGs
nb.dag <- 10
scores.dags <- vector(length = nb.dag)
num.arcs <- vector(length = nb.dag)
shd <- vector(length = nb.dag)

# indices of the unique structures, from most to least frequent
ord <- order(tab, decreasing = TRUE)

for (i in 1:nb.dag) {
  dag <- u.list.dag[, , ord[i]]
  colnames(dag) <- rownames(dag) <- names(dist.asia)
  fabn <- fitAbn(dag = dag, data.df = asia, data.dists = dist.asia, method = "bayes")
  scores.dags[i] <- fabn$mlik
  num.arcs[i] <- sum(dag)
  shd[i] <- compareDag(ref = u.list.dag[, , ord[1]],
                       u.list.dag[, , ord[i]])$`Hamming-distance`
}
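
Before plotting, one can already quantify how concentrated the posterior sample is, for instance through the share of the most frequent structure:

# share of the posterior sample occupied by the most frequent DAG
max(tab) / sum(tab)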

Let us plot the MCMC run:

plot(out)

Let us now draw the diversity plot, where the bottom x-axis gives the number of arcs of each structure, the top x-axis gives the structural Hamming distance to the most frequent DAG (a proxy for how different two DAGs are), the left y-axis gives the occurrence of each structure, and the right y-axis gives the scores. As one can see, half of the posterior distribution is represented by a single DAG.

plot(1:nb.dag, sort(tab, decreasing = TRUE)[1:nb.dag], type = 'n', ylab = "", xlab = "Number of arcs", xaxt = "n", yaxt = "n", ylim = c(0, 2000))
axis(2, at = c(0, 1000, 2000), labels = c("0.0%", "50%", "100%"), col.axis = "#4393C3")
mtext("Occurrence of DAGs", side = 2, line = 2, col = "#4393C3")
rect(1:nb.dag - .4, 0, 1:nb.dag + .4, sort(tab, decreasing = TRUE)[1:nb.dag], col = '#4393C3')
par(new = TRUE)
plot(x = 1:nb.dag, y = scores.dags, col = "red", type = 'b', lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis(4, col.axis = 'red')
mtext("DAGs scores", side = 4, line = 1.5, col = "red")
axis(1, col.axis = 'black', at = 1:nb.dag, labels = num.arcs)
axis(3, col.axis = 'orange', at = 1:nb.dag, labels = shd)
mtext("Structural Hamming distances", side = 3, line = 2, col = "orange")