Latent Gaussian models have numerous applications, for example in
spatial and spatio-temporal epidemiology and climate modelling. This
workshop brings together researchers who develop and apply Bayesian
inference in this broad model class. One methodological focus is on
model computation, using either classical MCMC techniques or more
recent deterministic approaches such as integrated nested Laplace
approximations (INLA). A second theme of the workshop is model
uncertainty, ranging from model criticism to model selection and model
averaging.
In estimation problems with a single entertained statistical model, Bayesian
answers are naturally expressed as probabilistic statements, which is
a major appeal of the Bayesian approach. In model selection problems, however,
probabilistic arguments are the exception rather than the rule, even though
posterior model probabilities are the obvious measure of evidence. The main
reason is the difficulty of appropriate prior elicitation. Indeed, the
dimension of most problems makes subjective assignments practically impossible,
but the assignment of objective priors suitable for model selection is
difficult and not yet fully understood. To further complicate
matters, the key ideas behind the choice of priors for objective Bayes estimation
are of no use for assigning priors for model selection. In this talk we study arguments
(such as invariance, robustness and predictive matching) that we consider
especially relevant and useful to guide the construction of a model selection prior.
We do so within the context of variable selection in normal regression models,
using flexible, heavy-tailed, model-specific priors. Our recommended proposal
is a prior distribution which, quite remarkably, produces Bayes factors in
closed form. This convenient property can substantially reduce the
computational burden in problems with a large number of explanatory variables.
So far, only the popular conjugate Zellner's g-prior had this property.
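For reference, the closed-form Bayes factor of the conjugate Zellner's g-prior mentioned above can be computed directly. The sketch below uses the standard expression for comparing a normal linear model with p centred covariates against the intercept-only model (flat prior on the intercept, p(sigma) proportional to 1/sigma): BF = (1 + g)^((n - 1 - p)/2) * (1 + g(1 - R^2))^(-(n - 1)/2), which depends on the data only through n, p and R^2. It is not the new prior recommended in the talk, whose form is not given here; the function names, the simulated data and the choice g = n are illustrative assumptions.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of the least-squares fit of y on an intercept plus the columns of X."""
    yc = y - y.mean()
    if X.shape[1] == 0:                      # intercept-only model
        return 0.0
    Xc = X - X.mean(axis=0)                  # centring absorbs the intercept
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    return 1.0 - (resid @ resid) / (yc @ yc)

def log_bf_gprior(y, X, g):
    """log Bayes factor of the model with covariates X against the
    intercept-only model under Zellner's g-prior (closed form)."""
    n, p = X.shape
    r2 = r_squared(y, X)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

# Illustrative data: only the first of three covariates matters.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
g = float(n)                                 # "unit information" choice, an assumption
for cols in ([0], [0, 1, 2], [1, 2]):
    print(cols, log_bf_gprior(y, X[:, cols], g))
```

Because each Bayes factor costs a single least-squares fit, scoring many candidate models is cheap, which is the computational advantage at stake.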
Another important aspect of model selection is how to deal with huge model spaces,
for which exhaustive enumeration of all models is infeasible and inferences have
to be based on the very small proportion of models visited. We review some
of the strategies proposed in the literature and argue that inferences based
on empirical frequencies from MCMC sampling of the posterior distribution
outperform recently proposed search methods. We offer a likely explanation
for this effect, together with a number of illustrative examples.
(This talk is based on joint work with Susie Bayarri, Jim Berger, Anabel Forte and Miguel Martínez-Beneito)
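A minimal sketch of the kind of MCMC model-space sampling argued for above, assuming (as an illustration, not the authors' setting) the closed-form g-prior Bayes factor, a uniform prior over models and a simple add/drop Metropolis move. Posterior inclusion probabilities are estimated from empirical visit frequencies rather than by renormalising the marginal likelihoods of the visited models. All names and the simulated data are hypothetical.

```python
import numpy as np

def log_bf_gprior(y, X, g):
    """Closed-form g-prior log Bayes factor against the intercept-only model."""
    n, p = X.shape
    yc = y - y.mean()
    r2 = 0.0
    if p > 0:
        Xc = X - X.mean(axis=0)
        beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        resid = yc - Xc @ beta
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - p) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def mc3_inclusion(y, X, g, n_iter=5000, seed=0):
    """Metropolis sampler on the model space (uniform model prior): each step
    proposes to add or drop one covariate. Returns inclusion probabilities
    estimated by empirical visit frequencies, plus visit counts per model."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)                 # start at the null model
    current = log_bf_gprior(y, X[:, gamma], g)
    inclusion = np.zeros(p)
    visits = {}
    for _ in range(n_iter):
        proposal = gamma.copy()
        j = rng.integers(p)
        proposal[j] = not proposal[j]               # symmetric add/drop move
        cand = log_bf_gprior(y, X[:, proposal], g)
        if np.log(rng.uniform()) < cand - current:  # Metropolis acceptance
            gamma, current = proposal, cand
        inclusion += gamma
        key = tuple(int(k) for k in np.flatnonzero(gamma))
        visits[key] = visits.get(key, 0) + 1
    return inclusion / n_iter, visits

rng = np.random.default_rng(1)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)
incl, visits = mc3_inclusion(y, X, g=float(n))
print(np.round(incl, 2))
print("most visited model:", max(visits, key=visits.get))
```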
The talk discusses Bayesian variable selection and model identification
for latent variable models. It is shown how variable selection and model
identification are achieved by combining the likelihood function of a
general latent variable model with a prior that induces sparsity.
While this approach is by now well known for variable selection in a
standard regression model, only a few researchers have so far tried to extend
this approach to latent variable models. Such an extension is illustrated
for two special classes of latent variable models.
The first example is the random intercept model, which is widely applied
in the econometric analysis of panel data. Choosing a sparsity prior for this
model is closely related to the appropriate choice of the distribution of
heterogeneity. If, for instance, a Laplace rather than the usual normal
distribution is used as the prior for the random effects, we
obtain the Bayesian Lasso random effects model, which allows individual
shrinkage of the random effects toward 0. In this way, the sparsity prior
makes it possible to identify units in the panel with zero random effects.
In addition, spike-and-slab random effects models are studied, with
either an absolutely continuous or a Dirac spike.
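The random effects priors above can be made concrete by simulating from them. The sketch assumes the common representation of the Laplace distribution as a normal scale mixture with exponentially distributed variances (as used in the Bayesian Lasso), and a spike-and-slab prior in which a Bernoulli indicator chooses between a normal slab and either a Dirac spike at 0 or a narrow normal spike. The values of lam, omega, Q and r are hypothetical; only prior draws are shown, not the MCMC schemes discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000      # number of simulated random effects (large only to check moments)
lam = 2.0        # hypothetical Laplace rate; the Laplace scale is 1 / lam

# Bayesian Lasso random effects: Laplace prior written as a scale mixture of normals,
# psi_i ~ Exponential(rate = lam**2 / 2) and b_i | psi_i ~ N(0, psi_i).
psi = rng.exponential(scale=2.0 / lam**2, size=N)   # numpy's scale is 1 / rate
b_laplace = rng.normal(0.0, np.sqrt(psi))

# Spike-and-slab random effects: delta_i ~ Bernoulli(omega) selects the slab N(0, Q);
# otherwise b_i comes from the spike (omega, Q, r are hypothetical values).
omega, Q, r = 0.3, 1.0, 1e-3
delta = rng.uniform(size=N) < omega
slab = rng.normal(0.0, np.sqrt(Q), size=N)
b_dirac = np.where(delta, slab, 0.0)                                    # Dirac spike at 0
b_cont = np.where(delta, slab, rng.normal(0.0, np.sqrt(r * Q), size=N))  # continuous spike N(0, rQ)
```

The Dirac spike yields exact zeros, which is what allows individual units to be flagged as having no random effect; the absolutely continuous spike only shrinks them close to 0.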
The second example is the basic structural time series model, which has a
representation as a state space model. By choosing suitable sparsity
priors on the variances appearing in this model, it is possible to separate
components that are fixed from components that are random.
Finally, details of efficient MCMC estimation are discussed for all models.
(This talk is based on joint work with Helga Wagner)
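One way to see how a sparsity prior on a variance can switch a component between random and fixed is the non-centred form of the local level model, a building block of the basic structural model; this parameterisation is assumed here for illustration, since the abstract does not spell it out. Writing the level as mu_t = mu_0 + sqrt(theta) * mu_tilde_t, with mu_tilde_t a standard Gaussian random walk, turns sqrt(theta) into a regression-type coefficient, so a variable selection prior that can set it to zero declares the level fixed. The sketch only checks that both forms give the same path and that theta = 0 yields a constant level; the function names are hypothetical.

```python
import numpy as np

def level_centred(mu0, theta, z):
    """Centred form: mu_t = mu_{t-1} + w_t with w_t = sqrt(theta) * z_t ~ N(0, theta)."""
    mu = np.empty(len(z))
    level = mu0
    for t, zt in enumerate(z):
        level = level + np.sqrt(theta) * zt
        mu[t] = level
    return mu

def level_noncentred(mu0, theta, z):
    """Non-centred form: mu_t = mu0 + sqrt(theta) * mu_tilde_t, where mu_tilde is a
    standard Gaussian random walk; sqrt(theta) acts like a regression coefficient."""
    mu_tilde = np.cumsum(z)
    return mu0 + np.sqrt(theta) * mu_tilde

rng = np.random.default_rng(2)
z = rng.standard_normal(200)                 # shared standardised innovations
mu_random = level_noncentred(5.0, 0.1, z)    # theta > 0: time-varying (random) level
mu_fixed = level_noncentred(5.0, 0.0, z)     # theta = 0: level fixed at mu0
```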
Explaining species distribution using local environmental features is a long-standing ecological
problem. Often, the available data consist only of a set of presence locations, which
precludes a presence-absence analysis. We propose that it is natural to
view presence-only data for a region as a point pattern over that region and to use local
environmental features to explain the intensity driving this point pattern. This suggests
hierarchical modeling, treating the presence data as a realization of a spatial point process
whose intensity is governed by environmental covariates. Spatial dependence in the intensity
surface is modeled with random effects involving a zero-mean Gaussian process. Highly
variable and typically sparse sampling effort, as well as land transformation, degrade the
point pattern, so we augment the model to capture these effects. The Cape Floristic Region
(CFR) in South Africa provides a rich collection of such species data. The potential (i.e.,
non-degraded) presence surfaces over the entire area are of interest from conservation and
policy perspectives.
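With an intensity that is constant within each grid cell, the point process log-likelihood reduces, up to an additive constant, to a sum over cells of counts * log(expected count) - expected count. The sketch below assumes log intensity = covariates @ beta + w, where w stands in for the Gaussian process random effect, and models degradation by a known multiplicative factor per cell; that factor is our simplifying assumption, not the talk's treatment of sampling effort and land transformation. All names and simulated values are hypothetical.

```python
import numpy as np

def grid_poisson_loglik(counts, area, X, beta, w, degrade=None):
    """Log-likelihood, up to a constant, of presence counts per grid cell under a
    point process whose intensity is constant within each cell:
        log lambda_i = X_i @ beta + w_i,
        counts_i ~ Poisson(area_i * degrade_i * lambda_i),
    with degrade_i in (0, 1] a hypothetical degradation factor."""
    eta = X @ beta + w                       # log potential intensity per cell
    if degrade is None:
        degrade = np.ones_like(eta)
    mean = area * degrade * np.exp(eta)      # expected number of presences per cell
    return float(np.sum(counts * np.log(mean) - mean))

rng = np.random.default_rng(5)
n_cells = 100
X = np.column_stack([np.ones(n_cells), rng.normal(size=n_cells)])  # intercept + one covariate
beta = np.array([-1.0, 0.8])
w = 0.3 * rng.standard_normal(n_cells)          # stand-in for the spatial random effect
area = np.full(n_cells, 1.0)
degrade = rng.uniform(0.2, 1.0, size=n_cells)   # hypothetical effort/transformation factor
counts = rng.poisson(area * degrade * np.exp(X @ beta + w))
print(grid_poisson_loglik(counts, area, X, beta, w, degrade))
```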
Our model assumes that the intensity process is homogeneous within each grid cell, with the region
divided into approximately 37,000 grid cells. To work with a Gaussian process over such a large number
of cells, we use a predictive process approximation. A bias correction, obtained by adding a heteroscedastic
error component, is implemented. The model was run for a number of different species.
Model selection was investigated with regard to the choice of environmental covariates. A
comparison is also made with the now popular Maxent approach, though the latter is much more
limited with regard to inference. In fact, inference on quantities such as species richness
follows immediately from our modeling framework.
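The predictive process replaces the Gaussian process at every cell by its conditional expectation given the process at a small set of knots, w_tilde(s) = c(s)' C*^{-1} w*, so only an m x m matrix has to be factorised. Because this understates the variance by sigma2 - c(s)' C*^{-1} c(s), one standard bias correction adds independent heteroscedastic noise with exactly that variance. The sketch draws one prior realisation this way, assuming an exponential covariance, a regular knot grid and sizes far smaller than the roughly 37,000 cells in the abstract; all names and parameter values are illustrative.

```python
import numpy as np

def exp_cov(A, B, sigma2, phi):
    """Exponential covariance sigma2 * exp(-phi * distance) between point sets A and B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def modified_predictive_process(cells, knots, sigma2, phi, rng):
    """One draw of the bias-corrected predictive process at the cell centroids."""
    C_star = exp_cov(knots, knots, sigma2, phi)             # m x m covariance at knots
    c = exp_cov(cells, knots, sigma2, phi)                  # n x m cross-covariance
    L = np.linalg.cholesky(C_star + 1e-10 * np.eye(len(knots)))
    w_star = L @ rng.standard_normal(len(knots))            # process at the knots
    A = np.linalg.solve(C_star, c.T).T                      # rows are c(s)' C*^{-1}
    w_tilde = A @ w_star                                    # predictive process
    # variance lost by the projection, added back as independent heteroscedastic noise
    tau2 = np.clip(sigma2 - np.sum(A * c, axis=1), 0.0, None)
    return w_tilde + np.sqrt(tau2) * rng.standard_normal(len(cells)), tau2

rng = np.random.default_rng(6)
g1 = np.linspace(0.1, 0.9, 5)
knots = np.array([(a, b) for a in g1 for b in g1])              # 25 knots
cells = np.vstack([knots[:1], rng.uniform(size=(999, 2))])      # first cell sits on a knot
w, tau2 = modified_predictive_process(cells, knots, sigma2=1.0, phi=3.0, rng=rng)
```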
We discuss computational strategies to assist in Bayesian logistic regression analysis of case-control, population-based genome-wide association studies (GWAS) aimed at highlighting human genetic (or genomic) variation that is associated with common disease risk. We explore Monte Carlo and asymptotic (Laplace) approximations and how they can be used to alleviate some of the computational challenges arising in high-dimensional logistic regression in the presence of predictor set uncertainty on highly structured genetic covariates.
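The Laplace (asymptotic) approximation can be sketched for a single logistic regression model: find the posterior mode by Newton-Raphson, then approximate log p(y) by the log joint density at the mode plus (d/2) log(2 pi) - (1/2) log|H|, where H is the negative Hessian of the log posterior there. The N(0, tau2 I) prior, the function name and the simulated data are assumptions for illustration; an analysis with predictor set uncertainty would evaluate something like this for very many candidate models.

```python
import numpy as np

def laplace_log_marginal(y, X, tau2=1.0, max_iter=50, tol=1e-10):
    """Laplace approximation to log p(y) for Bayesian logistic regression with
    prior beta ~ N(0, tau2 * I). Returns (log marginal, posterior mode)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(max_iter):                       # Newton-Raphson for the mode
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - beta / tau2
        H = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / tau2
        step = np.linalg.solve(H, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    H = X.T @ (X * (p * (1.0 - p))[:, None]) + np.eye(d) / tau2   # negative Hessian at mode
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))            # Bernoulli log-likelihood
    log_prior = -0.5 * (beta @ beta) / tau2 - 0.5 * d * np.log(2.0 * np.pi * tau2)
    _, logdet = np.linalg.slogdet(H)
    return log_lik + log_prior + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet, beta

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(0.5 + X[:, 1])))).astype(float)
lm_full, _ = laplace_log_marginal(y, X)
lm_small, _ = laplace_log_marginal(y, X[:, :2])
print("approximate log Bayes factor, full vs reduced:", lm_full - lm_small)
```

The difference of two approximate log marginal likelihoods is an approximate log Bayes factor, which is what a search over predictor sets would need.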
Markov random field models generated by high-rank Hilbert space
approximations of stochastic partial differential equations are
surprisingly practical and allow easy construction of non-stationary,
non-separable space-time models. This avoids the need to design
kernels or positive definite covariance functions, while also giving
easy access to complex dependencies. Furthermore, the method is
faster and more accurate than approximations based on covariance
tapering or compactly supported kernels. The spatially continuous
interpretation also allows spatially consistent Markov random field
models to be constructed on irregular grids, regardless of the type of
observation process, further reducing the computational costs
typically associated with dense lattices. The approach is illustrated
with global temperature data.
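To illustrate why these Markov random fields are sparse and work on irregular grids, the sketch builds the precision matrix of a one-dimensional analogue: the SPDE (kappa^2 - d^2/ds^2) x(s) = W(s) discretised with piecewise-linear basis functions on arbitrary sorted nodes, using a lumped (diagonal) mass matrix C and the stiffness matrix G, so that Q = tau^2 K C^{-1} K with K = kappa^2 C + G. The one-dimensional setting, this particular smoothness case, the tau scaling and dense storage are simplifying assumptions; the talk concerns non-stationary space-time models, where sparse matrices would be used.

```python
import numpy as np

def spde_precision_1d(s, kappa, tau=1.0):
    """GMRF precision for (kappa^2 - d^2/ds^2) x = white noise on sorted nodes s,
    with piecewise-linear elements: Q = tau^2 * K C^{-1} K, K = kappa^2 C + G."""
    s = np.asarray(s, dtype=float)
    h = np.diff(s)                       # element lengths (may be irregular)
    n = len(s)
    c = np.zeros(n)                      # lumped mass: half of each adjacent element
    c[:-1] += h / 2
    c[1:] += h / 2
    G = np.zeros((n, n))                 # stiffness matrix, tridiagonal
    idx = np.arange(n - 1)
    G[idx, idx] += 1 / h
    G[idx + 1, idx + 1] += 1 / h
    G[idx, idx + 1] -= 1 / h
    G[idx + 1, idx] -= 1 / h
    K = kappa**2 * np.diag(c) + G
    return tau**2 * K @ np.diag(1 / c) @ K

rng = np.random.default_rng(7)
for s in (np.linspace(0.0, 10.0, 101), np.sort(rng.uniform(0.0, 10.0, size=101))):
    Q = spde_precision_1d(s, kappa=2.0)
    # pentadiagonal: at most 5 nonzeros per row, whatever the node spacing
    print(np.count_nonzero(Q, axis=1).max())
```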
Conditional autoregressive (CAR) models are popular for areal data, with
the Markov random field (MRF) precision matrix often based simply on
whether areas share a boundary. An alternative, when the areas form a
regular grid, is the Markov random field approximation to a thin plate
spline (Rue and Held, 2005).
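A minimal sketch of the boundary-based CAR precision on a regular grid, assuming rook adjacency (cells sharing an edge are neighbours) and the common parameterisation Q = tau (D - rho W), where W is the binary adjacency matrix and D the diagonal matrix of neighbour counts; rho = 1 gives the intrinsic CAR. The values of tau and rho are hypothetical, and the thin plate spline approximation is not shown.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Binary adjacency W for a regular grid; cells sharing an edge are neighbours."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c                     # row-major cell index
            if c + 1 < ncol:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrow:
                W[i, i + ncol] = W[i + ncol, i] = 1
    return W

def car_precision(W, tau=1.0, rho=0.99):
    """CAR precision tau * (D - rho * W); rho = 1 is the intrinsic (improper) CAR."""
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

W = grid_adjacency(4, 5)
Q_icar = car_precision(W, tau=1.0, rho=1.0)   # intrinsic CAR: rows sum to zero
Q_prop = car_precision(W, tau=1.0, rho=0.9)   # proper CAR: positive definite
print(np.linalg.matrix_rank(Q_icar), W.shape[0])   # rank n - 1 for a connected grid
```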
I consider the use of these and other Markov random field
specifications to represent latent spatial
processes on a fine grid. One can then consider likelihoods that
relate the latent process to point observations or areal observations,
in the process avoiding the modifiable areal unit problem. I
explore the properties of different MRF specifications in this context
based on analytic calculations and simulations. Computational
approaches include penalized quasi-likelihood and INLA. MCMC can be
feasible in the Gaussian likelihood case, but it poses difficulties
for generalized models.
Approximate Bayesian computation (ABC) methods, also known as likelihood-free
methods, have become a standard tool for the analysis of complex models,
primarily in population genetics but also for complex financial models. In
Grelaud et al. (Bayesian Analysis, 2009) we examined the use of ABC for
Bayesian model choice in the specific case of Gibbs random fields (GRF),
relying on a sufficiency property enjoyed only by GRFs to show that the
approach was legitimate. Despite having previously suggested the use of
ABC for model choice in a wider range of models in the DIY ABC software
(Cornuet et al., Bioinformatics, 24(23), 2713-2719, 2008), we present
theoretical evidence that the general use of ABC for model choice is
fraught with danger, in the sense that no amount of computation, however
large, can guarantee a proper approximation of the posterior probabilities
of the models under comparison. As a corollary, this work shows that GRFs
are the exception to this lack of convergence.
(This talk is based on joint work with Jean-Michel Marin and Natesh Pillai).
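The danger can be seen in a toy rejection-ABC model choice, chosen by us for illustration and not taken from the talk: model 0 is Poisson(lambda) with lambda ~ Exp(1), model 1 is geometric on {0, 1, ...} with success probability p ~ U(0, 1), both models equally likely a priori, and the summary statistic is the sample sum, which is sufficient within each model but not across them. With zero tolerance on the summary, the accepted model indices follow p(model | sum), not p(model | data). For y = (0, 1, 2, 1, 0), the analytic values under these priors are p(model 0 | sum) = (625/7776) / (625/7776 + 1/18), about 0.591, while the exact posterior probability p(model 0 | y) is about 0.660, so more simulations only sharpen the estimate of the wrong quantity. Names and simulation sizes are hypothetical.

```python
import numpy as np

def abc_model_choice(y_obs, n_sim=50_000, seed=0):
    """Rejection ABC for choosing between a Poisson and a geometric model,
    accepting simulations whose sample sum equals the observed sum.
    Returns (estimated probability of the Poisson model, number accepted)."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    s_obs = y_obs.sum()
    accepted = []
    for _ in range(n_sim):
        m = rng.integers(2)                          # uniform prior on the model index
        if m == 0:
            lam = rng.exponential(1.0)               # lambda ~ Exp(1)
            z = rng.poisson(lam, size=n)
        else:
            p = 1.0 - rng.uniform()                  # p ~ U(0, 1], avoids p = 0
            z = rng.geometric(p, size=n) - 1         # numpy's geometric starts at 1
        if z.sum() == s_obs:                         # zero-tolerance match on the summary
            accepted.append(m)
    if not accepted:
        return float("nan"), 0
    accepted = np.array(accepted)
    return float(np.mean(accepted == 0)), len(accepted)

prob, n_acc = abc_model_choice(np.array([0, 1, 2, 1, 0]))
print(prob, n_acc)
```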
In this talk, I will discuss the INLA methodology and software,
make some "historical" remarks, review its current status, including its
successes and limitations, and then present some open problems
and a wish list for the future.
The North American Regional Climate Change Assessment Program (NARCCAP) is an
ambitious multi-agency, multi-institution collaboration to produce regional
projections of climate change for North America based on a multi-model ensemble
of regional climate models. In this talk, I will present a statistical approach
to analyze and combine the information in the ensemble based on a functional
analysis of variance (ANOVA) embedded within a hierarchical Bayesian method that
accounts for differences in the models as well as the spatial correlation in the
model output. In particular, I will present preliminary results that seek to
examine the various sources of uncertainty in the model output.
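The ANOVA idea can be sketched in its simplest, non-Bayesian form: at every grid location, decompose an ensemble of output fields, indexed here by regional model and driving global model, into a grand-mean field, main-effect fields and an interaction field. This assumes a complete, balanced regional-by-global design and ignores spatial correlation, both of which are our simplifications; the talk embeds the decomposition in a hierarchical Bayesian model instead. The array layout, names and synthetic data are hypothetical.

```python
import numpy as np

def two_way_anova_fields(Y):
    """Y has shape (n_rcm, n_gcm, n_loc): one output field per regional/global model pair.
    Returns grand mean, regional-model effects, global-model effects and interaction,
    each a field over locations, with sum-to-zero constraints over factor levels."""
    mu = Y.mean(axis=(0, 1))                                  # (n_loc,)
    alpha = Y.mean(axis=1) - mu                               # (n_rcm, n_loc)
    beta = Y.mean(axis=0) - mu                                # (n_gcm, n_loc)
    gamma = Y - mu - alpha[:, None, :] - beta[None, :, :]     # (n_rcm, n_gcm, n_loc)
    return mu, alpha, beta, gamma

rng = np.random.default_rng(8)
Y = rng.normal(size=(4, 3, 500))      # synthetic: 4 regional x 3 global models, 500 locations
mu, alpha, beta, gamma = two_way_anova_fields(Y)
# maps of mean squared effects, a crude summary of each source of variation
ms_rcm = (alpha**2).mean(axis=0)
ms_gcm = (beta**2).mean(axis=0)
ms_int = (gamma**2).mean(axis=(0, 1))
```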
On Wednesday morning, Håvard Rue will present a tutorial about INLA.
Abstract submission for contributed talks and poster presentations is now closed.