
Applied Statistics

Research

Global Change and Biodiversity

Assessing uncertainty in global change–biodiversity research using multiscale Bayesian modelling

Florian Gerber, in collaboration with Gabriela Schaepman-Strub (IEU, UZH)

We aim to improve the understanding of ecosystem-environment interactions by linking sparse local field data with coarse satellite data and output from Earth System Models (ESMs). Spatio-temporal hierarchical Bayesian models will be used to describe the correlations between biodiversity, biophysical feedbacks, and climate. Because of the large number of data points and model parameters, estimation is challenging and brute-force Markov chain Monte Carlo (MCMC) simulations are likely to fail; this motivates the development of suitable approximation techniques. The statistical framework will allow us to quantify, e.g., biodiversity–albedo relations, predict their future behavior, and derive uncertainty estimates for model parameters and predictions.
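A minimal sketch (in Python; the grid sizes, noise levels, and squared-exponential prior are hypothetical choices, not the project's model) of the data-fusion idea: a latent field on a fine grid is observed both through sparse point measurements and through coarse block averages, and the conjugate Gaussian posterior combines the two data sources.

```python
# Minimal sketch of multiscale data fusion in a Gaussian linear model.
# A latent field x on a fine 1-D grid is observed through (a) a few noisy
# point measurements ("field data") and (b) coarse block averages
# ("satellite/ESM data").  All sizes and noise levels are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 100                                   # fine-grid size
s = np.linspace(0, 1, n)

# Smooth Gaussian prior for the latent field (squared-exponential covariance).
C = np.exp(-0.5 * ((s[:, None] - s[None, :]) / 0.1) ** 2) + 1e-8 * np.eye(n)
x_true = np.linalg.cholesky(C) @ rng.standard_normal(n)

# (a) Sparse point observations: selection matrix H_pt.
idx = rng.choice(n, size=8, replace=False)
H_pt = np.zeros((8, n))
H_pt[np.arange(8), idx] = 1.0
y_pt = H_pt @ x_true + 0.05 * rng.standard_normal(8)

# (b) Coarse block averages over 10 blocks of 10 cells: averaging matrix H_bl.
H_bl = np.kron(np.eye(10), np.full((1, 10), 0.1))
y_bl = H_bl @ x_true + 0.02 * rng.standard_normal(10)

# Stack both data sources; the conjugate Gaussian posterior has
# precision C^-1 + H' R^-1 H, and its mean solves the normal equations.
H = np.vstack([H_pt, H_bl])
R_inv = np.diag(np.r_[np.full(8, 0.05), np.full(10, 0.02)] ** -2)
Q_post = np.linalg.inv(C) + H.T @ R_inv @ H
mu_post = np.linalg.solve(Q_post, H.T @ R_inv @ np.r_[y_pt, y_bl])

print("RMSE of posterior mean: %.3f" % np.sqrt(np.mean((mu_post - x_true) ** 2)))
```

At the data sizes targeted by the project, the dense matrices above become prohibitive, which is exactly what motivates the approximation techniques mentioned in the text.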

Statistical Ice Shelf Modeling

[Figure: Observed ice thickness of the Ross Ice Shelf, Antarctica. Grey area: grounded ice sheet. Black area: ocean.]

Towards uncertainty quantification of ice-shelf simulations as key components of the global cryosphere
David Masson, in collaboration with Nina Kirchner (Stockholm University)

This project aims to build a nucleus for “Statistical Glaciology” by cross-fertilizing the previously virtually unrelated fields of numerical ice modeling and statistics. While the atmospheric and oceanic modeling communities have long integrated statistical tools into the analysis of uncertainties, the ice sheet modeling community still struggles to provide a systematic quantification of the uncertainties of its simulations. Indeed, the intricate fluid-dynamics equations require large amounts of computation, which in turn severely limits the estimation of uncertainty through multiple simulation runs.

More specifically, we focus on ice shelves. These 'floating glaciers' are several hundred meters thick and play a major role in the global climate through their interaction with the polar oceans. Here, the modeling of ice shelves is split into two components: first, a deterministic component provides a rough approximation of the ice-shelf dynamics via a simple 2D advection of ice into the ocean. The second component consists of stochastic processes that correct the errors of this approximation. The melting processes beneath the ice shelf, as well as the shelf's disaggregation into smaller icebergs at the calving front, are simulated by statistical models: melting is generated by Gaussian random processes, and calving at the shelf's front is modeled by Poisson and extreme-value processes.

If this technique can reproduce steady-state ice shelves in Antarctica, we will then simulate several thousand ice shelves to explore the parameter space. All simulations leading to steady states that agree with present-day observations contain valuable information about plausible parameter ranges. Hence, this study will enable (i) the quantification of uncertainties in ice-shelf simulations, (ii) the identification of their driving factors, and (iii) inferences about crucial, yet typically unobservable, parameters (e.g., the melting rate at the underside of an ice shelf).
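A toy illustration (Python; all rates, covariances, and distributional choices are hypothetical, not the project's calibrated model) of the two stochastic ingredients: a Gaussian random field perturbing the basal melt rate along the ice front, and a Poisson process for calving events with heavy-tailed (generalized Pareto) sizes.

```python
# Toy sketch of the stochastic components: Gaussian-process basal melt
# along the ice front and Poisson calving with heavy-tailed event sizes.
# All parameter values and distributions are hypothetical.
import numpy as np

rng = np.random.default_rng(7)

# --- Basal melt: Gaussian random field along the ice front (1-D for brevity).
m = 200                                          # grid cells along the front
x = np.linspace(0.0, 100.0, m)                   # km
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 15.0)  # exponential covariance
melt = 1.0 + 0.3 * (np.linalg.cholesky(C + 1e-8 * np.eye(m))
                    @ rng.standard_normal(m))    # melt rate in m/yr, mean 1.0

# --- Calving: Poisson number of events per year, generalized-Pareto sizes.
years = 50
n_events = rng.poisson(lam=3.0, size=years)      # ~3 calving events per year
sigma, xi = 0.5, 0.3                             # GPD scale (km^3) and shape
u = rng.uniform(size=n_events.sum())
sizes = sigma / xi * ((1.0 - u) ** (-xi) - 1.0)  # GPD via inverse CDF

print("events in first 10 years:", n_events[:10])
print("total calved volume over %d years: %.1f km^3" % (years, sizes.sum()))
print("mean simulated basal melt: %.2f m/yr" % melt.mean())
```

In the two-component model these stochastic corrections act on top of the deterministic 2D advection; the toy version only shows how the random ingredients are generated.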

Computer experiments

Evaluation Strategies for Computer Experiments with Medium-sized Datasets

Clément Chevalier, in collaboration with David Ginsbourger (University of Berne)

The design and analysis of computer experiments is a growing field with important engineering applications in domains such as nuclear safety, meteorology, finance, telecommunications, oil exploration, and vehicle crashworthiness. In the literature, Gaussian processes are often used as surrogate models to construct response surfaces for the output of a complex black-box simulator. These models also quantify uncertainties and can guide parsimonious evaluation strategies for the simulator.

Typical applications, however, are limited to a small number n of observations (typically less than 1,000), because the computations involve storing and inverting an n x n covariance matrix. This research project aims to extend computer experiment techniques to datasets of intermediate size (up to roughly 50,000-100,000 observations) through the use of well-chosen covariance functions. In particular, we would like to address two major issues (see the sketch after the list below):

  1. (Theoretical) Quantify the discrepancy between our covariance models and other typical models.
  2. (Practical) Adapt some typical sequential evaluation strategies in the computer experiment literature to settings with larger datasets and/or design new strategies.
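A brief sketch (Python; the Wendland covariance and all parameter values are illustrative choices, not necessarily those studied in the project) of why compactly supported covariance functions help at this scale: pairs of points farther apart than the support radius contribute exact zeros, so the covariance matrix is sparse and can be stored and factorized far more cheaply than a dense n x n matrix.

```python
# Sketch: Gaussian-process interpolation with a compactly supported
# (Wendland) covariance, yielding a sparse covariance matrix.
# All parameter values are illustrative.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(0, 1, size=(n, 2))           # design points in [0, 1]^2
y = np.sin(6 * X[:, 0]) * np.cos(6 * X[:, 1]) + 0.01 * rng.standard_normal(n)

theta = 0.05                                 # support radius: C(d) = 0 for d > theta

def wendland(d, theta):
    """Wendland covariance (1 - d/theta)_+^4 (4 d/theta + 1)."""
    t = d / theta
    return np.maximum(1.0 - t, 0.0) ** 4 * (4.0 * t + 1.0)

# Only pairs within distance theta are nonzero -> build K with a k-d tree.
tree = cKDTree(X)
pairs = tree.sparse_distance_matrix(tree, theta, output_type="coo_matrix")
K = sparse.coo_matrix((wendland(pairs.data, theta), (pairs.row, pairs.col)),
                      shape=(n, n)).tolil()
K.setdiag(1.0 + 1e-6)                        # C(0) = 1 plus a small nugget
K = K.tocsc()
print("nonzeros: %.2f%% of a dense matrix" % (100.0 * K.nnz / n**2))

alpha = spsolve(K, y)                        # kriging weights via sparse solve

# Prediction at a new site uses only the neighbours within the support.
x_star = np.array([0.5, 0.5])
idx = tree.query_ball_point(x_star, theta)
d = np.linalg.norm(X[idx] - x_star, axis=1)
print("prediction at (0.5, 0.5): %.3f" % (wendland(d, theta) @ alpha[idx]))
```

The sparsity fraction printed above is what makes the intermediate regime of 50,000-100,000 observations tractable; with a dense covariance the same computation would require storing and factorizing billions of entries.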

Climate data analysis

Multivariate Modeling of Large Non-stationary Spatial and Spatio-temporal Climate Fields

Mattia Molinaro, in collaboration with R. Knutti (ETH)

Climate data analysis represents a great challenge not only from a statistical point of view, but also because of its implications for other disciplines, such as economics and public health. A number of AOGCMs (atmosphere-ocean general circulation models) are currently being used by scientists to model, and possibly predict, the extremely complex interactions governing the global climate; it therefore appears crucial to develop a statistical model able to meaningfully combine the obtained results. From this point of view, the project will focus on two main aspects: quantifying the uncertainty resulting from the considered AOGCMs and attributing it to different factors.

Accomplishing this task will rely on advanced and tailored techniques, such as finding a proper parametrization of the GMRFs (Gaussian Markov random fields) that model the climate fields produced by the aforementioned AOGCMs. To take advantage of available prior knowledge, we develop a hierarchical Bayesian model whose levels statistically describe the observations and the prior distributions of the parameters of interest; Bayes' theorem then yields the posterior distribution. As often happens in Bayesian modeling, no closed-form solution exists, so particular attention must be paid to the computational aspects of the problem: we expect to use both MCMC techniques and integrated nested Laplace approximations via the INLA package.

Having developed this framework, we plan to establish a multivariate ANOVA model for a specific AOGCM. The aim is to account for the uncertainties inherent in future projections and climate change, in order to provide a possible answer to the interdisciplinary questions stated above.
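A minimal sketch (Python; the lattice size and the precision matrix Q = kappa*I + graph Laplacian are illustrative, not the parametrization under development) of the GMRF building block: a sparse precision matrix encodes conditional dependencies between neighbouring grid cells, and a sample is drawn through a Cholesky factor of the precision.

```python
# Sketch: sampling a Gaussian Markov random field (GMRF) on a small
# 2-D lattice.  The precision matrix Q = kappa*I + L (graph Laplacian)
# is sparse; the hyperparameters here are illustrative only.
import numpy as np
from scipy import sparse

nx = ny = 30                               # 30 x 30 lattice, n = 900 cells
n = nx * ny
kappa = 0.1                                # controls marginal variance/range

# Graph Laplacian of the 4-neighbour lattice via Kronecker sums.
D1 = sparse.diags([-1, 2, -1], [-1, 0, 1], shape=(nx, nx))
L = sparse.kron(sparse.eye(ny), D1) + sparse.kron(D1, sparse.eye(nx))
Q = (kappa * sparse.eye(n) + L).toarray()  # dense here; sparse Cholesky at scale

# If Q = C C^T (Cholesky), then solving C^T x = z with z ~ N(0, I)
# gives x ~ N(0, Q^{-1}).
C = np.linalg.cholesky(Q)
z = np.random.default_rng(3).standard_normal(n)
x = np.linalg.solve(C.T, z).reshape(ny, nx)

print("field shape:", x.shape, " sample sd: %.2f" % x.std())
```

For real climate fields the dense factorization above is replaced by sparse Cholesky factorizations or by INLA itself, both of which exploit precisely this sparsity of Q.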

Bayesian Networks

Developing Bayesian Networks as a tool for Epidemiological Systems Analysis

Marta Pittavino, in collaboration with F.I. Lewis and P. Torgerson (VetSuisse).

Bayesian network (BN) analysis is a form of probabilistic modeling which derives, empirically from observed data, a directed acyclic graph (DAG) describing the dependency structure between random variables.

BNs are increasingly finding application in areas such as computational and systems biology and, more recently, in epidemiological analysis.

These approaches are relatively new within the discipline of veterinary epidemiology, but promise a quantum leap in our ability to untangle the complexity of disease occurrence, thanks to their capability to formulate and test multivariate model structures. The multivariate approach considers all dependencies between all variables observed in an epidemiological study, without the need to declare one outcome variable as a function of one or more “predictor” variables (the multivariable approach).
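As a small illustration (Python; the variable names, the Gaussian BIC score, and the two candidate structures are hypothetical stand-ins, not the project's methodology), a decomposable score is computed node by node given each node's parents, and whole DAG structures are compared by their total score:

```python
# Sketch: comparing two candidate DAG structures for a Bayesian network
# with a decomposable node-wise score (BIC of a Gaussian linear model,
# as a stand-in for the marginal-likelihood scores used in practice).
import numpy as np

rng = np.random.default_rng(5)
n = 500
# Hypothetical epidemiological variables with a known generating structure:
# exposure -> infection -> symptoms.
exposure = rng.standard_normal(n)
infection = 0.8 * exposure + rng.standard_normal(n)
symptoms = 1.2 * infection + rng.standard_normal(n)
data = {"exposure": exposure, "infection": infection, "symptoms": symptoms}

def node_bic(child, parents):
    """BIC-type score of a Gaussian regression of child on its parents."""
    y = data[child]
    X = np.column_stack([np.ones(n)] + [data[p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                    # coefficients + noise variance
    return -n * np.log(rss / n) - k * np.log(n)   # higher is better

def dag_score(dag):
    """A DAG is a dict child -> list of parents; the score decomposes."""
    return sum(node_bic(child, parents) for child, parents in dag.items())

dag_true  = {"exposure": [], "infection": ["exposure"], "symptoms": ["infection"]}
dag_wrong = {"exposure": [], "infection": [], "symptoms": ["exposure"]}

print("score(true structure) : %.0f" % dag_score(dag_true))
print("score(wrong structure): %.0f" % dag_score(dag_wrong))
```

Developing score-equivalent versions of such metrics for non-conjugate models is precisely the theoretical challenge described below.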

Three key challenges in dealing with additive Bayesian networks will be tackled within this research project: one theoretical, one epidemiological, and one computational.

First, we aim to develop likelihood-equivalent parameter priors for non-conjugate BNs. This will yield a score-equivalent metric for evaluating and discriminating between Bayesian network structures.

Second, to demonstrate and promote the use of this methodology in epidemiology, relevant and high-quality exemplar case studies will be developed, objectively showing the situations in which BN models offer the most added value relative to existing standard statistical and epidemiological methods.

Third, no appropriate software exists for fitting the types of BN models needed to analyse epidemiological data, where complexities such as grouped, overdispersed, or correlated observations are ubiquitous. This project will develop easy-to-use software that gives epidemiological practitioners ready access to BN modelling; this is essential for making the crucial transition from a merely technically attractive methodology to an approach that is actually used in practice.
