Associate Professor in Mathematics & Statistics

University of Warwick & The Alan Turing Institute

Biography

I am an associate professor in statistics and mathematics in the Mathematics Institute and the Department of Statistics at the University of Warwick. I am also the director of Data Study Groups at the Alan Turing Institute and a theme lead for the health and medical sciences research programme.
My research interests lie at the interface of applied probability, computational statistics, and machine learning, with a particular focus on industrial applications. I have worked in application areas ranging from transport and energy to, in particular, health and well-being.

Before joining the University of Warwick, I was a lecturer in Statistics at the University of Oxford. Prior to this I undertook postdoctoral research with Arnaud Doucet (Oxford) and Yee Whye Teh (Oxford). I completed my PhD in September 2013 in Mathematics at the University of Warwick supervised by Andrew Stuart and Martin Hairer.

Interests

  • Scalable Methods for Statistical Inference and Machine Learning.
  • Applications of Stats/ML to health and well-being.

Education

  • PhD in Mathematics

    University of Warwick

  • MSc in Mathematics

    University of Warwick

Projects

Machine Learning in Julia

MLJ (Machine Learning in Julia) is a toolbox written in Julia providing a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning models written in Julia and other languages.

Modelling COVID-19 exit strategies for policy makers in the United Kingdom

The potential of less accurate tests as a means of pandemic surveillance, rather than as a source of reliable medical information about an individual, is often missed. We investigate whether the additional information from non-diagnostic-quality mass testing can lead to a quicker and lower-risk exit strategy from lockdown.

Stein’s method for Statistics and Machine Learning

Developing efficient Stein-based discrepancies for inference and assessment.

Accelerating Markov Chain Monte Carlo

Speeding up MCMC for scalable Bayesian Inference.

Recent Publications


Digital Health Management During and Beyond the COVID-19 Pandemic: Opportunities, Barriers, and Recommendations

During the coronavirus disease (COVID-19) crisis, digital technologies have become a major route for accessing remote care. Therefore, …


COEXI(S)T - Modelling COVID-19 exit strategies for policy makers in the United Kingdom

A geotemporal survey of hospital bed saturation across England during the first wave of the COVID-19 Pandemic

Research Group

Postdocs, PhDs and MScs

Postdocs

2018-2020 : Gergo Bohner

2020 - : Gaurav Venkataraman

2015-2017 : Tigran Nagapetyan

PhD Students

2019 : Harrison Wilde

2018 : Arne Gouwy

Teaching

Please log in to Warwick’s Moodle platform.

For student projects, please email me. For an idea of potential topics, consider the following list:

Stochastic Processes

PhD

- Invariant distributions of interacting stochastic differential equations

BSc/MSc

- Approximation of stochastic differential equations
- Piecewise deterministic Markov processes and their applications
- Probability-measure-valued SEIR models

Methodology

PhD

- Posterior approximation based on gradient flows
- Quasi-Newton sampling techniques
- Improved syn

BSc/MSc

- Hypothesis testing for model comparison
- Covariate shift for Bayesian methodology
- Interpretable machine learning

Application to Medicine

PhD

- Treatment selection using observational and trial data

MSc/PhD

- Risk prediction for rehabilitation from severe illnesses

BSc/MSc

- Modelling of capacity in healthcare settings

Machine Learning in Julia

MLJ (Machine Learning in Julia) is a toolbox written in Julia that provides a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning model implementations written in Julia and other languages. More broadly, the MLJ project hopes to bring cohesion and focus to a number of emerging and existing, but previously disconnected, high-quality machine learning algorithms and tools written in Julia. A welcome corollary of this activity will be increased cohesion and synergy within the talent-rich communities developing these tools. In addition to other novelties outlined below, MLJ aims to provide first-in-class model composition capabilities. The guiding goals of the MLJ project have been usability, interoperability, extensibility, code transparency, and reproducibility. Paper

Modelling COVID-19 exit strategies for policy makers in the United Kingdom

The potential of less accurate tests as a means of pandemic surveillance, rather than as a source of reliable medical information about an individual, is often missed. We investigate whether the additional information from non-diagnostic-quality mass testing can lead to a quicker and lower-risk exit strategy from lockdown.

Stein’s method for Statistics and Machine Learning

Stein’s method is a probabilistic approach to controlling the distance between probability measures in terms of a differential and difference operators. While this method was first developed with the aim of obtaining quantitative central limit theorems, it has recently been adopted by the computational statistics and machine learning community for various practical tasks including goodness-of-fit testing, generative modeling, global non-convex optimisation, variational inference, de novo sampling, constructing powerful control variates for Monte Carlo variance reduction, and measuring the quality of Markov chain Monte Carlo algorithms.
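As a concrete illustration of one of these tasks (measuring sample quality), the following Python sketch computes a kernel Stein discrepancy in one dimension with an inverse multiquadric (IMQ) kernel, assuming a standard Gaussian target. This is an illustrative example rather than code from any of the projects above; the function name `ksd` and all parameter choices are hypothetical.

```python
import numpy as np

def ksd(samples, score, c=1.0, beta=-0.5):
    """V-statistic estimate of the kernel Stein discrepancy (1-D, IMQ kernel)."""
    x = np.asarray(samples, dtype=float)
    d = x[:, None] - x[None, :]               # pairwise differences x_i - x_j
    u = c**2 + d**2
    k = u**beta                               # IMQ kernel k(x, y)
    dkdx = 2 * beta * d * u**(beta - 1)       # dk/dx
    dkdy = -dkdx                              # dk/dy
    d2k = -2 * beta * u**(beta - 1) - 4 * beta * (beta - 1) * d**2 * u**(beta - 2)
    s = score(x)                              # score (grad log p) at each sample
    # Stein kernel: k_p(x,y) = d2k/dxdy + s(x) dk/dy + s(y) dk/dx + s(x) s(y) k
    kp = (d2k + s[:, None] * dkdy + s[None, :] * dkdx
          + s[:, None] * s[None, :] * k)
    return np.sqrt(kp.mean())

score = lambda x: -x                          # score of the N(0, 1) target
rng = np.random.default_rng(0)
good = ksd(rng.standard_normal(500), score)           # samples from the target
bad = ksd(rng.standard_normal(500) + 2.0, score)      # mean-shifted samples
```

Samples drawn from the target give a small discrepancy, while the shifted samples are flagged with a much larger one, which is exactly how such discrepancies are used to assess MCMC output.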

Accelerating Markov Chain Monte Carlo

In recent years there has been an explosion of complex datasets in areas as diverse as bioinformatics, epidemiology, and finance, as well as from industrial monitoring processes. Realistically, the complex dynamics characterising the data-generating process can often only be expressed through a high-dimensional stochastic model, rendering exact Bayesian inference intractable and classical Monte Carlo methods insufficient. Often the only computationally feasible and accurate way to perform statistical inference is Markov chain Monte Carlo (MCMC), typically employing methods based on Metropolis-Hastings schemes.
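To fix ideas, here is a minimal Python sketch (not code from this project) of a random-walk Metropolis-Hastings scheme, assuming for simplicity a one-dimensional standard Gaussian posterior; the step size and sample count are arbitrary illustrative choices.

```python
import numpy as np

def log_target(x):
    # Unnormalised log posterior; here pi(x) is proportional to exp(-x^2 / 2).
    return -0.5 * x**2

def metropolis_hastings(n_samples, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal()   # symmetric proposal
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

samples = metropolis_hastings(50_000)
```

The chain's empirical mean and standard deviation approach those of the target as more samples are generated, illustrating the asymptotic unbiasedness discussed below at the cost of correlated, high-variance output.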

A key feature of these methods is that they are asymptotically unbiased, i.e. the posterior distribution can be approximated arbitrarily well, provided sufficient samples are generated. While this property is advantageous, it comes at a cost: such methods typically exhibit large variance, meaning that many samples must be generated to obtain an approximation within a given error tolerance. This has motivated practitioners to investigate approximate inference methods, in which asymptotic unbiasedness is relaxed, permitting a small amount of bias in exchange for a large reduction in variance.

One such class of approximate inference algorithms is Stochastic Gradient Langevin Dynamics (SGLD), originally proposed for approximating expectations with respect to an unnormalised density. In contrast to discretised Langevin dynamics, SGLD uses an unbiased estimate of the gradient of the log target based on a small subset of the data, reducing the number of data-point evaluations required per sample and thus mitigating the computational cost.
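The subsampling idea can be sketched as follows in Python, assuming a toy Gaussian model with a flat prior; the data size, minibatch size, and step size are hypothetical choices for illustration, not values from any of the work described here.

```python
import numpy as np

# Toy model: x_i ~ N(theta, 1) with a flat prior on theta.
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=10_000)
n, batch = len(data), 100

def grad_log_post_est(theta, idx):
    # Unbiased minibatch estimate of the full-data gradient sum_i (x_i - theta),
    # rescaled by n / batch so its expectation matches the full gradient.
    return (n / batch) * np.sum(data[idx] - theta)

theta, eps = 0.0, 1e-4   # initial state and (fixed) step size
samples = []
for t in range(5_000):
    idx = rng.integers(0, n, size=batch)          # draw a random minibatch
    # SGLD update: half a gradient step plus injected Gaussian noise.
    theta += (0.5 * eps * grad_log_post_est(theta, idx)
              + np.sqrt(eps) * rng.standard_normal())
    samples.append(theta)
```

Each iteration touches only 100 of the 10,000 data points, yet after a short burn-in the chain fluctuates around the posterior mean, which is the computational trade-off the paragraph above describes.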

There has been much recent activity on the use of piecewise deterministic Markov processes (PDMPs) as a method for sampling. While still in their infancy, these novel methods appear to be advantageous over state-of-the-art MCMC methodology. PDMP-based sampling methods demonstrate many highly desirable properties: they are non-reversible yet remain asymptotically unbiased, and moreover, for tall-data problems sub-sampling can be introduced while preserving the invariant measure.
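To make this concrete, here is a minimal Python sketch (not the author's implementation) of a one-dimensional Zig-Zag sampler, one of the simplest PDMP methods, targeting a standard Gaussian; the grid spacing and sample count are illustrative assumptions. The process moves deterministically at constant velocity and flips direction at random event times governed by the rate lambda(x, v) = max(0, v * x).

```python
import numpy as np

rng = np.random.default_rng(0)
x, v, t = 0.0, 1.0, 0.0          # position, velocity in {+1, -1}, clock
grid_dt, next_grid = 0.01, 0.0   # fixed time grid for recording the trajectory
positions = []

while len(positions) < 200_000:
    # The rate along the current trajectory is lambda(s) = max(0, v*x + s),
    # so the next event time can be found by exactly inverting its integral.
    e = rng.exponential()
    tau = -v * x + np.sqrt(max(v * x, 0.0) ** 2 + 2.0 * e)
    # Record the deterministic linear segment at the grid times it covers.
    while next_grid <= t + tau and len(positions) < 200_000:
        positions.append(x + v * (next_grid - t))
        next_grid += grid_dt
    x += v * tau   # move to the event location
    t += tau
    v = -v         # flip the velocity at the event

positions = np.array(positions)
```

Despite never using accept/reject steps, the time average of this non-reversible trajectory recovers the moments of the Gaussian target, and for tall data the rate can be estimated from subsamples without changing the invariant measure.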