Data science pipelines

Image: DeepMind

A typical data science workflow involves a complex sequence of processes that transform raw data into actionable insights. These processes include data cleaning, exploratory data analysis, missing data imputation, feature extraction, model selection and training. Our research focusses on the challenges and opportunities presented by automated and AI-assisted machine learning pipelines.

Possible topics

  • Adversarial data analysis
  • AI-assisted Bayesian data analysis workflows
  • An automated visualization grammar for exploratory data analysis
  • Automated survey analysis pipelines
  • Data preparation multiverse analysis
  • Discovery and synthesis of data science pipelines via agent self-experimentation
  • Forensic data archaeology for scientific replicability
  • Preprocessing strategies for spatio-temporal event data

Further reading

  • Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, and Volker Markl. 2022. DORIAN in action: assisted design of data science pipelines. Proceedings of the VLDB Endowment 15, 12 (August 2022), 3714–3717.
Sergey Redyuk
Postdoctoral Researcher

Data scientist, researcher. Interested in bridging gaps between data science and applications.

David Antony Selby
Senior Researcher

My research interests include latent variable modelling, reproducibility, citation networks and applications of statistics and machine learning to healthcare.