Data science pipelines
A typical data science workflow is a complex sequence of processes that transforms raw data into actionable insights. These processes include data cleaning, exploratory data analysis, missing-data imputation, feature extraction, model selection, and model training. Our research focusses on the challenges and opportunities presented by automated and AI-assisted machine learning pipelines.
Possible topics
- Adversarial data analysis
- AI-assisted Bayesian data analysis workflows
- An automated visualization grammar for exploratory data analysis
- Automated survey analysis pipelines
- Data preparation multiverse analysis
- Discovery and synthesis of data science pipelines via agent self-experimentation
- Forensic data archaeology for scientific replicability
- Preprocessing strategies for spatio-temporal event data
Further reading
- Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, and Volker Markl. 2022. DORIAN in action: assisted design of data science pipelines. Proceedings of the VLDB Endowment 15, 12 (August 2022), 3714–3717. https://doi.org/10.14778/3554821.3554882