Data-centric programming and provenance
Provenance is metadata or trace information about a computation. Many scientific disciplines are increasingly data-driven and need provenance support so that raw data and processed results can be shared and reused safely. Our recent focus has been on defining models of provenance that provide useful information for debugging, auditing, or assurance of scientific results based on complex computations, and on implementing specific techniques that address unmet needs in scientific data curation. Our emphasis has been on the foundations and applications of ideas from programming languages and semantics for understanding provenance. PhD projects in this area could focus on the foundations, on the implementation and evaluation of provenance techniques, or on a combination of both. Research topics could include:
* Types and language design for integrating multiple data-centric programming models (e.g. language-integrated query)
* Extending bidirectional programming for synchronizing data across data models
* Language-based techniques for data curation and preservation, provenance tracking, or archiving
* Query and update techniques for longitudinal or provenance-aware queries
* Extending and evaluating a "database wiki" system that makes it easier to collaborate on scientific data resources while fully tracking the provenance and change history of the data
Other projects are also possible, building on the Database Group's interests in data annotation, provenance, archiving, citation, and curation, or on the Data-Intensive Research Group's interests in provenance for scientific workflows.
Depending on its emphasis, a project in this area would suit a student with a solid grounding in at least one of database systems, database theory, or programming languages and, for more systems-oriented work, some experience of and interest in implementation and empirical evaluation.