Provenance
Provenance is metadata describing the origin, derivation or history of some data. Pervasive support for provenance is being proposed in several contexts, ranging from databases and scientific workflows to system configurations to intelligence gathering and analysis. Provenance has been a major research interest in Edinburgh over the past 10 years. This page collects summaries and links to our work.
Provenance in (Web) Databases
People
- Peter Buneman
- James Cheney
- Stefan Fehrenbach
- Adrià Gascón
- Egor Kostylev (now at Oxford)
- Jan Stolarek
- Stratis Viglas
Publications
- Language-integrated provenance, Stefan Fehrenbach and James Cheney, Science of Computer Programming, in press.
- Language-integrated provenance, Stefan Fehrenbach and James Cheney, PPDP 2016
- Composition and Substitution in Provenance and Workflows, Peter Buneman, Adrià Gascón, and Dave Murray-Rust, TaPP 2016
- Provenance Segmentation, Rui Abreu, Dave Archer, Erin Chapman, James Cheney, Hoda Eldardiry, and Adrià Gascón
- The rationale of PROV, Luc Moreau, Paul T. Groth, James Cheney, Timothy Lebo, and Simon Miles. J. Web Sem. 35: 235-257 (2015)
- Language-Integrated Provenance in Links, Stefan Fehrenbach and James Cheney, TaPP 2015
- Classification of annotation semirings over containment of conjunctive queries. Egor V. Kostylev, Juan L. Reutter, András Z. Salamon. ACM Trans. Database Syst. 39(1): 1 (2014)
- Dynamic provenance for SPARQL Updates, Harry Halpin and James Cheney, ISWC 2014
- Database Queries that Explain their Work, James Cheney, Amal Ahmed and Umut Acar. PPDP 2014
- Annotations are relative, Peter Buneman, Egor V. Kostylev, Stijn Vansummeren, ICDT 2013
- Requirements for Provenance on the Web, Paul Groth, Yolanda Gil, James Cheney and Simon Miles, International Journal of Digital Curation 8(1):39-56, 2012.
- Combining dependent annotations for relational algebra. Peter Buneman and Egor V. Kostylev, ICDT 2012
- Classification of annotation semirings over query containment. Egor V. Kostylev, Juan L. Reutter, András Z. Salamon, PODS 2012: 237-248
- Hierarchical models of provenance. Peter Buneman, James Cheney, and Egor Kostylev. TaPP 2012.
- The Database Wiki project, Peter Buneman, James Cheney, Sam Lindley and Heiko Mueller, SIGMOD Record Systems & Prototypes column, 2011
- Causality and the semantics of provenance, James Cheney. DCM 2010.
- A graph model of data and workflow provenance, Umut Acar, Peter Buneman, James Cheney, Jan Van den Bussche, Natalia Kwasnikowska, and Stijn Vansummeren. TaPP 2010.
- Provenance: A Future History, James Cheney, Steve Chong, Nate Foster, Margo Seltzer, Stijn Vansummeren. Onward! 2009.
- Provenance in databases: Why, where and how, J. Cheney, L. Chiticariu and W.-C. Tan. Foundations and Trends in Databases, 1(4):379-474, 2009.
- Curated Databases, Peter Buneman, James Cheney, Wang-Chiew Tan and Stijn Vansummeren. PODS 2008, p. 1-12.
- On the expressiveness of implicit provenance in query and update languages, P. Buneman, J. Cheney and S. Vansummeren. Transactions on Database Systems, 33(4):28, November 2008.
- On the Expressiveness of Implicit Provenance in Query and Update Languages, Peter Buneman, James Cheney, and Stijn Vansummeren. ICDT 2007, Volume 4353 of Lecture Notes in Computer Science, p. 209-223
- Recording Provenance for SQL Queries and Updates, Stijn Vansummeren and James Cheney. IEEE Data Engineering Bulletin, 30(4):29-37, December 2007.
- Program slicing and data provenance, James Cheney. IEEE Data Engineering Bulletin, 30(4):22-28, December 2007.
- Provenance Management in Curated Databases. Peter Buneman, Adriane Chapman, and James Cheney. SIGMOD 2006, p. 539-550.
Funding
This work is generously supported by the following research grants:
- DIACHRON: Managing the Evolution and Preservation of Web Data, an EU FP7 Integrating Project, 2013-2016
- SOCIAM: Social Machines, an EPSRC Programme Grant
- Google Research Awards, most recently "Language-Integrated Provenance"
- Skye: A programming language bridging theory and practice for scientific data curation, ERC Consolidator Grant
Provenance and Security
Provenance may enable interesting new capabilities, such as making the impact of changes to critical system configurations explicit, or supporting new kinds of history- or provenance-conscious security policies. However, pervasive collection of provenance also introduces new risks to the security and privacy of users of such systems. Our research aims to understand the benefits and tame the risks, so that provenance can be used safely without compromising security or privacy.
People
- Paul Anderson
- James Cheney
- Roly Perera
- Wilmer Ricciotti
- Ghita Berrada
- Weili Fu
- Arthur Chan
Publications
- A core calculus for provenance inspection, Wilmer Ricciotti, PPDP 2017
- Strongly normalizing audited computation, Wilmer Ricciotti and James Cheney, CSL 2017.
- Expressiveness benchmarking for system-level provenance, Sheung Chi Chan, James Cheney, Ashish Gehani, Ripduman Sohan, and Hassaan Irshad. TaPP 2017.
- Causally consistent dynamic slicing, Roly Perera, Deepak Garg and James Cheney, CONCUR 2016.
- Proof-relevant π-calculus, Roly Perera and James Cheney, LFMTP 2015, p. 46-70.
- An analytical survey of provenance sanitization, James Cheney and Roly Perera, IPAW 2014.
- A core calculus for provenance, Umut A. Acar, Amal Ahmed, James Cheney and Roly Perera. Journal of Computer Security 21(6):919-969, 2013 (Special Issue on POST 2012)
- Functional Programs that Explain their Work, Roly Perera, Umut A. Acar, James Cheney, and Paul Blain Levy. ICFP 2012, p. 365-376.
- A core calculus for provenance, Umut A. Acar, Amal Ahmed, James Cheney and Roly Perera. POST 2012. Superseded by JCS article.
- Toward provenance-based security for configuration languages. Paul Anderson and James Cheney. TaPP 2012.
- Provenance as Dependency Analysis. James Cheney, Amal Ahmed and Umut Acar. Mathematical Structures in Computer Science 21(6): 1301-1337 (2011).
- A formal framework for provenance security, James Cheney. CSF 2011.
- Provenance as Dependency Analysis. James Cheney, Amal Ahmed and Umut Acar. DBPL 2007, p. 139-153. (arXiv) Superseded by MSCS article.
Funding
This work is generously supported by the following research grants:
- Provenance for configuration language security, Microsoft Research PhD Studentship
- Language-based provenance security, AFOSR EOARD research grant FA8655-13-1-3006.
- ADAPT: A Diagnostics Approach to Advanced Persistent Threat Detection, DARPA, 2015-2019
Provenance in Workflows and Scientific Computation
People
- James Cheney
- Malcolm Atkinson
- Paul Martin
- Alessandro Spinuso
- Eric Griffis (OSDC PIRE supported visitor)
Publications
- YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts, Timothy McPhillips, Tianhong Song, Tyler Kolisnik, Steve Aulenbach, Khalid Belhajjame, R. Kyle Bocinsky, Yang Cao, James Cheney, Fernando Chirigati, Saumen Dey, Juliana Freire, Christopher Jones, James Hanken, Keith W. Kintigh, Timothy A. Kohler, David Koop, James A. Macklin, Paolo Missier, Mark Schildhauer, Christopher Schwalm, Yaxing Wei, Mark Bieda, Bertram Ludäscher, International Journal of Digital Curation, 10(1):298-313, 2015.
- Semantics and Provenance for Processing Element Composition in Dispel Workflows, Eric Griffis, Paul Martin and James Cheney, WORKS 2013.
- Provenance for seismological processing pipelines in a distributed streaming workflow, Alessandro Spinuso, James Cheney, and Malcolm Atkinson, BigProv 2013