# Research topics

Topics for PhD Research at the Laboratory for Foundations of Computer Science

Suggested topics for PhD research are collected here. They are grouped into the following areas:

- Algorithms and Complexity
- Databases
- Logic and verification
- Process algebras, modeling, and systems biology
- Programming languages and semantics
- Security
- Software Engineering

Some projects bridge multiple areas.

Other PhD topics may be available by arrangement with potential supervisors. Browse the list of LFCS staff to see possible supervisors and their research interests.

## Algorithms and Complexity

### Measuring, Counting and Sampling in Polytopes

**Potential Supervisor:** Mary Cryan

The project proposes to study some algorithmic and combinatorial questions regarding a special class of high-dimensional polytopes called Transportation polytopes - we will be interested in the diameter of these polytopes, the (number of, and shape of) vertices of the polytopes, and the number of lattice points inside these polytopes (the lattice points are also known as contingency tables). In recent years, there has been a bustle of activity concerning the diameter of general multi-dimensional polytopes - the big result being Francisco Santos' counterexample disproving the "Hirsch conjecture". Despite this, the question of the diameter of special classes of polytopes remains open, and there is a possibility (for example) that the class of transportation polytopes does satisfy that conjecture.

That would be our starting point for the research.

Apart from diameter the question of counting/sampling contingency tables remains open for the general case, and we would plan to spend some time re-examining that problem. In that case our concern is in obtaining polynomial-time algorithms. We may alternatively spend time considering different counting and sampling questions. Our early results will influence the later direction of the project.

### Algorithms for analyzing infinite-state Markov chains, Markov Decision Processes and stochastic games

**Potential supervisor: **Kousha Etessami

In recent years a body of work has been developed on algorithms for, and the computational complexity of, analyzing and model checking some important families of (finitely presented) infinite-state Markov chains, Markov Decision Processes (MDPs) and stochastic games. These models arise by adding natural forms of recursion, branching, or a counter, to finite-state models, and they correspond to probabilistic/controlled/game versions of some class automata-theoretic models. They also subsume some classic stochastic processes such as multi-type branching processes and quasi-birth-death processes, and they provide natural abstract models of probabilistic procedural programs with recursion.

Some of the key algorithmic advances for analyzing these models have come from algorithms for computing the least fixed point (and greatest fixed point) solution for corresponding monotone systems of nonlinear (min/max)-polynomial equations. Such equations provide, for example, the Bellman optimality equations for optimal extinction and reachability probabilities for branching MDPs (BMDPs). A key role in these algorithms is played by Newton’s method, and by a generalization of Newton’s method which is applicable to the Bellman equations for BMDPs, and which uses linear programming in each iteration.

By now, polynomial time algorithms have been developed for some of the key problems in this domain, while other problems have been shown to have high complexity, or to even be undecidable. Yet many key algorithmic questions about these models remain open. We are interested in developing efficient algorithms for, and understanding the computational complexity of these problems.

**Algorithmic Game Theory and Equilibrium computation**

**Potential supervisor: **Kousha Etessami

Game theoretic tools and algorithms play an important role in understanding interactions between self-interested agents (both human and automated) in various settings, including the internet (such as in online advertising auctions). Game theory is also more broadly an important tool in many areas of computer science. A key problem in algorithmic game theory is computation of an equilibrium for a given game or market model, including Nash equilibria and other forms of equilibria, in various classes of games and markets. Some challenging algorithms/complexity questions we are interested in are related to the existence of efficient algorithms for computing/approximating equilibria in various such game/market settings.

**Solving PSPACE-complete Problems in Automata Theory**

**Potential supervisor:** Richard Mayr

Develop algorithms and tools for solving most practical instances of PSPACE-complete problems in automata theory, e.g., language inclusion, equivalence and universality (see www.languageinclusion.org).

Finite automata are a fundamental and ubiquitous building block in many areas of computer science. They can describe and manipulate infinite (but regular) sets of objects like finite and infinite length words, and trees with finite and infinite depth. Applications include reasoning about patterns in database queries, modeling and simulation of controllers and reactive systems, analyzing DNA sequences, model checking and formal software verification, natural language processing, learning theory, and algorithmic decision procedures for logical theories. Central automata problems like language inclusion, equivalence and universality are PSPACE-complete. However, such complexity bounds on the worst-case scenario are not necessarily an obstacle to practical applications. In recent years, advanced methods have been developed which can solve many practical instances of other computationally hard problems, notably the satisfiability problem for boolean formulae (SAT). The SAT problem is complete for the complexity class NP (which is hard but still easier than PSPACE), and modern tools can solve highly non-trivial cases arising in practical applications, e.g., hardware verification. More recently, similar progress is being made on PSPACE-complete problems (see www.languageinclusion.org). The vision is to develop methods to solve PSPACE-complete automata theoretic problems for automata with 10000-1000000 states. **more**

**Algorithms and Geometry**

**Potential supervisor:** Rik Sarkar

Geometric datasets are becoming common with popularity of GPS and other location services. Many types of data are geometric with embedding in a Euclidean space such as the plane or 3D space.

We want to develop efficient algorithms to process and answer questions about geometric data, particularly in dynamic scenarios when the dataset is updated frequently or contains noisy, uncertain information.

**Networks and mobility**

**Potential supervisor: **Rik Sarkar

The world is becoming more interconnected with phones, GPS, e-readers and tablets connected into networks. Our social and professional lives are now parts of complex networks covering the globe.

Our goal is to develop algorithms to process and store information in large networks. We also want to build methods to model and analyze complex networks. This will help our understanding of their structure and best ways to use them. These questions are relevant to several current research topics such as wireless and sensor networks, social networks and big datasets. Network data processing and analysis has interesting variants when some parts of the network are mobile -- for example, phones and tablets.

This is a broad area and the topics can be approached in different ways: theoretically through algorithms, combinatorics and mathematics; or experimentally by evaluating through programs and simulations.

**Decision procedures for infinite state systems**

**Potential supervisor:** Colin Stirling

The aim is to extend techniques of model checking and equivalence checking from finite to infinite state systems.

Infinite state systems include pushdown automata, systems with data, systems with binding, and so on. There are many significant open technical problems (such as: is equivalence of higher-order schema decidable?) and many open conceptual issues (such as: is there a good notion of tree automaton that can recognize families of higher-order terms?).

**Quantum Complexity**

**Potential supervisor:** Elham Kashefi

How entanglement determines the power of quantum and classical multi-prover interactive proof systems is a long-standing open question. Based on a novel approach connecting blind quantum computing protocol with interactive proof systems, it is proven that a quantum verifier is no more powerful than a classical verifier even in the multi-party scenario. The main goal of the project is to explore further this approach to fully characterise the power of entangled provers with classical verifier, building towards a quantum PCP theorem. Another central concept in complexity theory with a direct application in cryptography is that of a zero-knowledge protocol where a prover wants to convince a verifier about the truth of a statement without revealing any additional information about the statement. In practice, zero-knowledge protocols are used as primitives in larger cryptographic protocols in order to limit the power of malicious parties to disrupt the security of the larger protocol. However, constructing classical and quantum interactive proof systems that are zero knowledge against a quantum attack remains a challenging task. A second goal of the project is to exploit the structural link between blind quantum computing and interactive proof systems to develop general schemes for constructing zero knowledge protocols by exploiting blindness properties to hide the information available to malicious parties in an interactive system.

## Databases

### Approximate Query Answering

**Supervisor:** Wenfei Fan

Querying big data is cost-prohibitive: a linear scan of a big dataset with a fast solid state device easily takes days or even years, and real-life queries are far more expensive than linear time. In light of this, it is often beyond reach to compute exact answers to our queries in big data. This suggests that we find approximate answers to our queries, to strike a balance between the accuracy and the cost.

We advocate two approaches to approximate query answering. One is by query-driven approximation, to use "cheaper" queries whenever possible. The other is a data-driven approximation scheme, to answer queries by accessing a small fraction of a dataset.

Each approach calls for unconventional techniques and fundamental research, for different query classes in various settings.

This project aims to explore these approaches to approximately querying big data, from theory to practice.

**Databases and verification**

**Potential supervisor:** Leonid Libkin

The basic settings of the fields of database querying and verification/model-checking are very similar: in both cases one evaluates a logical formula on a finite structure (a query on a relational database, or a temporal formula on a Kripke structure). Both fields have invested heavily in developing logical formalisms and efficient algorithms for query evaluation and model checking, but despite this, there are very few direct connections between them (for example, we now know how temporal logics and navigation in XML documents are related). There are many more potential connections that have not received the attention they deserve: e.g., between streaming XML and verification of pushdown specifications or between database querying and verifying data-driven web services. One of my goals is to study these connections, with the ultimate goal of bringing the fields closer together. I am interested in students with good theoretical background (logic, automata, algorithms); knowledge of databases and verification is obviously a plus.

**Graph Databases**

**Potential supervisors:** Wenfei Fan and Leonid Libkin

Research on foundational, algorithmic, and systems aspects of graph databases.

The last fifteen years have seen an unprecedented accumulation of data by applications going well beyond traditional databases. These days we are witnessing the ever increasing demand for a new data format: graph-structured data. The need for processing graph data is particularly important in applications such as social networks and Semantic Web, where the underlying data are naturally modelled as a graph. Other applications of graph data appear in biology, cheminformatics, knowledge discovery, network traffic, computer vision, intelligence analysis and crime detection.

We have several projects related to the study of graph databases. Our objective is to provide foundations, practical techniques and a toolbox for querying graph data, independent of individual applications and readily applicable in all of the main tasks for querying graph data. Specifically we concentrate on the design and analysis of query languages for graph data, on algorithms for efficient query evaluation over large and/or distributed graphs, and on designing and building tools for providing approximate query answering over graphs. We are looking for PhD students with strong background in both theoretical CS (for the foundational tasks) and systems aspects (for the design and evaluation of query processing techniques).

**Incompleteness of relational, XML, and graph databases**

**Potential supervisor:** Leonid Libkin

Incomplete data is ubiquitous and poses even more problems than before. The more data we accumulate, the more instances of incompleteness we have. And yet the subject is poorly handled by both practice and theory. Many queries for which students get full marks in their undergraduate courses will not work correctly in the presence of incomplete data, but these ways of evaluating queries are cast in stone (SQL standard). We have many theoretical results on handling incomplete data but they are, by and large, about showing high complexity bounds, and thus are often dismissed by practitioners. Even worse, we have a basic theoretical notion of what it means to answer queries over incomplete data, and yet this is not at all what practical systems do.

The situation looked quite bleak for many years but recently progress has been made that opened totally new avenues of attack. We have elements of a basic theory that apply to the majority of existing data models, but we need to understand how to specialize them and make effective use of them to produce query answers not only in a pure database environment but also in applications such as data integration, data exchange, consistent query answering, and answering queries in the presence of ontologies.

The project brings together tools and techniques from data management, AI, algorithms, programming semantics, and mathematical logic.

### Querying Big Data

**Supervisor: **Wenfei Fan

Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are "tractable'' on big data? Can we efficiently query big data simply by adding more processors? How can we make queries tractable on big data? When exact answers are beyond reach in big data, what approximation theory can help us strike a balance between the quality of approximate query answers and the costs of computing such answers? To get sensible query answers in big data, does it suffice to simply provide a scalable algorithm for query evaluation?

This project aims to answer these questions. We will develop a theory of BD-tractability, to characterize queries that are tractable on big data. We will develop practical techniques to query big data, and evaluate their effectiveness using real-life data from industry. In addition to techniques for coping with the quantity of the data, we will also develop practical methods for improving data quality, the other side of big data.

## Logic and Verification

**Combinations and Abstractions of Formal Games**

**Potential supervisor: **Perdita Stevens

Formal two-player games have a long history in computer science and outside. They have been applied in the verification of systems, where the players aim respectively to demonstrate a flaw in a system and to demonstrate that the system works as specified. In order to make calculations with such games feasible, it is essential to be able to use abstractions of the games: for example, one considers a large (perhaps infinite) class of game positions all together, until such time as distinctions between the positions force one to treat parts of the class differently. There is a considerable body of such work, which underpins some impressive tools. However, the field still lacks a principled, clean account of how games and their abstractions may be combined. This project should develop such an account.

**Formal Verification Methods for Quantum Systems**

**Potential supervisors:** Elham Kashefi and Richard Mayr

Develop temporal logics and model checking techniques for quantum systems

It has been predicted that any large-scale overhaul of information science in the 21st century will have to include quantum information. The last decade has witnessed a quantum leap in both design and implementation of quantum protocols, which rely on various fundamental properties of quantum mechanics for their performance, security and correctness. The increasing complexity of these advanced quantum protocols demands a unified high-level approach to their formal verification, far beyond the currently available ad-hoc schemes based on matrix calculus.

The overall goal of the project is to develop temporal logics and model checking techniques for quantum systems, similarly to the development in the classical world. However the existing classical methods have to be properly extended to address fundamentally different concepts of quantum systems, such as entanglement which will affect the structure of the agent's local view in a protocol. The first approach would be based on characterising the class of quantum properties that can be abstracted into similar classical properties to be verified using classical model checking. This will pave the road to develop the required logic which can explicitly express unique aspects of

the dynamics of quantum systems, and to develop model checking algorithms for it. The general aim is to bring logic-based formal reasoning techniques into the domain of quantum computation.

**Nominal logic, automated reasoning and type theory**

**Potential supervisor:** James Cheney

Nominal abstract syntax is a recent and fruitful approach to formally defining syntax modulo alpha-equivalence. Nominal techniques are supported in the Nominal Isabelle system and some other tools (such as a logic programming language alphaProlog), which can (and have) been used to verify substantial systems, including dependent type theories such as LF and practical languages such as XQuery. I'd be interested in supervising research in any area related to nominal logic, type theory or mechanized metatheory. This could involve focusing on formalization of existing systems of interest, identifying opportunities for adapting automated reasoning (such as proof planning) to the nominal setting, or developing new, more constructive formalisms that could provide a foundation for adoption of nominal techniques in type-theoretic proof assistants such as Coq. **more**

## Process algebras, modeling, and systems biology

**A Rule-Based Formalism for Contact + Locality in Computational Systems Biology**

**Potential supervisors:** Gordon Plotkin and Nicolas Oury

The project is to investigate stochastic rule-based formalisms for computational systems biology that allow multilevel cellular systems that permit bonding at all levels, from cells in tissue to proteins in complexes.

Click here for details.

**Quantitative Modelling Process Algebra**

**Potential supervisors:** Jane Hillston and Stephen Gilmore

Stochastic process algebras, providing high-level descriptions of dynamic systems, amenable to rigorous mathematical analysis through formal semantics.

The PEPA project began in Edinburgh in 1991 and has developed a modelling language and associated tools to predict the performance of software and hardware systems. The PEPA language (Performance Evaluation Process Algebra) is a compact modelling language which models systems as compositions of sequential components which performance timed activities either individually or in cooperation with other components. PEPA models are analysed by compilation into Continuous-time Markov Chains (CTMCs) or other mathematical structures. **more**

## Programming languages and semantics

**Data-centric programming and provenance**

Potential supervisors: James Cheney, Stratis Viglas and Peter Buneman

Provenance means metadata/trace information about a computation. Many scientific disciplines are increasingly data-driven and require support for provenance to make it possible to safely share and reuse raw data or processed results. Our recent focus has been on defining models of provenance that provide useful information for debugging, auditing or assurance or scientific results based on complex computations, and on implementing specific techniques addressing unmet needs for scientific data curation. Our emphasis has been on foundations and applications of ideas from programming languages and semantics to understanding provenance. PhD projects in this area could focus on either the foundations or implementation and evaluation of provenance techniques or on a combination of both. Research topics could include:

- Types and language design for integrating multiple data-centric programming models (e.g. language-integrated query)
- Extending bidirectional programming for synchronizing data across data models
- Language-based techniques for data curation and preservation, provenance tracking, or archiving
- Query and update techniques for longitudinal or provenance-aware queries.
- Extending and evaluating a "database wiki" system for making it easier to collaborate on scientific data resources while fully tracking the provenance and change history of the data.

Other projects, building on interests in the Database Group in data annotation, provenance, archiving, citation, and curation, or interests in the Data-Intensive Research Group in provenance for scientific workflows, are also possible.

Depending on the emphasis of the project, this project would be suitable for a student with a solid grounding in at least one of database systems, database theory or programming languages, and (for more systems-oriented work) some experience and interest in implementation and empirical evaluation.

**From Session Types to Data Types: A Basis for Concurrency and Distribution (ABCD)**

**Supervisor:** Philip Wadler

Concurrency and distribution are computing's most pressing problem. The data type is one of computing's most successful ideas. Session types codify the structure of communication, making software more reliable and easier to construct. Edinburgh, joint with Glasgow and Imperial, has an EPSRC Programme Grant to explore session types. Because of overlap in topic, PhD students may be funded through the Pervasive Parallelism CDT.

http://groups.inf.ed.ac.uk/abcd/

http://pervasiveparallelism.inf.ed.ac.uk/

**Integrating statically and dynamically typed languages**

**Supervisors: **Philip Wadler and Andrew D. Gordon

Both Meijer and Bracha argue in favor of mixing dynamic and static typing, and such mixing is now supported in Microsoft's .NET framework. Much recent work has focused on integrating dynamic and static typing using the contracts of Findler and Felleisen, including the gradual types of Siek and Taha, the hybrid types of Flanagan, and the manifest contracts of Greenberg, Pierce, and Weirich. The blame calculus unifies the above approaches, permitting one to integrate several strengths of type system: dynamically typed languages, Hindley-Milner typed languages, and refinement types. We are interested to define a 'wide-spectrum' programming language which permits the programmer a range of styles, from dynamically typed (as in Scheme or Javascript) through Hindley-Milner types (as in ML or Haskell) to refinement types (as in F* and RCF) and dependent types (as in Coq or Agda).

**Language-based provenance security**

**Potential supervisor:** James Cheney

Provenance means metadata/trace information about a computation. Many scientific disciplines are increasingly data-driven and require support for provenance to make it possible to safely share and reuse raw data or processed results. There are many intriguing interactions between provenance and classical problems in programming languages and security. This includes language-based security (information flow), incremental or bidirectional computation, and software engineering (program slicing) for provenance-tracking. A number of projects in this area are possible, including:

- Enriching and extending formal models of provenance from simple programming languages or database query languages to handle features such as concurrency, references, side-effects, notions of location, time, or boundaries of control
- Analysing existing proposals for provenance security mechanisms, and identify shortcomings or generalisations leading to a richer understanding of policies and correct mechanisms for provenance
- Developing efficient techniques for integrating provenance-tracking techniques into programming languages or other frameworks
- Formalising and verify provenance techniques in mechanised proof systems or dependently-typed languages such as Coq, Agda or Isabelle/HOL

This project would be suitable for a student with some background in programming languages, semantics, or security, and interest in applications to system configuration and other industrially-relevant problems.

**For more information see our recent publications here.**

## Security

**Quantum Cryptography**

**Potential supervisor:** Elham Kashefi

Analysing and exploiting specific features of different computational models, from a perspective that combines physics and computer science, provides a framework for designing new cryptographic protocols. In this project, we will employ the universal blind computation as a building block to design new classes of protocols for Quantum Money, Quantum Digital Signature and various Secure Circuit Evaluation schemes. The security of all these new protocols will be rigorously studied under specific conditions capturing the essential features of practical physical implementations. We will pursue a threshold analysis for achieving computational or information theoretic security, and will also consider the case of bounded resources.

**Secure Quantum-Classical Multi Party Computation**

**Supervisors:** Elham Kashefi and Petros Wallden

The field of secure multiparty computation (SMPC) was founded in the 80’s by the celebrated protocol of Yao that defined the pillar of cryptography to date. However the overhead of some of these protocols have made them impractical so far. On the other hand we have recently witnessed the immense success and rapid expansion of the field of delegated quantum computing.

We aim in this project to bridge between the two fields to develop various hybrid quantum-classical secure multiparty computation protocols where security and potentially practicality of the original classical protocols are boosted. At the same time we would be able to address the very question of secure multiparty quantum computation, a novel route to be explored to go beyond the quantum key distribution as the main activities in the field of quantum cryptography.

Our proposed schemes will be based on the protocol of blind quantum computing and the related verification scheme. These protocols could be adapted to present a quantum analogue of the Yao protocol while providing potentially a more secure scheme.

Furthermore the first proof-of-principle implementations for these quantum protocols already exists that supports our global ambition of making the hybrid scheme more practical as well.

## Software engineering

**Bidirectional Model Transformations**

**Potential supervisor:** Perdita Stevens

A bidirectional model transformation (bx) is represented as a single text which both specifies a consistency relation on two (or more) models, and specifies how consistency should be restored when one of the models is changed so as no longer to be consistent with the other. The field of bidirectional model transformation languages is immature, and fundamental problems remain to be resolved. Many PhD projects are possible, for example, relating to: formalising desirable properties of bx such as "Least Surprise", that the consistency restoration process changes no more than it needs to; the use of traceability information about how parts of the models relate; the design of bx languages; semantics for such languages; specification and verification.

**Complexity Metrics for Testing Concurrent programs**

**Potential supervisor:** Ajitha Rajan

Developing concurrent programs to a high standard of reliability is notoriously difficult due to the large number of interactions between components executing in parallel.

Testing and automated source code analysis aid the developer in finding bugs and ensuring correctness. However, it is infeasible to analyse the entire program in detail since even for small concurrent programs the number of interactions and execution paths quickly become intractable. It, therefore, becomes important to identify parts of the program that are especially error prone, and focus testing and analysis efforts on these parts. For sequential programs, software complexity metrics help identify the error prone parts. For concurrent software, this problem has not received enough attention.

In this project we will first devise suitable complexity metrics for concurrent programs, and then apply them to increase the efficiency of automated software analysis and testing.

**Testing model transformations**

**Potential supervisor:** Perdita Stevens

Develop some aspect of the theory of testing model transformations.

Model transformations are a key element of model-driven approaches to developing software. However, they are hard to test, e.g. because their typical inputs and outputs are complex enough that it is hard to construct many test cases and check outputs without support. Many different languages are in use for writing model transformations, some of which support bi-directionality, which may add further challenge to the testing process. This project should first survey the field of testing model transformations and then pick some area of testing theory to contribute to. For example, mutation testing of transformations might be interesting; or one might develop a theory of how to construct a range of input models which would be as simple as possible, and investigate what assurange could be gained in this way. It might be appropriate for the project to involve developing a tool, but the main focus should be theoretical.