Design, Verification, and Implementation of Cloud Database Transaction Systems
Modern web applications are typically layered on top of a high-performance cloud database running in a partitioned, geo-replicated environment for system scalability, data availability, and fault tolerance. As network partitions/failures (P) are unavoidable in general, a cloud database design must sacrifice either strong data consistency (C) or high data availability (A) according to the CAP theorem. To balance well the trade-off between C and A, there is therefore a plethora of data consistency models for cloud database transaction systems, from weak models such as read committed through various forms of snapshot isolation to the strongest guarantee strict serializability.
We are interested in designing provably correct (C) and predictably high-performance (A) cloud database transaction systems, as well as the pass from those designs to the actual system implementations. Recent examples include a new design of read atomic transaction algorithm that outperforms the state of the art.
We are also interested in verifying and optimizing existing distributed transaction systems from both qualitative (C) and quantitative (A) aspects. Recent studies have identified data consistency anomalies in well-designed, heavily-tested production database systems such as Galera.
We offer Bachelors and Masters projects to motivated students who are interested in either formal-methods-aided design and verification of cloud database systems, their implementation, and performance evaluation, or the mixture of both.
Possible projects include:
- Design provably correct and predictably high-performance distributed transaction algorithms with respect to different data consistency guarantees.
- Pass from formal designs to system implementations.
- Verify cloud database correctness; in particular, the data consistency properties.
- Test and/or verify database-backed web/mobile applications.
- Develop and implement techniques not only to detect violations, but also to identify the cause of violations.
Prerequisites for student projects
- For projects involving formal efforts:
- Strong skills in functional programming (e.g., from the FMFP course).
- Familiarity with logics (e.g., temporal logics) and some familiarity with at least one of the formalisms such as Rewriting Logic/Maude and Isabelle/HOL (e.g., from self study). Note that experience with Tamarin would help.
- For projects involving system efforts:
- Strong skills in programming (e.g., C++, Java, Python, Go, …).
- Familiarity with distributed systems in general and databases (and concurrency) in particular.
Proposed Theses
Masters
- Performance-Optimal Distributed Read Atomic Transactions
Further Information
Please contact Si Liu for more information.