Projects

This page is still under construction.

The goal of the project is to explore one of the module’s topics in detail and gain hands-on experience with it.

Logistics

Projects should be done in groups of two, or alone.
Projects should be submitted as GitHub repositories. If two students work on a project, it should be clear from the commit history that both students contributed.

Project ideas

Every group is expected to work independently on a project related to the module’s focus. A project can focus on either research or implementation. Possible high-level directions include the following:

Designing and implementing a new approach for testing, verifying, or debugging data-centric systems
Implementing a known approach in a new system
Conducting an empirical study, that is, measuring and analyzing the status quo to gain actionable insights (i.e., what can we learn from the study that allows us to implement better systems?)

Applying SQLancer to a New Database System (easy, 1 person)

The goal of this project is to add support to SQLancer for testing a not-yet-supported database system. The project is easy and implementation-oriented, since SQLancer already supports many database systems and the structure of the existing implementations can be copied. It is well suited for someone who wants to gain hands-on experience with a testing tool and have a direct real-world impact by finding and reporting bugs. The SQLancer documentation provides further instructions. For this project, success could be demonstrated by committing the code to the main SQLancer repository as well as by finding and reporting bugs.

Adding Test Case Reduction Support to SQLancer (easy, 1 person)

The goal of this project is to add an automatic reduction approach to SQLancer. The project is easy and implementation-oriented, as it is relatively clear what needs to be done, as discussed in an SQLancer issue. This project would be highly beneficial for many companies and organizations, as most of them currently reduce test cases manually. Project success could be demonstrated by committing the code to the main SQLancer repository.
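To make the idea of test case reduction concrete, the following is a minimal sketch of a statement-level reducer. It is not the approach discussed in the SQLancer issue and does not use any SQLancer code. The function names, the example test case, and the "bug" condition (the final query returning the row (2,)) are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3
from typing import Callable, List


def reduce_statements(statements: List[str],
                      still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop single statements while the failure keeps reproducing."""
    if not still_fails(statements):
        raise ValueError("the original test case does not trigger the failure")
    reduced = list(statements)
    changed = True
    while changed:  # repeat until no single statement can be removed
        changed = False
        for i in range(len(reduced)):
            candidate = reduced[:i] + reduced[i + 1:]
            if still_fails(candidate):
                reduced = candidate
                changed = True
                break
    return reduced


def triggers_bug(statements: List[str]) -> bool:
    """Hypothetical oracle: the 'bug' is that the last statement returns (2,)."""
    con = sqlite3.connect(":memory:")
    try:
        rows = []
        for statement in statements:
            rows = con.execute(statement).fetchall()
        return rows == [(2,)]
    except sqlite3.Error:
        # A candidate that no longer executes does not reproduce the bug.
        return False
    finally:
        con.close()


# Hypothetical unreduced test case, as a testing tool might produce it.
test_case = [
    "CREATE TABLE t0(c0 INT)",
    "CREATE TABLE t1(c0 INT)",
    "INSERT INTO t0 VALUES (1)",
    "INSERT INTO t0 VALUES (2)",
    "CREATE INDEX i0 ON t0(c0)",
    "SELECT c0 FROM t0 WHERE c0 > 1",
]
reduced = reduce_statements(test_case, triggers_bug)
for statement in reduced:
    print(statement + ";")
```

Removing one statement at a time is simple but needs many oracle calls on long test cases. Delta debugging and the reducers listed under "Benchmarking SQL Reducers" use more sophisticated strategies, and a real reducer would also try to simplify the individual statements.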

Automatically Generating Skeletons for SQLancer Implementations (medium, 1-2 persons)

The goal of this project is, given a (potentially annotated) grammar of a SQL dialect, to generate SQLancer Java classes that can be used to test the database system implementing the SQL dialect. For example, given the grammar of an INSERT statement, the project could emit the skeleton of an SQLancer class that generates such statements. Fully automating the generation of the classes is difficult because, for example, expected errors need to be examined manually, which requires interaction with the database system.
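As a rough illustration of grammar-driven statement generation, the sketch below derives random INSERT statements from a toy grammar and runs them against SQLite. It does not emit Java and does not use SQLancer's classes. The grammar, the table t0, and the function name generate are hypothetical and only indicate the kind of logic that a generated skeleton would have to encode.

```python
import random
import sqlite3

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of terminals and nonterminals.
GRAMMAR = {
    "insert": [["INSERT INTO", "table", "(", "column", ")",
                "VALUES", "(", "value", ")"]],
    "table": [["t0"]],
    "column": [["c0"], ["c1"]],
    "value": [["NULL"], ["int"], ["string"]],
    "int": [["0"], ["1"], ["-1"]],
    "string": [["''"], ["'a'"]],
}


def generate(symbol: str, rng: random.Random) -> str:
    """Expand a symbol into SQL text by picking random alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # terminals are emitted unchanged
    alternative = rng.choice(GRAMMAR[symbol])
    return " ".join(generate(part, rng) for part in alternative)


# Usage: generate a few statements and execute them against SQLite.
rng = random.Random(0)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t0(c0, c1)")
for _ in range(5):
    statement = generate("insert", rng)
    con.execute(statement)
    print(statement + ";")
```

The toy grammar only produces statements that SQLite accepts. With a realistic grammar, some generated statements fail with errors that are expected (for example, constraint violations), which is the part that the project description notes is hard to automate.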

Creating and Analyzing a SQL Dataset (medium, 1-2 persons)

The goal of this project is to create and analyze a SQL dataset (or a dataset for another query language). SQL test cases (i.e., a series of SQL statements) could be automatically or manually extracted from an existing source such as the PostgreSQL mailing list. While this dataset would be valuable by itself (e.g., it could be used by mutational fuzzers or for evaluating various tools), it could subsequently be analyzed from various angles (e.g., what are common statements?).
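A first analysis of such a dataset could be as simple as counting statement kinds. The sketch below classifies each statement by its leading keyword. The example statements and the function name statement_kinds are hypothetical, and the keyword heuristic is an assumption that breaks on leading comments or WITH clauses, so a real study would more likely rely on a proper SQL parser.

```python
from collections import Counter
from typing import Iterable


def statement_kinds(statements: Iterable[str]) -> Counter:
    """Count statements by their first keyword (a crude classification)."""
    counts = Counter()
    for statement in statements:
        words = statement.strip().split()
        if words:  # skip empty entries
            counts[words[0].upper()] += 1
    return counts


# Hypothetical mini-dataset standing in for extracted test cases.
dataset = [
    "CREATE TABLE t0(c0 INT)",
    "insert into t0 values (1)",
    "SELECT * FROM t0",
    "  select c0 from t0 where c0 IS NULL",
    "",
]
kinds = statement_kinds(dataset)
for keyword, count in kinds.most_common():
    print(keyword, count)
```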

Benchmarking SQL Reducers (medium, 1-2 persons)

Various reducers exist that can be used to minimize SQL statements. The goal of this project is to create a benchmark suite of unminimized test cases and then use this suite to compare the performance of such reducers. The benchmark suite could consist of test cases extracted from bug trackers, or created by applying testing tools to (potentially historic) versions of database systems. Reducers that could be evaluated include C-Reduce, Perses, Reducer.sh, and SQLReduce.

Enhancing an Existing Project (easy to difficult, 1-2 persons)

Various projects within the scope of the module are hosted on platforms such as GitHub. They could be enhanced, studied, compared, or used for a new purpose. A non-comprehensive list of systems is given below.

APOLLO, to find performance bugs in database systems using differential testing.
Cosette, an automated SQL solver.
Cynthia, to find bugs in ORMs using differential testing.
DiffStream, a differential testing tool for Apache Flink.
EvoSQL, a search-based tool that generates test data for SQL queries.
Grand, a testing tool that finds bugs in Gremlin-based graph database systems.
Jepsen, a testing tool to find isolation-level bugs in database systems.
sqlbench, a benchmarking tool for PostgreSQL.
sqlcheck, a tool that detects SQL anti-patterns.
SQLFluff, a linter and auto-formatter for SQL.
SQLancer, a tool to find logic bugs in database systems.
sqlparse, a SQL parser.
SQLRight, a coverage-guided testing tool to find logic bugs in database systems.
SQLsmith, a random SQL query generator.
Squirrel, a fuzzer for database management systems.
QueryFuzz, a metamorphic testing tool for Datalog engines.

Designing and Evaluating a New Approach to Testing/Debugging/… (difficult, 1-2 persons)

The goal of this project is to design and evaluate a novel approach related to the module’s theme. The project will be graded on the quality of the attempt (e.g., is the approach interesting in the sense of providing a new insight? are the implementation and evaluation reasonable?) rather than the result (e.g., does a testing approach find new bugs?). One way to find inspiration, for example for a new testing approach, is to select a class of systems (e.g., geographical extensions of relational database systems, streaming database systems, data manipulation libraries, …), study the bug trackers of representative projects, and develop a new insight that can be used for testing.