Datalog as a (non-logic) Programming Language
Date issued
Authors
Editors
Journal Title
Journal ISSN
Volume Title
Publisher
License
Abstract
Datalog is a restricted query language for databases that was invented in the 1980s. In recent years, Datalog has seen a renewed interest in academia as well as industry. In particular, they use Datalog in application domains, including network monitoring, distributed computing, smart contract validation and decompilation, and non-incremental as well as incremental static program analysis. Datalog is being used in these application domains for multiple reasons.
First, Datalog has a declarative least fixpoint semantics. Programmers specify a Datalog program and provide the input in the form of relations. Then, a Datalog engine derives the minimal model satisfying the Datalog program in the form of relations, which is called the least fixpoint. Datalog's least fixpoint semantics is declarative because programmers do not need to consider the required steps to find the least fixpoint. Second, there exist incremental algorithms for executing Datalog programs. Instead of repetitively recomputing the least fixpoint from scratch when the input changes, an incremental Datalog algorithm processes the change of the input to update the output. That is, they only update the affected output relations. Incrementality can lead to order-of-magnitude speedups of running time. Despite Datalog's strength, its low-level programming style is a major drawback. Datalog programs consist of horn clauses. A horn clause is an inference rule which derives tuples whenever a conjunction of predicates holds. Horn clauses do not always fit the computations a domain expert wants to express to tackle a problem. The data representation of Datalog, which are flat tuples, does not always fit the data that the domain operates on. These are representational gaps between the computations and data of the domain and horn clauses and flat tuples used by Datalog. These two gaps induce another gap when observing the behavior of Datalog programs, in particular, debugging Datalog programs. To debug Datalog programs, the state-of-the-art uses derivation trees as a an explanation vehicle, which fits well for inference rules. However, when experts express domain-specific computations operating on domain-specific data the visualization should fit the domain-specific computation and data instead. But how can we utilize Datalog's strengths while closing the representational gap between the domain and Datalog? In this dissertation, we propose to close the representational gap by providing abstractions for Datalog. In particular, we propose to use linguistic abstractions for Datalog computations, data abstractions for structured data, and operational abstractions for debugging programs. First, we develop a domain-specific language for type system specifications, which compiles to Datalog and optimizes the generated program to enable efficient incremental updates. That is, we generate incremental type checker implementations by utilizing incremental Datalog engines. Second, we develop a functional frontend for Datalog that combines general-purpose functional programming with fixpoint computations supporting abstractions such as higher-order relations, parametric relations, and first-class relations. We develop a data abstraction that encodes structured input data such as abstract syntax trees relationally. Our relational encoding focuses on enabling efficient incremental updates. At last, we develop a new view on debugging Datalog programs by developing a top-down stepping debugger for Datalog, which acts as an operational abstraction. Based on the top-down stepping debugger we allow lifting the intermediate state of the Datalog program to its compilation source. Hence, we provide a debugger interface for debugging programs of the functional Datalog frontend.