Static analysis over tree-structured data using graph decompositions
In database theory, static analysis is a common term for decision
problems related to reasoning about syntactic components of database
systems: queries, integrity constraints, etc. In the classical setting
of relational databases, these problems usually amount to the
satisfiability problem for fragments of first order logic, or their
mild extensions like Datalog, which extends positive-existential
formulas with the fixpoint operator. For tree-structured data,
modelled as data trees whose nodes have a label from a finite alphabet
and a value from an infinite domain, an additional complication is the
presence of the schema, usually abstracted as a tree automaton, which
describes the allowed shapes of trees. The interplay between schemas
and even the simplest queries makes static analysis rather
challenging; the most studied problems like query containment and
constraint entailment are often undecidable. Two common approaches to
solving these problems are either to bound the number of values that
need to be considered and reduce to the emptiness problem for
classical tree automata, or to design new automata models that are
able to manipulate data values but still have decidable emptiness. A
different approach leads via graph decompositions and variants of the
Courcelle's theorem. As it turns out, the right notion of graph
decomposition in this context is the one leading to the notion of
clique-width. In several static analysis problems one can show that it
is sufficient to consider data trees of bounded clique-width, and
obtain decidability by expressing the problem in monadic second order
logic, which by the Courcelle's theorem is decidable over such
structures. Tighter complexity bounds can be obtained by
specializations of the Courcelle's theorem to less expressive logics.