Recognizing regular patterns in heterogeneous sequences
Jim Newton
Run-time type-based reflection is a powerful tool which is used to solve certain problems which are out of reach to purely statically typed programming languages. The JVM-based implementation Scala allows a running program to maker certain type-based decisions which cannot be made at compile time. Notably, pattern matching and dynamic method dispatch allow the running Scala program to make run-time decisions based on characteristics of input data, the type of which cannot be known at compile time. In this expose, we present yet another kind of run-time based type decision which allows us to arbitrate among various regular patterns in otherwise untyped data. We call these patterns RTEs (regular type expressions).
The Scala type system allows us to specify sequences such as Seq[Int]or Seq[String], i.e., homogeneous sequences of elements of a specified type. We would additionally like to specify sequences of heterogeneous but regular types. Regular types cannot be statically supported by the Scala type system. However, we have implemented this abstraction the help of run-time reflection. We assume the audience to already be familiar with string-based regular expressions (REs). REs are used to distinguish strings which follow a regular pattern such as $a(a|b)^*b$, the set of strings beginning with the character a, ending with b, and with zero or more a or b (or both) characters falling in between.
We generalize this familiar concept to define expressions which specify a sequence beginning with an integer, ending with a string, and with zero or more integers or strings (or both)falling in between.`val Int:Rte = Singleton(classOf[Int])``val String:Rte = Singleton(classOf[String])`val r:Rte = Int ++ (Int | String).* ++ StringThe implementation of Regular Type Expressions (RTEs) in Scala involves several libraries.
There are several questions addressing practical concerns such asperformance and memory consumption. Additionally, there are many theoretical questions which investigate the limitations of the generalization from classical character based regular expressions to regular type expressions. Some of these concerns include habitation and vacuity checks. Given two types determine whether one is a subtype of the other, and whether either is empty. Subtype determination is important for guaranteeing that finite automata be deterministic. Unfortunately, the subtype relation cannot always be determined (for several interesting theoretical reasons). We present a clever procedure for DFA construction which is guaranteed to be deterministic, even when the subtype relation cannot be determined. We expect some audience members will find this abstraction thought-provoking. However, we fully suspect others will find the concept objectionable as in many cases we are fighting to undo some of the guarantees which the Scala compiler endeavors to provide.