← Lambda Land · Blog


Evolving Languages Faster with Type Tailoring

15 July 2024

Programming languages are too slow! I’m not talking about execution speed—I’m talking about evolution speed. Programmers are always building new libraries and embedded DSLs, but the host programming language—particularly its type system—doesn’t understand the domain-specific aspects of these things.

Example problem: my type system doesn’t speak PCRE

Consider regular expressions—most programmers would understand that a regular expression like [a-z]+([0-9][0-9]), if it matches, will capture two digits in the first capture group. If I try to write this code in Rust or Typed Racket, the type checker complains:

use regex::Regex;

fn main() {
    let re = Regex::new(r"[a-z]+([0-9][0-9])").unwrap();
    if let Some(caps) = re.captures("dent42") {
        println!("id number: {}", caps.get(1));
    }
}

error:

rustc [E0277]: `Option<regex::Match<'_>>` doesn't implement `std::fmt::Display`
the trait `std::fmt::Display` is not implemented for `Option<regex::Match<'_>>`
#lang typed/racket

(define (user-idnum (username : String)) : Number
  (define re "[a-z]+([0-9][0-9])")
  (define m (regexp-match re username))
  (if m
      (string->number (second m))
      (error "bad username")))

(printf "id number: ~a\n" (user-idnum "dent42"))

error:

example.rkt:7:22: Type Checker: Polymorphic function `second' could not be applied to arguments:
Types: (List* a r (Listof t))  -> (r : ((! (cadr (0 0)) False) | (: (cadr (0 0)) False)) : (cadr (0 0)))
       (Listof a)  -> a
Arguments: (Pairof String (Listof (U False String)))
Expected result: String

The (U τ₁ τ₂) syntax is a type union of types τ₁ and τ₂, whatever types those are. The equivalent of Option<String> in Typed Racket is (U String False).

The problem is that getting the first capture group (caps.get(1) in Rust, (second m) in Typed Racket) returns an optional type (Option<regex::Match> in Rust, (Listof (U False String)). The thing is, we know that since the regex match succeeded on line 5, caps.get(1) (or (second m)) should definitely succeed because there was one capture group in the regex we matched. Instead, we’re forced to unwrap in Rust:

println!("id number: {}", caps.get(1).unwrap().as_str());

Likewise, in Typed Racket, we have to insert some casts and checks manually to convince the type checker that this code should run.

This is a small example; the key point is this: type systems usually aren’t smart enough to know about the structure of regexes—all the compiler sees are opaque strings. This goes for things beyond regular expressions. Consider SQL queries, which are so often embedded as strings in languages: when a query goes wrong, you usually only find out at runtime.

What would a solution look like? Some people have tried making a new way of writing regexes with composable objects that the type system has a better chance of understanding; that is a neat approach, but that both requires me to rewrite my program and doesn’t solve the issue of indexing into lists of known size (the results of a capture) more efficiently.

Another option would be to make the type system smarter. But that too has its drawbacks: language implementers are often leery of tinkering with the type checker—and rightly so! So much relies on the type checker being correct. Moreover, even if you did make the type checker able to understand regexes, someone’s going to build a new library tomorrow that will stump your type checker just as before.

Here’s a more attractive option: we can use the metaprogramming tools the language provides to teach the type system some new tricks. This is something that end-users of languages can do without waiting for the language designer. We call this type tailoring.

Type Tailoring is the title and subject of a paper I wrote with my advisor Ben Greenman and our coauthors Stephen Chang and Matthias Felleisen. It has been accepted at European Conference on Object-Oriented Programming (ECOOP).1 You can get a preprint here.

Sketch of a solution

Here’s a high-level sketch of how we would solve the problem:

  1. Something would notice that the regex has a single capture group.
  2. The re.captures function would get this information and update the its type.
  3. This information would further by leveraged by the type of caps, to indicate that get(0) or get(1) will always succeed.

Some years ago, my advisor made the trivial library for Typed Racket. It can tailor the following code so that it typechecks and runs efficiently:

The trivial library is available as a Racket package. If you have Racket installed on your system, run raco pkg add trivial to install it.

#lang typed/racket

(require trivial trivial/list)  ;; add this to tailor program

(define (user-idnum (username : String)) : Number
  (define re "[a-z]+([0-9][0-9])")
  (define m (regexp-match re username))
  (if m
      (string->number (second m))
      (error "bad username")))

(printf "id number: ~a\n" (user-idnum "dent42"))
Type Checker: Polymorphic function `second' could not be applied to arguments:
Types: (List* a r (Listof t))  -> (r : ((! (cadr (0 0)) False) | (: (cadr (0 0)) False)) : (cadr (0 0)))
       (Listof a)  -> a
Arguments: (Pairof String (Listof (U False String)))
Expected result: String
id number: 42

The problem is that, like Rust, Typed Racket must assign an overly conservative type to the result of matching a regular expression. Consequently, the programmer has to insert casts. The trivial library can analyze Typed Racket and insert these casts and checks automatically. The end-result for the user is that this code Just Works™ as you would expect.

Notice that the trivial library is a library—it doesn’t require modifications to the compiler or type checker or anything. This means that normal users of programming languages can create their own tailorings without breaking the compiler or messing with the build pipeline.

Supporting type tailoring

What do you need to make type tailoring work? Let’s step back a second and look at what we need to do in the first place. Our problem is that the type checker doesn’t know as much about our program as we do. What we can do to teach the type checker is program the elaboration step: surface syntax typically doesn’t have type annotations at every point; elaboration takes the syntax that a programmer writes, and adds types and type holes wherever needed. This elaborated syntax gets sent off to the constraint solver for type checking and inference.

How do we program the elaboration step? Almost all languages that have macros do type checking after macroexpansion. This is crucial for type tailoring. We can write macros that add checks, casts, type annotations, or whatever else we need to make the type checker happy.

Here are the key features that you must have to make type tailoring work:

Type checking after elaboration
Type checking must come after elaboration to check the results of tailoring. Without this, it would be too easy to break the type system. Furthermore, if type checking comes after elaboration, we can leverage all the power of the type checker to do the heavy-lifting for us; all a tailoring has to do is give a few hints to the type checker here and there.
Elaboration-time computation
Most of the time this means that you need procedural macros. Pattern-based macros (such as syntax-case from Scheme or macro_rules! from Rust) can only rearrange syntax, in a pattern → pattern transformation, and can’t perform arbitrary rewrites.
AST datatype
Without an AST datatype, tailorings are limited to using a token stream.2 Sometimes it’s possible to convert a token stream to an AST, but you loose metadata and becomes unwieldy quickly.

These are the essential elements, without which tailoring can’t happen. Besides these three things, you also will want some of the following tailoring features:

Hygienic macros
Hygienic macros avoid the variable capture problem.3 In other words, I shouldn’t have to be concerned about the internals of the macros that I use. This also makes it so that I can compose macros with each other.
Metadata
Metaprogramming systems that can attach metadata directly to AST nodes can share information between different tailorings easily. (Keeping compile-time state off to the side is an alternative.)
Controlling the order of expansion
Tailorings that cooperate often need a way to control the order in which they run: one tailoring might depend on the results of another, and a third tailoring might analyze the output further.
Accessing external data
Some of the coolest tailorings reached out to external sources of data to augment type checking. Rust actually has a neat library called SQLx that, at compile time, checks SQL strings against the schema of a database. There are several systems that do something similar.
Type information
A few of the systems that we looked at (Idris 1 and Scala 3) could inspect the types of arguments to macros. After expansion, the type checker would run again to check that the transformation’s result was well-typed. Since there were so few examples of this, it’s hard to say just how beneficial this is.

No language supports all of these features—that’s exciting because it means there’s room to explore! In our paper we go into detail about each of these features and the kinds of tailoring they enable. We also have a chart showing how a handful of languages stack up against each other.

You might have invented type tailoring

We invented the term “type tailoring”, but we didn’t invent the idea—programmers have wanted to teach their type systems how to understand domain-specific concerns for a long time. Here are just a few existing projects we found that were doing type tailoring:

Again, that’s just a small sample. Please see our paper for details on how these projects are using type tailoring, as well as for more examples that we found.

Type tailoring: new terms, new libraries, new horizons

The big contributions of our paper are:

Furthermore, we built two libraries: trivial for Racket—which tailors things like vectors, regular expressions, etc., and Dyn for Rhombus—which turns Rhombus into a gradually-typed language through a tailoring. We expect more will be built in the future.

Again, please see our paper for all the details. Our paper comes with an artifact that contains all the code in the paper. You can simply download a Docker container to run the code and verify all our claims. Yay for reproducible research!

If you have any questions, feel free to email me. (Email in the paper, as well as here on my blog.) If you’re going to ECOOP in Vienna this year, let me know and we can talk in person there!

  1. Like the ACM conference OOPSLA, ECOOP has in recent years focused on more than object-oriented programming. The name stuck around. ¯\_(ツ)_/¯

  2. Rust’s procedural macros operate on token streams unfortunately.

  3. For more on the variable capture problem, see my post about fearless macros.