Research

Tweag's Academic Research

Engaging in scholarly research supports Tweag's mission to improve the art of compositional software engineering. We at Tweag learn from the best — and contribute our own new discoveries — by publishing and participating in top-tier venues for computer science research. Sharing knowledge is the way we build a world where software everywhere is more correct, more maintainable, and more performant.

We employ researchers with strong academic backgrounds and ask them to keep pushing the envelope. Their sole focus is to identify future directions for automation and programming languages. Tweagers are working on safety through linear types and dependent types, best practices, application performance and other innovations to improve developer and scientist experience.

TOTAL PAPERS: 20

2024

Destination-passing style programming: a Haskell implementation

JFLA 2024

Thomas Bagrel

Destination-passing style programming introduces destinations, which represent the address of a write-once memory cell. Those destinations can be passed as function parameters, and thus enable the caller of a function to keep control over memory management: the body of the called function will just be responsible of filling that memory cell. This is especially useful in functional programming languages, in which the body of a function is typically responsible for allocation of the result value. Programming with destination in Haskell is an interesting way to improve performance of critical parts of some programs, without sacrificing memory guarantees. Indeed, thanks to a linearly-typed API I present, a write-once memory cell cannot be left uninitialized before being read, and is still disposed of by the garbage collector when it is not in use anymore, eliminating the risk of uninitialized read, memory leak, or double-free errors that can arise when memory is managed manually. In this article, I present an implementation of destinations for Haskell, which relies on so-called compact regions. I demonstrate, in particular, a simple parser example for which the destination-based version uses 35% less memory and time than its naive counterpart for large inputs.

Arxiv

2023

Crème de la Crem: Composable Representable Executable Machines

FUNARCH 2023

Marco Perone, Georgios Karachalias

In this paper we describe how to build software architectures as a composition of state machines, using ideas and principles from the field of Domain-Driven Design. By definition, our approach is modular, allowing one to compose independent subcomponents to create bigger systems, and representable, allowing the implementation of a system to be kept in sync with its graphical representation. In addition to the design itself we introduce the Crem library, which provides a concrete state machine implementation that is both compositional and representable, Crem uses Haskell's advanced type-level features to allow users to specify allowed and forbidden state transitions, and to encode complex state machine -- and therefore domain-specific -- properties. Moreover, since Crem's state machines are representable, Crem can automatically generate graphical representations of systems from their domain implementations.

Link Arxiv

No Unification Variable Left Behind: Fully Grounding Type Inference for the HDM System

ITP 2023

Roger Bosman, Georgios Karachalias, Tom Schrijvers

The Hindley-Damas-Milner (HDM) system provides polymorphism, a key feature of functional programming languages such as Haskell and OCaml. It does so through a type inference algorithm, whose soundness and completeness have been well-studied and proven both manually (on paper) and mechanically (in a proof assistant). Earlier research has focused on the problem of inferring the type of a top-level expression. Yet, in practice, we also may wish to infer the type of subexpressions, either for the sake of elaboration into an explicitly-typed target language, or for reporting those types back to the programmer. One key difference between these two problems is the treatment of underconstrained types: in the former, unification variables that do not affect the overall type need not be instantiated. However, in the latter, instantiating all unification variables is essential, because unification variables are internal to the algorithm and should not leak into the output. We present an algorithm for the HDM system that explicitly tracks the scope of all unification variables. In addition to solving the subexpression type reconstruction problem described above, it can be used as a basis for elaboration algorithms, including those that implement elaboration-based features such as type classes. The algorithm implements input and output contexts, as well as the novel concept of full contexts, which significantly simplifies the state-passing of traditional algorithms. The algorithm has been formalised and proven sound and complete using the Coq proof assistant.

Link

2022

Linearly Qualified Types: Generic inference for capabilities and uniqueness

ICFP 2022

Arnaud Spiwack, Csongor Kiss, Jean-Philippe Bernardy, Nicolas Wu, Richard A. Eisenberg

A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear typing that decreases the bureaucracy of working with linear types. Linear constraints are implicit linear arguments that are filled in automatically by the compiler. We present linear constraints as a qualified type system,together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell.

Link Arxiv

Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow

ICFP 2022

Norman Ramsey

In many compilers, control flow is represented using an arbitrary directed graph. But in some interesting target languages, including JavaScript and WebAssembly, intraprocedural control flow can be expressed only in structured ways, using loops, conditionals, and multilevel breaks or exits. As has long been known, arbitrary control flow can be translated into structured control flow [Peterson, Kasami, and Tokura 1973]. The translation uses two standard analyses, but as published, it takes three passes—which may explain why it was overlooked by Emscripten, a compiler that is widely used to translate C to JavaScript [Zakai 2011]. By tweaking the analyses and by applying fundamental ideas from functional programming (recursive functions and immutable abstract-syntax trees), the translation can be implemented in a single pass—along with a couple of code improvements. This new implementation is slated to be added to the Glasgow Haskell Compiler. Its single-pass translation, its immutable representation, and its use of dominator trees make it much easier to reason about than the original translation.

Link

Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation

conditionally accepted at POPL 2022

Faustyna Krawiec, Neel Krishnaswami, Simon Peyton-Jones, Tom Ellis, Andrew Fitzgibbon, Richard A. Eisenberg

In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.

2021

Efficient Compilation of Algebraic Effect Handlers

OOPSLA 2021

Georgios Karachalias, Filip Koprivec, Matija Pretnar, Tom Schrijvers

The popularity of algebraic effect handlers as a programming language feature for user-defined computational effects is steadily growing. Yet, even though efficient runtime representations have already been studied, most handler-based programs are still much slower than hand-written code. This paper shows that the performance gap can be drastically narrowed (in some cases even closed) by means of type-and-effect directed optimising compilation. Our approach consists of source-to-source transformations in two phases of the compilation pipeline. Firstly, elementary rewrites, aided by judicious function specialisation, exploit the explicit type and effect information of the compiler's core language to aggressively reduce handler applications. Secondly, after erasing the effect information further rewrites in the backend of the compiler emit tight code. This work comes with a practical implementation: an optimising compiler from Eff, an ML style language with algebraic effect handlers, to OCaml. Experimental evaluation with this implementation demonstrates that in a number of benchmarks, our approach eliminates much of the overhead of handlers, outperforms capability-passing style compilation and yields competitive performance compared to hand-written OCaml code as well Multicore OCaml's dedicated runtime support.

Link Zenodo

Union and intersection contracts are hard, actually

Dynamic Language Symposium 2021

Teodoro Freund, Yann Hamdaoui, Arnaud Spiwack

Union and intersection types are a staple of gradually typed language such as TypeScript. While it's long been recognized that union and intersection types are difficult to verify statically, it may appear at first that the dynamic part of gradual typing is actually pretty simple. It turns out however, that in presence of higher-order contracts union and intersection are deceptively difficult. The literature on higher-order contracts with union and intersection, while keenly aware of the fact, doesn't really explain why. We point and illustrate the problems and trade-offs inherent to union and intersection contracts, via example and a survey of the literature.

Arxiv

An Existential Crisis Resolved: Type inference for first-class existential types

ICFP 2021 (Distinguished Paper)

Richard A. Eisenberg, Guillaume Duboc, Stephanie Weirich, Daniel Lee

Despite the great success of inferring and programming with universal types, their dual—existential types—are much harder to work with. Existential types are useful in building abstract types, working with indexed types, and providing first-class support for refinement types. This paper, set in the context of Haskell, presents a bidirectional type-inference algorithm that infers where to introduce and eliminate existentials without any annotations in terms, along with an explicitly typed, type-safe core language usable as a compilation target. This approach is backward compatible. The key ingredient is to use strong existentials, which support (lazily) projecting out the encapsulated data, not weak existentials accessible only by pattern-matching.

Link

Evaluating Linear Functions to Symmetric Monoidal Categories

Haskell Symposium 2021

Jean-Philippe Bernardy, Arnaud Spiwack

A number of domain specific languages, such as circuits or data-science workflows, are best expressed as diagrams of boxes connected by wires. Unfortunately, functional languages have traditionally been ill-equipped to embed this sort of languages. The Arrow abstraction is an approximation, but we argue that it does not capture the right properties. A faithful abstraction is Symmetric Monoidal Categories (SMCs), but,so far,it hasn't been convenient to use. We show how the advent of linear typing in Haskell lets us bridge this gap. We provide a library which lets us program in SMCs with linear functions instead of SMC combinators. This considerably lowers the syntactic overhead of the EDSL to be on par with that of monadic DSLs. A remarkable feature of our library is that, contrary to previously known methods for categories, it does not use any metaprogramming.

Link Arxiv

Seeking Stability by being Lazy and Shallow: Lazy and shallow instantiation is user friendly

Haskell Symposium 2021

Gert-Jan Bottu, Richard A. Eisenberg

Designing a language feature often requires a choice between several, similarly expressive possibilities. Given that user studies are generally impractical, we propose using stability as a way of making such decisions. Stability is a measure of whether the meaning of a program alters under small, seemingly innocuous changes in the code (e.g., inlining). Directly motivated by a need to pin down a feature in GHC/Haskell, we apply this notion of stability to analyse four approaches to the instantiation of polymorphic types, concluding that the most stable approach is lazy (instantiate a polytype only when absolutely necessary) and shallow (instantiate only top-level type variables, not variables that appear after explicit arguments).

Link

A Graded Dependent Type System with a Usage-Aware Semantics

POPL 2021

Pritam Choudhury, Harley Eades III, Richard A. Eisenberg, Stephanie Weirich

Graded Type Theory provides a mechanism to track and reason about resource usage in type systems. In this paper, we develop GraD, a novel version of such a graded dependent type system that includes functions, tensor products, additive sums, and a unit type. Since standard operational semantics is resource-agnostic, we develop a heap-based operational semantics and prove a soundness theorem that shows correct accounting of resource usage. Several useful properties, including the standard type soundness theorem, non-interference of irrelevant resources in computation and single pointer property for linear resources, can be derived from this theorem. We hope that our work will provide a base for integrating linearity, irrelevance and dependent types in practical programming languages like Haskell.

Link Arxiv

Partial Type Constructors; Or, Making ad hoc datatypes less ad hoc

POPL 2020

Mark P. Jones, J. Garrett Morris, Richard A. Eisenberg

Functional programming languages assume that type constructors are total. Yet functional programmers know better: counterexamples range from container types that make limiting assumptions about their contents (e.g., requiring computable equality or ordering functions) to type families with defining equations only over certain choices of arguments. We present a language design and formal theory of partial type constructors, capturing the domains of type constructors using qualified types. Our design is both simple and expressive: we support partial datatypes as first-class citizens (including as instances of parametric abstractions, such as the Haskell Functor and Monad classes), and show a simple type elaboration algorithm that avoids placing undue annotation burden on programmers. We show that our type system rejects ill-defined types and can be compiled to a semantic model based on System F. Finally, we have conducted an experimental analysis of a body of Haskell code, using a proof-of-concept implementation of our system; while there are cases where our system requires additional annotations, these cases are rarely encountered in practical Haskell code.

Link

2020

Resolution as Intersection Subtyping via Modus Ponens

OOPSLA 2020

Koar Marntirosian, Tom Schrijvers, Bruno C. d. S. Oliveira, Georgios Karachalias

Resolution and subtyping are two common mechanisms in programming languages. Resolution is used by features such as type classes or Scala-style implicits to synthesize values automatically from contextual type information. Subtyping is commonly used to automatically convert the type of a value into another compatible type. So far the two mechanisms have been considered independently of each other. This paper shows that, with a small extension, subtyping with intersection types can subsume resolution. This has three main consequences. Firstly, resolution does not need to be implemented as a separate mechanism. Secondly, the interaction between resolution and subtyping becomes apparent. Finally, the integration of resolution into subtyping enables first-class (implicit) environments. The extension that recovers the power of resolution via subtyping is the modus ponens rule of propositional logic. While it is easily added to declarative subtyping, significant care needs to be taken to retain desirable properties, such as transitivity and decidability of algorithmic subtyping, and coherence. To materialize these ideas we develop \lambda_i^{MP}, a calculus that extends a previous calculus with disjoint intersection types, and develop its metatheory in the Coq theorem prover.

Link Arxiv Zenodo

Kinds are Calling Conventions

ICFP 2020

Paul Downen, Zena M. Ariola, Simon Peyton Jones, Richard A. Eisenberg

A language supporting polymorphism is a boon to programmers: they can express complex ideas once and reuse functions in a variety of situations. However, polymorphism is pain for compilers tasked with producing efficient code that manipulates concrete values. This paper presents a new intermediate language that allows for efficient static compilation, while still supporting flexible polymorphism. Specifically, it permits polymorphism over not only the types of values, but also the representation of values, the arity of primitive machine functions, and the evaluation order of arguments---all three of which are useful in practice. The key insight is to encode information about a value's calling convention in the kind of its type, rather than in the type itself.

Link

Stitch: The Sound Type-Indexed Type Checker, a Functional Pearl

Haskell Symposium 2020

Richard A. Eisenberg

A classic example of the power of generalized algebraic datatypes (GADTs) to verify a delicate implementation is the type-indexed expression AST. This functional pearl refreshes this example, casting it in modern Haskell using many of GHC's bells and whistles. The Stitch interpreter is a full executable interpreter, with a parser, type checker, common-subexpression elimination, and a REPL. Making heavy use of GADTs and type indices, the Stitch implementation is clean Haskell code and serves as an existence proof that Haskell's type system is advanced enough for the use of fancy types in a practical setting. The paper focuses on guiding the reader through these advanced topics, enabling them to adopt the techniques demonstrated here.

Link

Composing Effects into Tasks and Workflows

Haskell Symposium 2020

Yves Parès, Jean-Philippe Bernardy, Richard A. Eisenberg

Data science applications tend to be built by composing tasks: discrete manipulations of data. These tasks are arranged in directed acyclic graphs, and many frameworks exist within the data science community supporting such a structure, which is called a workflow. In realistic applications, we want to be able to both analyze a workflow in the absence of data, and to execute the workflow with data. This paper combines effect handlers with arrow-like structures to abstract out data science tasks. This combination of techniques enables a modular design of workflows. Additionally, these workflows can both be analyzed prior to running (e.g., to provide early failure) and run conveniently. Our work is directly motivated by real-world scenarios, and we believe that our approach is applicable to new data science and machine learning applications and frameworks.

Link

Kind Inference for Datatypes

POPL 2020 (Distinguished Paper)

Ningning Xie, Richard A. Eisenberg, Bruno C. d. S. Oliveira

In recent years, languages like Haskell have seen a dramatic surge of new features that significantly extends the expressive power of their type systems. With these features, the challenge of kind inference for datatype declarations has presented itself and become a worthy research problem on its own. This paper studies kind inference for datatypes. Inspired by previous research on type-inference, we offer declarative specifications for what datatype declarations should be accepted, both for Haskell98 and for a more advanced system we call PolyKinds, based on the extensions in modern Haskell, including a limited form of dependent types. We believe these formulations to be novel and without precedent, even for Haskell98. These specifications are complemented with implementable algorithmic versions. We study soundness, completeness and the existence of principal kinds in these systems, proving the properties where they hold. This work can serve as a guide both to language designers who wish to formalize their datatype declarations and also to implementors keen to have principled inference of principal types.

Link

2018

Linear Haskell: practical linearity in a higher-order polymorphic language

POPL 2018

Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, Arnaud Spiwack

Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system - both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types - we implemented our type system in GHC, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions.

Link Arxiv

2014

Programming R in Haskell

IFL 2014

Mathieu Boespflug, Allen Brown, Facundo Domínguez, Alexander Vershilov

A standard method for augmenting the “native” set of libraries available within any given programming environment is to extend this set via a foreign function interface provided by the programming language. In this way, by exporting the functionality of external libraries via binding modules, one is able to reuse libraries without having to reimplement them in the language du jour. However, a priori bindings of entire system libraries is a tedious process that quickly creates an unbearable maintenance burden. We demonstrate an alternative to monolithic and imposing binding modules, even to make use of libraries implemented in a special-purpose, dynamically typed, interpreted language. As a case study, we present H, an R-to-Haskell interoperability solution making it possible to program all of R, including all library packages on CRAN, from Haskell, a general-purpose, statically typed, compiled language. We demonstrate how to do so efficiently, without marshalling costs when crossing language boundaries and with static guarantees of well-formation of expressions and safe acquisition of foreign language resources.

Link Homepage