Tweag's Academic Research

Engaging in scholarly research supports Tweag's mission to improve the art of compositional software engineering. We at Tweag learn from the best — and contribute our own new discoveries — by publishing and participating in top-tier venues for computer science research. Sharing knowledge is the way we build a world where software everywhere is more correct, more maintainable, and more performant.

We employ researchers with strong academic backgrounds and ask them to keep pushing the envelope. Their sole focus is to identify future directions for automation and programming languages. Tweagers are working on safety through linear types and dependent types, best practices, application performance, and other innovations that improve the developer and scientist experience.


Destination-passing style programming: a Haskell implementation
JFLA 2024
Thomas Bagrel
Destination-passing style programming introduces destinations, which represent the address of a write-once memory cell. Destinations can be passed as function parameters, letting the caller of a function keep control over memory management: the body of the called function is then responsible only for filling that memory cell. This is especially useful in functional programming languages, where the body of a function is typically responsible for allocating the result value. Programming with destinations in Haskell is an interesting way to improve the performance of critical parts of some programs without sacrificing memory guarantees. Indeed, thanks to the linearly-typed API I present, a write-once memory cell cannot be left uninitialized before being read, and is still disposed of by the garbage collector when it is no longer in use, eliminating the risk of uninitialized reads, memory leaks, or double-free errors that can arise when memory is managed manually. In this article, I present an implementation of destinations for Haskell that relies on so-called compact regions. I demonstrate, in particular, a simple parser example for which the destination-based version uses 35% less memory and time than its naive counterpart on large inputs.
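The idea can be sketched with a toy, dynamically-checked model of a write-once destination. All names here are hypothetical illustrations, not the paper's API: the real library uses linear types and compact regions so that "filled exactly once before being read" is a static guarantee, whereas this IORef sketch checks nothing at compile time.

```haskell
import Data.IORef

-- Hypothetical toy model of a write-once destination, backed by an IORef.
newtype Dest a = Dest (IORef (Maybe a))

newDest :: IO (Dest a)
newDest = Dest <$> newIORef Nothing

-- Fill the cell; in the real, linearly-typed API this must happen
-- exactly once before the destination can be read.
fill :: Dest a -> a -> IO ()
fill (Dest r) x = writeIORef r (Just x)

fromDest :: Dest a -> IO a
fromDest (Dest r) =
  maybe (error "uninitialized destination") pure =<< readIORef r

-- The caller owns the memory; the callee merely fills it in.
buildSquares :: Dest [Int] -> IO ()
buildSquares d = fill d [n * n | n <- [1 .. 5]]
```

The inversion of control is the point: allocation decisions move from the callee to the caller, which is what enables the memory-usage savings the abstract reports.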
Crème de la Crem: Composable Representable Executable Machines
Marco Perone, Georgios Karachalias
In this paper we describe how to build software architectures as a composition of state machines, using ideas and principles from the field of Domain-Driven Design. By definition, our approach is modular, allowing one to compose independent subcomponents to create bigger systems, and representable, allowing the implementation of a system to be kept in sync with its graphical representation. In addition to the design itself we introduce the Crem library, which provides a concrete state machine implementation that is both compositional and representable. Crem uses Haskell's advanced type-level features to allow users to specify allowed and forbidden state transitions, and to encode complex state machine -- and therefore domain-specific -- properties. Moreover, since Crem's state machines are representable, Crem can automatically generate graphical representations of systems from their domain implementations.
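A minimal sketch of the underlying idea — encoding allowed and forbidden transitions at the type level — can be written with GADTs and DataKinds. This is not Crem's actual API, just the standard pattern it builds on: the machine's state is a type index, so only lawful transitions typecheck.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- A two-state machine whose state is tracked in the type index.
data DoorState = Opened | Closed

data Door (s :: DoorState) where
  DOpened :: Door 'Opened
  DClosed :: Door 'Closed

-- Only the allowed transitions exist; `open` on an already-open
-- door is a type error, not a runtime check.
open :: Door 'Closed -> Door 'Opened
open DClosed = DOpened

close :: Door 'Opened -> Door 'Closed
close DOpened = DClosed

label :: Door s -> String
label DOpened = "opened"
label DClosed = "closed"
```

Crem layers composition and automatic graphical representation on top of this kind of indexed encoding.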
No Unification Variable Left Behind: Fully Grounding Type Inference for the HDM System
ITP 2023
Roger Bosman, Georgios Karachalias, Tom Schrijvers
The Hindley-Damas-Milner (HDM) system provides polymorphism, a key feature of functional programming languages such as Haskell and OCaml. It does so through a type inference algorithm, whose soundness and completeness have been well-studied and proven both manually (on paper) and mechanically (in a proof assistant). Earlier research has focused on the problem of inferring the type of a top-level expression. Yet, in practice, we may also wish to infer the type of subexpressions, either for the sake of elaboration into an explicitly-typed target language, or for reporting those types back to the programmer. One key difference between these two problems is the treatment of underconstrained types: in the former, unification variables that do not affect the overall type need not be instantiated. However, in the latter, instantiating all unification variables is essential, because unification variables are internal to the algorithm and should not leak into the output. We present an algorithm for the HDM system that explicitly tracks the scope of all unification variables. In addition to solving the subexpression type reconstruction problem described above, it can be used as a basis for elaboration algorithms, including those that implement elaboration-based features such as type classes. The algorithm implements input and output contexts, as well as the novel concept of full contexts, which significantly simplifies the state-passing of traditional algorithms. The algorithm has been formalised and proven sound and complete using the Coq proof assistant.
Linearly Qualified Types: Generic inference for capabilities and uniqueness
ICFP 2022
Arnaud Spiwack, Csongor Kiss, Jean-Philippe Bernardy, Nicolas Wu, Richard A. Eisenberg
A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear typing that decreases the bureaucracy of working with linear types. Linear constraints are implicit linear arguments that are filled in automatically by the compiler. We present linear constraints as a qualified type system, together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell.
Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow
ICFP 2022
Norman Ramsey
In many compilers, control flow is represented using an arbitrary directed graph. But in some interesting target languages, including JavaScript and WebAssembly, intraprocedural control flow can be expressed only in structured ways, using loops, conditionals, and multilevel breaks or exits. As has long been known, arbitrary control flow can be translated into structured control flow [Peterson, Kasami, and Tokura 1973]. The translation uses two standard analyses, but as published, it takes three passes—which may explain why it was overlooked by Emscripten, a compiler that is widely used to translate C to JavaScript [Zakai 2011]. By tweaking the analyses and by applying fundamental ideas from functional programming (recursive functions and immutable abstract-syntax trees), the translation can be implemented in a single pass—along with a couple of code improvements. This new implementation is slated to be added to the Glasgow Haskell Compiler. Its single-pass translation, its immutable representation, and its use of dominator trees make it much easier to reason about than the original translation.
Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation
POPL 2022
Faustyna Krawiec, Neel Krishnaswami, Simon Peyton-Jones, Tom Ellis, Andrew Fitzgibbon, Richard A. Eisenberg
In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.
Efficient Compilation of Algebraic Effect Handlers
Georgios Karachalias, Filip Koprivec, Matija Pretnar, Tom Schrijvers
The popularity of algebraic effect handlers as a programming language feature for user-defined computational effects is steadily growing. Yet, even though efficient runtime representations have already been studied, most handler-based programs are still much slower than hand-written code. This paper shows that the performance gap can be drastically narrowed (in some cases even closed) by means of type-and-effect directed optimising compilation. Our approach consists of source-to-source transformations in two phases of the compilation pipeline. Firstly, elementary rewrites, aided by judicious function specialisation, exploit the explicit type and effect information of the compiler's core language to aggressively reduce handler applications. Secondly, after erasing the effect information, further rewrites in the backend of the compiler emit tight code. This work comes with a practical implementation: an optimising compiler from Eff, an ML-style language with algebraic effect handlers, to OCaml. Experimental evaluation with this implementation demonstrates that in a number of benchmarks, our approach eliminates much of the overhead of handlers, outperforms capability-passing style compilation and yields competitive performance compared to hand-written OCaml code as well as Multicore OCaml's dedicated runtime support.
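To see where the overhead comes from, here is a minimal free-monad encoding of one algebraic effect (mutable state) and a handler for it, in Haskell rather than Eff; all names are illustrative. Every `Op` node is an interpretive step taken at run time — exactly the kind of cost the paper's rewrites eliminate by reducing handler applications at compile time.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Free monad over an effect signature f.
data Free f a = Pure a | Op (f (Free f a)) deriving Functor

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g <*> x = fmap g x
  Op f   <*> x = Op (fmap (<*> x) f)

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Op f   >>= k = Op (fmap (>>= k) f)

-- One effect: mutable state of type s.
data StateF s k = Get (s -> k) | Put s k deriving Functor

get :: Free (StateF s) s
get = Op (Get Pure)

put :: s -> Free (StateF s) ()
put s = Op (Put s (Pure ()))

-- The handler: interpret each operation, threading the state explicitly.
runState :: s -> Free (StateF s) a -> (a, s)
runState s (Pure a)        = (a, s)
runState s (Op (Get k))    = runState s (k s)
runState _ (Op (Put s k))  = runState s k

tick :: Free (StateF Int) Int
tick = do n <- get; put (n + 1); pure n
```

Hand-written code would thread the state directly with no `Op` constructors to allocate and match on; closing that gap is the paper's goal.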
Union and intersection contracts are hard, actually
Dynamic Language Symposium 2021
Teodoro Freund, Yann Hamdaoui, Arnaud Spiwack
Union and intersection types are a staple of gradually typed languages such as TypeScript. While it has long been recognized that union and intersection types are difficult to verify statically, it may appear at first that the dynamic part of gradual typing is actually pretty simple. It turns out, however, that in the presence of higher-order contracts, union and intersection are deceptively difficult. The literature on higher-order contracts with union and intersection, while keenly aware of this fact, doesn't really explain why. We point out and illustrate the problems and trade-offs inherent to union and intersection contracts, via examples and a survey of the literature.
An Existential Crisis Resolved: Type inference for first-class existential types
ICFP 2021 (Distinguished Paper)
Richard A. Eisenberg, Guillaume Duboc, Stephanie Weirich, Daniel Lee
Despite the great success of inferring and programming with universal types, their dual—existential types—are much harder to work with. Existential types are useful in building abstract types, working with indexed types, and providing first-class support for refinement types. This paper, set in the context of Haskell, presents a bidirectional type-inference algorithm that infers where to introduce and eliminate existentials without any annotations in terms, along with an explicitly typed, type-safe core language usable as a compilation target. This approach is backward compatible. The key ingredient is to use strong existentials, which support (lazily) projecting out the encapsulated data, not weak existentials accessible only by pattern-matching.
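For contrast, here is the conventional "weak" encoding of an existential in today's Haskell, where the package can only be taken apart by pattern matching. The paper's strong existentials additionally support projecting the encapsulated data out (lazily), and its inference algorithm removes the need for this boilerplate entirely.

```haskell
{-# LANGUAGE GADTs #-}

-- A value of some unknown type, known only to be Show-able.
data Showable where
  Hide :: Show a => a -> Showable

-- The only way in: pattern matching, which brings the hidden
-- type (and its Show dictionary) into scope.
render :: Showable -> String
render (Hide x) = show x

-- Existentials permit heterogeneous collections.
heterogeneous :: [Showable]
heterogeneous = [Hide (1 :: Int), Hide True, Hide "two"]
```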
Evaluating Linear Functions to Symmetric Monoidal Categories
Haskell Symposium 2021
Jean-Philippe Bernardy, Arnaud Spiwack
A number of domain-specific languages, such as circuits or data-science workflows, are best expressed as diagrams of boxes connected by wires. Unfortunately, functional languages have traditionally been ill-equipped to embed this sort of language. The Arrow abstraction is an approximation, but we argue that it does not capture the right properties. A faithful abstraction is Symmetric Monoidal Categories (SMCs), but, so far, it hasn't been convenient to use. We show how the advent of linear typing in Haskell lets us bridge this gap. We provide a library which lets us program in SMCs with linear functions instead of SMC combinators. This considerably lowers the syntactic overhead of the EDSL, putting it on par with that of monadic DSLs. A remarkable feature of our library is that, contrary to previously known methods for categories, it does not use any metaprogramming.
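A sketch of the combinator vocabulary the paper starts from may help; the class and method names below are illustrative, not the library's. Programming directly against such combinators forces explicit point-free "wire plumbing", which is the syntactic overhead the paper's linear-function frontend removes.

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- A (fragment of a) symmetric monoidal category: boxes composed in
-- sequence (from Category), side by side (par), and crossed wires.
class Category k => SMC k where
  par   :: k a b -> k c d -> k (a, c) (b, d)
  cross :: k (a, b) (b, a)

-- Ordinary functions form an SMC.
instance SMC (->) where
  par f g (a, c) = (f a, g c)
  cross (a, b) = (b, a)

-- Even swapping two inputs before two boxes must be plumbed by hand:
example :: (Int, Int) -> (Int, Int)
example = par (+ 1) (* 2) . cross
```

The paper's library lets one instead write such diagrams with plain (linear) lambda-bound variables, with no metaprogramming.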
Seeking Stability by being Lazy and Shallow: Lazy and shallow instantiation is user friendly
Haskell Symposium 2021
Gert-Jan Bottu, Richard A. Eisenberg
Designing a language feature often requires a choice between several, similarly expressive possibilities. Given that user studies are generally impractical, we propose using stability as a way of making such decisions. Stability is a measure of whether the meaning of a program alters under small, seemingly innocuous changes in the code (e.g., inlining). Directly motivated by a need to pin down a feature in GHC/Haskell, we apply this notion of stability to analyse four approaches to the instantiation of polymorphic types, concluding that the most stable approach is lazy (instantiate a polytype only when absolutely necessary) and shallow (instantiate only top-level type variables, not variables that appear after explicit arguments).
A Graded Dependent Type System with a Usage-Aware Semantics
POPL 2021
Pritam Choudhury, Harley Eades III, Richard A. Eisenberg, Stephanie Weirich
Graded Type Theory provides a mechanism to track and reason about resource usage in type systems. In this paper, we develop GraD, a novel version of such a graded dependent type system that includes functions, tensor products, additive sums, and a unit type. Since standard operational semantics is resource-agnostic, we develop a heap-based operational semantics and prove a soundness theorem that shows correct accounting of resource usage. Several useful properties, including the standard type soundness theorem, non-interference of irrelevant resources in computation, and the single-pointer property for linear resources, can be derived from this theorem. We hope that our work will provide a base for integrating linearity, irrelevance and dependent types in practical programming languages like Haskell.
Partial Type Constructors; Or, Making ad hoc datatypes less ad hoc
POPL 2020
Mark P. Jones, J. Garrett Morris, Richard A. Eisenberg
Functional programming languages assume that type constructors are total. Yet functional programmers know better: counterexamples range from container types that make limiting assumptions about their contents (e.g., requiring computable equality or ordering functions) to type families with defining equations only over certain choices of arguments. We present a language design and formal theory of partial type constructors, capturing the domains of type constructors using qualified types. Our design is both simple and expressive: we support partial datatypes as first-class citizens (including as instances of parametric abstractions, such as the Haskell Functor and Monad classes), and show a simple type elaboration algorithm that avoids placing undue annotation burden on programmers. We show that our type system rejects ill-defined types and can be compiled to a semantic model based on System F. Finally, we have conducted an experimental analysis of a body of Haskell code, using a proof-of-concept implementation of our system; while there are cases where our system requires additional annotations, these cases are rarely encountered in practical Haskell code.
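The abstract's motivating counterexample can be shown in miniature with `Data.Set` from the standard containers library: `Set` is treated as a total type constructor, yet `Set a` is only meaningfully usable when `Ord a` holds, so every operation re-demands the constraint at each use site rather than it being part of the type constructor's domain.

```haskell
import qualified Data.Set as Set

-- Set a only makes sense for ordered element types; today that fact
-- lives in repeated `Ord a` constraints, not in Set itself.
nubOrd :: Ord a => [a] -> [a]
nubOrd = Set.toList . Set.fromList
```

Under the paper's design, the `Ord` requirement would instead be captured once, as the domain of the partial type constructor `Set`.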
Resolution as Intersection Subtyping via Modus Ponens
Koar Marntirosian, Tom Schrijvers, Bruno C. d. S. Oliveira, Georgios Karachalias
Resolution and subtyping are two common mechanisms in programming languages. Resolution is used by features such as type classes or Scala-style implicits to synthesize values automatically from contextual type information. Subtyping is commonly used to automatically convert the type of a value into another compatible type. So far the two mechanisms have been considered independently of each other. This paper shows that, with a small extension, subtyping with intersection types can subsume resolution. This has three main consequences. Firstly, resolution does not need to be implemented as a separate mechanism. Secondly, the interaction between resolution and subtyping becomes apparent. Finally, the integration of resolution into subtyping enables first-class (implicit) environments. The extension that recovers the power of resolution via subtyping is the modus ponens rule of propositional logic. While it is easily added to declarative subtyping, significant care needs to be taken to retain desirable properties, such as transitivity and decidability of algorithmic subtyping, and coherence. To materialize these ideas we develop \lambda_i^{MP}, a calculus that extends a previous calculus with disjoint intersection types, and develop its metatheory in the Coq theorem prover.
Kinds are Calling Conventions
ICFP 2020
Paul Downen, Zena M. Ariola, Simon Peyton Jones, Richard A. Eisenberg
A language supporting polymorphism is a boon to programmers: they can express complex ideas once and reuse functions in a variety of situations. However, polymorphism is a pain for compilers tasked with producing efficient code that manipulates concrete values. This paper presents a new intermediate language that allows for efficient static compilation, while still supporting flexible polymorphism. Specifically, it permits polymorphism over not only the types of values, but also the representation of values, the arity of primitive machine functions, and the evaluation order of arguments---all three of which are useful in practice. The key insight is to encode information about a value's calling convention in the kind of its type, rather than in the type itself.
Stitch: The Sound Type-Indexed Type Checker, a Functional Pearl
Haskell Symposium 2020
Richard A. Eisenberg
A classic example of the power of generalized algebraic datatypes (GADTs) to verify a delicate implementation is the type-indexed expression AST. This functional pearl refreshes this example, casting it in modern Haskell using many of GHC's bells and whistles. The Stitch interpreter is a complete, executable interpreter, with a parser, type checker, common-subexpression elimination, and a REPL. Making heavy use of GADTs and type indices, the Stitch implementation is clean Haskell code and serves as an existence proof that Haskell's type system is advanced enough for the use of fancy types in a practical setting. The paper focuses on guiding the reader through these advanced topics, enabling them to adopt the techniques demonstrated here.
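The type-indexed AST at the heart of the pearl looks, in miniature, like the following (Stitch itself is far richer, with binders, a parser, CSE, and a REPL). Because the index makes ill-typed object programs unrepresentable, the evaluator needs no runtime tag checks.

```haskell
{-# LANGUAGE GADTs #-}

-- An expression AST indexed by the type of value it evaluates to.
data Expr t where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr t -> Expr t -> Expr t

-- Total, tagless evaluation: pattern matching refines t in each branch.
eval :: Expr t -> t
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add l r)  = eval l + eval r
eval (If c t e) = if eval c then eval t else eval e
```

Ill-typed terms such as `Add (BoolE True) (IntE 1)` are rejected by GHC, so `eval` is sound by construction.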
Composing Effects into Tasks and Workflows
Haskell Symposium 2020
Yves Parès, Jean-Philippe Bernardy, Richard A. Eisenberg
Data science applications tend to be built by composing tasks: discrete manipulations of data. These tasks are arranged in directed acyclic graphs, and many frameworks exist within the data science community supporting such a structure, which is called a workflow. In realistic applications, we want to be able to both analyze a workflow in the absence of data, and to execute the workflow with data. This paper combines effect handlers with arrow-like structures to abstract out data science tasks. This combination of techniques enables a modular design of workflows. Additionally, these workflows can both be analyzed prior to running (e.g., to provide early failure) and run conveniently. Our work is directly motivated by real-world scenarios, and we believe that our approach is applicable to new data science and machine learning applications and frameworks.
Kind Inference for Datatypes
POPL 2020 (Distinguished Paper)
Ningning Xie, Richard A. Eisenberg, Bruno C. d. S. Oliveira
In recent years, languages like Haskell have seen a dramatic surge of new features that significantly extend the expressive power of their type systems. With these features, the challenge of kind inference for datatype declarations has presented itself and become a worthy research problem on its own. This paper studies kind inference for datatypes. Inspired by previous research on type inference, we offer declarative specifications for what datatype declarations should be accepted, both for Haskell98 and for a more advanced system we call PolyKinds, based on the extensions in modern Haskell, including a limited form of dependent types. We believe these formulations to be novel and without precedent, even for Haskell98. These specifications are complemented with implementable algorithmic versions. We study soundness, completeness and the existence of principal kinds in these systems, proving the properties where they hold. This work can serve as a guide both to language designers who wish to formalize their datatype declarations and also to implementors keen to have principled inference of principal types.
Linear Haskell: practical linearity in a higher-order polymorphic language
POPL 2018
Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, Arnaud Spiwack
Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values. To demonstrate the efficacy of our linear type system - both how easily it can be integrated into an existing language implementation and how streamlined it makes writing programs with linear types - we implemented our type system in GHC, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions.
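The "linearity on the arrow" design described above shipped in GHC as the LinearTypes extension (GHC 9.0 and later, which this sketch assumes): a function of type `a %1 -> b` must consume its argument exactly once.

```haskell
{-# LANGUAGE LinearTypes #-}

-- A linear function: the pair is consumed exactly once, by matching it
-- and using each component exactly once.
swapL :: (a, b) %1 -> (b, a)
swapL (x, y) = (y, x)

-- By contrast, `dup x = (x, x)` at type `a %1 -> (a, a)` is rejected,
-- because it would consume its argument twice.
```

Because linearity lives on the arrow rather than on types, `swapL` can still be applied to ordinary, unrestricted values, which is the backwards-compatibility story of the paper.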
Programming R in Haskell
IFL 2014
Mathieu Boespflug, Allen Brown, Facundo Domínguez, Alexander Vershilov
A standard method for augmenting the “native” set of libraries available within any given programming environment is to extend this set via a foreign function interface provided by the programming language. In this way, by exporting the functionality of external libraries via binding modules, one is able to reuse libraries without having to reimplement them in the language du jour. However, a priori binding of entire system libraries is a tedious process that quickly creates an unbearable maintenance burden. We demonstrate an alternative to monolithic and imposing binding modules, even to make use of libraries implemented in a special-purpose, dynamically typed, interpreted language. As a case study, we present H, an R-to-Haskell interoperability solution making it possible to program all of R, including all library packages on CRAN, from Haskell, a general-purpose, statically typed, compiled language. We demonstrate how to do so efficiently, without marshalling costs when crossing language boundaries and with static guarantees of well-formedness of expressions and safe acquisition of foreign language resources.



© 2024 Modus Create, LLC
