Ormolu: Format Haskell code like never before

27 May 2019 — by Mark Karpov

If we think of the average Haskeller, who do we see? A poor, broken person. Tired head falls into the open palms, sobbing, deprived of the basic human right for automated source code formatting.

Is it at all conceivable that so many attempts were made and none quite succeeded? The design space is surprisingly large. Perhaps the sweet spot for large projects with several contributors hasn’t been found yet.

I’d like to announce a new project called Ormolu. It’s still vaporware, but that’s just a bug we’re a long way into fixing, and I want to convince you that the principles are sound.

Principles

Ormolu is a formatter that follows a few simple ideas that make it quite unlike other similar projects in the Haskell land. I’m going to explain them now.

What are code formatters good for? Normalizing what does not impact readability and therefore need not be under the programmer’s control. But use of whitespace does impact readability, and therefore should be under at least partial control of the programmer. In other words, there is no gain in tolerating 5 different type signature styles, when there is gain in letting the programmer decide whether some if-then-else should be single line or multiline.

In Ormolu, the layout of the input influences the layout of the output. This means that the choices between single line/multiline layouts in each particular situation are made by the author of the original source code, not by an algorithm. While giving more precise control to the user, as a bonus we also get a simpler and faster implementation.

Both Hindent and Brittany try to make their own decisions about use of whitespace. While Hindent’s decisions are simpler than Brittany’s, great care needs to be taken in the implementation to avoid exponential blowups when formatting deeply nested expressions. On the other hand Brittany tries hard to avoid those, at the cost of more complex data structures.

Code formatters are also good to take away some of the tedium of writing code in the first place. With Ormolu, if you decide that a particular case-expression or type signature should be multiline, you don’t have to painstakingly write out each line properly indented. Just introduce a line break anywhere at least once and Ormolu will do the rest.

Let’s see an example of Ormolu’s approach. The input:

-- | Foo performs foo and sometimes bar.

foo :: Thoroughness
  -> Int -> Int
foo t x = if x > 20
    then case t of
           Thorough -> x + 50
           Somewhat -> x + 20
           NotAtAll -> 0
    else 10 + 1

Results in the following formatted code:

-- | Foo performs foo and sometimes bar.
foo
  :: Thoroughness
  -> Int
  -> Int
foo t x =
  if x > 20
  then
    case t of
      Thorough -> x + 50
      Somewhat -> x + 20
      NotAtAll -> 0
  else 10 + 1

The fact that the signature of foo occupies two lines in the original source code causes the multiline version of the type signature to be used in the formatted version. The same principle applies to the body of foo. Note the difference between formatting of then and else clauses: then is multiline and else is single line.

Other features that are worth mentioning:

Idempotency: formatting already formatted code is a no-op. This is an important property for any code formatter to have, which still holds under our multiline-in-multiline-out policy, even if the formatting for a given parse tree is not unique.
The project aims to implement one “true” formatting style which admits no configuration. Similarly to what’s described in the blog post about Hindent 5, we concluded that if formatting is done automatically, it’s better to embrace one style and avoid stylistic fragmentation. This way everyone who uses Ormolu will be automatically on the same page.
The formatting style aims to result in minimal diffs while still remaining close to conventional Haskell formatting. Certain formatting practices, like vertically aligning the bodies of let-bindings or allowing the length of a type or variable name to influence indentation level lead to diff amplification. Therefore, we try to avoid that.

Why Ormolu?

There are a few solutions for formatting Haskell source code, why would this project be more successful?

Ormolu uses GHC’s own parser to avoid parsing problems caused by haskell-src-exts. Many similar projects suffer from the fact that they don’t use the same parser that GHC does. Like Brittany, we are using the parser from the ghc package and therefore work with the same AST that GHC uses.
The code of the formatter is written so that it’s easy to modify and maintain. Roughly, it means that the project follows the path of Hindent and is very much about printing the AST in a particular way. So far I think the goal is met and the code base is hacking-friendly.
There is a good testing scheme in place that allows us to grow the collection of examples easily. This will keep the project well-tested and robust to the point that it can be used in large projects without exposing unfortunate, disappointing bugs here and there.
It is an open project that anyone is free to fork and it is actively maintained by a large commercial contributor, that is, Tweag. This makes the odds very high that it’ll be maintained in the future and bugs will be fixed.

Want it sooner? You can help!

Right now some parts of the AST are implemented fully, such as data type definitions, module export lists, and a few others. Most importantly, handling of comments is dealt with implicitly by the rendering combinators, allowing us to focus on rendering the AST nodes only.

But the GHC AST is huge. This is why contributions are welcome. The printing framework and the approach to testing that we use makes it very easy to implement the rendering of the missing parts of AST. So right now it’s just a matter of time before we have a fully featured formatter for Haskell code.

Let’s do it iteratively: spend an evening and implement rendering of a little bit of Haskell syntax, throw in a few files in the test suite and boom, we’re a bit closer. One can pick up something really simple, such as e.g. role annotations. It takes 1 hour or so, but a whole new type of declarations will be supported! Lots of fun.

What is more, we’re taking Ormolu to ZuriHac, where everyone will be able to help developing the project. So come and contribute to a tool that you’ll be able to use proudly at your daily job. Or indeed, in the intimacy of a late evening. As a little secret. Just between you and your source code.

About the author

Mark Karpov

Mark is a build system expert with a particular focus on Bazel. As a consultant at Tweag he has worked with a number of large and well-known companies that use Bazel or decided to migrate to it. Other than build systems, Mark's background is in functional programming and in particular Haskell. His personal projects include high-profile Haskell libraries, tutorials, and a technical blog.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← inline-js: seamless JavaScript/Haskell interop CPP considered harmful →