CPP considered harmful

27 June 2019 — by Mathieu Boespflug

Edsger Dijkstra took issue with the “unbridled” use of the goto programming construct in 1968. He noted that our already limited ability to reason about the dynamic behaviour of imperative programs was nigh impossible in the presence of goto-jumps to arbitrary labels. With goto, control flow is permitted to be entirely disconnected from the structure of the code as written, so it becomes very hard to guess at the value of variables without actually running the program. Just like goto for dynamic behaviour, the unbridled use of the C preprocessor (CPP) to conditionally compile code is hampering our ability to analyse and manipulate code. In fact the commonality in the arguments made it difficult to resist the temptation to reuse the already overused title of Dijkstra’s paper. In this post, I want to rehash the argument that CPP should be dispensed with because it makes bad code too tempting to write, like Dijkstra did for goto.

The idea that unrestricted conditional compilation should be avoided is old news. While it is extremely common in programming languages of the 70’s (like C), it is nonexistent in popular programming languages of the 00’s (like Rust or Go). Haskell, a language born in the late 80’s, punted on difficult problems like multi-platform support and handling breaking changes in upstream dependencies. Using CPP to deal with these issues was a quick fix, and a historical accident that somehow survived to this day.

Say I’m writing innocent enough code:

module Main where

main = do
  name <- getLine
  putStrLn $ "Hello " <> name

This works fine with the latest GHC. But using GHC circa 2016, this won’t compile, because (<>) is not part of the Prelude. At this point I have four choices:

Decide that I don’t care about old GHC (or in general any old version of a package dependency) and move on.
Use (++) instead of (<>) (they happen to have the same meaning in this particular case), or otherwise rewrite my code, e.g. by adding import Data.Semigroup ((<>)).
Write two versions of my module: one that works for newer GHC, and one that works for older GHC, and letting the build system decide which one to pick.
Use conditional compilation.

This last solution might look like this:

module Main where

main = do
  name <- getLine
#if MIN_VERSION_base(X, X, X)
  putStrLn $ "Hello " <> name
#else
  putStrLn $ "Hello " ++ name

Clearly Option 1 or Option 2 would work out better than this already unreadable mess, which might only get worse when other incompatibilities arise, requiring nested conditional compilation. Some might argue that Option 1 (dropping support) isn’t used nearly often enough. I agree, all the more so given the success and broad adoption of Stackage, but it’s a debate for another day. Option 2 is about resisting the temptation to use new functions not previously available. But how do we deal with breaking changes in dependencies? In such a case only Option 3 and Option 4 are available.

In version 2.1, the singletons library exposed two datatype definitions:

data Proxy t :: * -> *
data KProxy t :: * -> *

From version 2.2 onwards, only one, more general datatype is exposed:

data Proxy k (t :: k) :: * -> *

In user code, KProxy now needs to be replaced everywhere with Proxy. Unlike in our previous example, Option 2 is not available: there is no way to change the code in such a way that it compiles with both singletons-2.1 and singletons-2.2. Option 3 doesn’t look terribly appealing at first blush because Don’t Repeat Yourself (DRY).

The temptation is high to introduce conditional compilation everywhere. The common way to do so is with CPP at each use site:

#if MIN_VERSION_singletons(2,2,0)
  ... KProxy ...
#else
  ... Proxy ...
#endif

The problem is that with conditional compilation using CPP, we lose a great deal. It is no longer possible to do any syntactic analysis of your modules, say to lint check or to apply code formatters. The source is no longer syntactically valid Haskell: it needs a preprocessor to defang it first. Which would be fine, except that if you want to analyze every branch of every #if and #ifdef, you need to run the preprocessor with every combination of every predicate (true/false or defined/undefined for every macro), leading to an exponential blow up in the number of times you need to run HLint, and other automatic tools like code formatters thrown out the window entirely.

What if we taught these tools CPP syntax, to obviate having to evaluate each branch of the preprocessor? Like unrestricted goto allowing labels pretty much anywhere, CPP conditionals can appear anywhere at all: module headers, import declarations, in the middle of an expression, or indeed arbitrary different combinations of these in each branch of a conditional. Each branch need not be syntactically correct, with balanced parentheses and well-scoped names. Parsing becomes a very complicated problem.

The vexing issue is that we seldom need CPP for conditional compilation in the first place, if at all. There is no silver bullet for avoiding CPP. Giving up goto means turning to structured control flow constructs (e.g. while-loops or try/catch), neither of which completely replacing goto, but together covering most use cases for goto. Giving up CPP means turning to any of the following strategies to achieve broader compatibility:

Push all configuration to the build system: if you’re writing a cross-platform network library, put all Win32 code in separate files from the Linux code. Let the build system choose what modules to build depending on the target platform. No CPP required.
Designing for extensibility: the network library has a datatype of socket address domains. Since not all platforms support all domains, this forces conditional compilation in socket address primitives. By contrast, the socket library has an open type family of domains. Support for each domain can be kept in a dedicated source file, as above.
Abstract away compatibility concerns: if you really need to target multiple versions of a dependency, create a small module that abstracts away the differences and depend on that.
Use structured conditional compilation: if you really have to use conditional compilation within a source file, prefer structured conditional compilation. Template Haskell can conditionally define a function. Unlike CPP, using Template Haskell is still syntactically correct Haskell.

We should challenge the idea that CPP is unavoidable. After all, multi-platform support and backwards compatibility are universal concerns for all programming languages. Unstructured conditional compilation is highly unusual in many of these, even for multi-platform code. Like some of the pioneers of software did with the goto of old, we overestimate the need for the power of non-structure, while forgetting about the benefits of structure. Foregoing CPP entirely should rid us of our illusions.

Behind the scenes

Mathieu Boespflug

Mathieu is the CEO and founder of Tweag.

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Ormolu: Format Haskell code like never before Revelations from repetition: Source code headers in Haskell and Python →