The minimal megaparsec tutorial

24 April 2025 — by Clément Hurlin

In my functional programming course to Master Students of Telecom Nancy, I like to use parsing as an example of monadic programming, relying on the megaparsec library. My only concern with megaparsec is that its official tutorial is long: at the time I’m writing, it’s 15000 words long.

Unlike the official megaparsec tutorial, this blog post is intended to be smaller, and is aimed at an audience with only a basic understanding of Haskell and monadic programming.

All the Haskell material from this blogpost is available on our GitHub: https://github.com/tweag/minimal-megaparsec-tutorial. You can fork this repository to get a full-fledged setup (including CI and Haskell Language Server support) for experimenting with a megaparsec parser 🚀

Running example

My running example is a parser for a domain-specific language that I designed for the class. This language uses primitive drawing commands to represent ASCII art roguelike maps. It looks like this:

HLine 0 0 8; HLine 0 4 8; VLine 0 0 5; VLine 7 0 2; VLine 7 3 2
HLine 8 1 2; HLine 8 3 2
HLine 10 0 6; HLine 10 4 6; VLine 10 0 2; VLine 10 3 2; VLine 15 0 5
Start 2 1
Cell 13 3 ~; Cell 14 3 ~; Cell 14 2 ~

Here, HLine x y len and VLine x y len draw horizontal and vertical walls respectively. The Start x y command marks the player’s starting point and Cell x y ~ places special terrain.

Roguelike maps typically consist of rectangular rooms and connecting corridors, where walls are shown as #, water as ~, and walkable spaces as dots (.) For example, the snippet above draws a map with two connected rooms. The room on the left contains the player’s start location (>), while some water appears in the lower right corner of the room on the right:

########  ######
#.>....####....#
#.............~#
#......####..~~#
########  ######

Walkable floor cells are omitted from the domain-specific language, as they can be inferred by computing the set of cells reachable from the starting point. In implementations of roguelikes, maps like this one are translated into an array of arrays of symbols, with some symbols being walkable (e.g. dot cells and water cells) and some symbols being blockers (walls). The top-level array is then used to compute possible moves and collisions.

The `Parsec` monad

To use megaparsec, we define our main monad type using the Parsec e s a type. It has three arguments:

The type of errors returned by the parser,
the type of stream accepted as input by the parser, and
the type of data returned upon successful parsing of an input stream.

For a simple parser, we define:

The error type to be Text, for simplicity. In a production parser, you would use a structured error type, that distinguishes the different error cases; so that you can handle them differently.
The input stream to be Text, because this is the most idiomatic choice in the Haskell ecosystem:

import Data.Text (Text)
import Text.Megaparsec

type Error = Text
type Input = Text

-- | @Parser a@ is a parser that accepts @Text@ as input and returns an @a@ upon
-- successful parsing.
type Parser a = Parsec Error Input a

Our first parser

Parsers are built from primitive combinators (e.g. lookAhead, notFollowedBy, end of file eof) and combinators derived from them (e.g. oneOf, anySingle, satisfy). These combinators are designed to consume a few symbols, not complex structures (more on this later).

Combinators return parsers in any MonadParsec monad, which means that they have a signature where the head is MonadParsec e s m => ... and the return type is of the form m a ¹. In our context, it suffices to know that m a is instantiated to Parser a, so we can use these combinators for our parsers.

Let’s parse the different kinds of symbols we usually find in ASCII art roguelike maps, using the anySingle function, which parses a single token. In our case, since the input type is Text, the type of tokens is Char (see the ShareInput case of Stream’s documentation, as well as the instances of Stream):

-- | A symbol in the map of an ASCII roguelike
data Symbol
  = -- | A wall, depicted by a # character
    Wall
  | -- | A water cell, depicted by a ~ character
    Water
  deriving (Eq, Show)

-- | A parser for the symbol of a single cell. Used in 'parseElement' below.
parseSymbol :: Parser Symbol
parseSymbol = do
  c <- anySingle
  case c of
    '#' -> return Wall
    '~' -> return Water
    _   -> fail $ "Unknown symbol: " <> [c] -- See below for how to avoid this case altogether (in parseLineElement)

Parser combinators

By virtue of MonadParsecs being monads, parsers can be built using functions that are common in monadic Haskell code (including functions from Functor, Applicative, etc.). Let’s demonstrate this to build a parser for more advanced roguelike map constructs:

data Element
  = -- | Horizontal wall, starting at @(x,y)@ with @length@ cells (ending at @(x+length-1,y)@)
    HorizontalLine Int Int Int
  | -- | Vertical wall, starting at @(x,y)@ with @length@ cells (ending at @(x,y+length-1)@)
    VerticalLine Int Int Int
  | -- | A cell at @(x,y)@ with a symbol
    Cell Int Int Symbol
  | -- | The starting point of the player
    Start Int Int
  deriving (Eq, Show)

The parser for the HorizontalLine and VerticalLine cases can be written as follows:

import Control.Monad (void)
import Control.Monad.Extra (when)
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

parseLineElement :: Parser Element
parseLineElement = do
  constructor <- choice [string "HLine" >> return HorizontalLine, string "VLine" >> return VerticalLine]
  space1 -- One or more space
  x <- decimal
  space1
  y <- decimal
  space1
  len <- decimal
  when (len < 1) $ fail $ "Length must be greater than 0, but got " <> show len
  return $ constructor x y len

The first two lines either parse the string HLine or the string VLine and use the choice function to encode the two possibilities. Also, because each line in a do block encodes a step in the computation, writing monadic parsers is natural: each line consumes some of the input, until enough is consumed to return the desired value. Another example of using a regular monadic function is to use when to stop parsing when an incorrect value is consumed.

Running parsers

Since our parser takes Text as input, it can be tested in a pure context. Megaparsec provides the runParser function for this. To be able to print errors of our parser, our error type must be an instance of ShowErrorComponent; and then we can define a convenient runMyParser function that returns either an error or the parsed value:

import Data.Text (pack, unpack)

-- | Instance required for 'runMyParser'
instance ShowErrorComponent Error where
  showErrorComponent = unpack

-- | A variant of megaparsec's 'runParser', instantiated to our context.
-- Successfully parses an @a@ or returns an error message.
runMyParser :: Parser a -> Input -> Either Text a
runMyParser parser input =
  case runParser parser "" input of
    Left err -> Left $ pack $ errorBundlePretty err
    Right x  -> Right x

Parsing expressions, lists, etc.

Megaparsec not only provides building blocks for parsing tokens and combining parsers. It also provides parsers for common constructs found in programming languages and domain-specific languages, such as expressions and lists. Megaparsec does this by relying on the parser-combinators package.

I don’t want to go into the details of parsing expressions here (e.g. parsing 1 + 2 - 3…), but let me emphasize that it is a bad idea to write your own expression parser. Instead, think about what kind of operators you need and encode them, using the Operator type.

List parsing, on the other hand, is done with various sep… functions. In our case of roguelike maps, we allow different elements to be separated by a semicolon, or by one or more newlines. This is encoded as follows:

parseElements :: Parser [Element]
parseElements = parseElement `sepBy1` separator
  where
    separator = do
      hspace -- Optional horizontal (non-newline) space
      choice [void $ char ';', void $ some eol] -- Either a single ';' or many newlines
      hspace
    parseElement :: Parse Element
    parseElement = choice [parseLineElement, parseStart, parseCell]
      where
        parseStart = do
          void $ string "Start"
          space1
          (x, y) <- parseCoord
          return $ Start x y
        parseCell = do
          void $ string "Cell"
          space1
          (x, y) <- parseCoord
          space1
          symbol <- parseSymbol
          return $ Cell x y symbol
        parseCoord = do
          x <- decimal
          space1
          y <- decimal
          return (x, y)

Conclusion

We’ve presented how to parse simple constructs using megaparsec and how to run our parsers. This blog post is less than 1500 words long: mission accomplished presenting megaparsec in a shorter way than the official tutorial 🥳

If you want to use the code from this blog post as a starting point, feel free to clone https://github.com/tweag/minimal-megaparsec-tutorial. And once your project is moving away from a minimal viable product, head over to megaparsec’s official tutorial to learn about more advanced ways to use megaparsec!

This is an instance of the monad transformer pattern.↩

Behind the scenes

Clément Hurlin

Clément is a Director of Engineering, leading the Build Systems department. He studied Computer Science at Telecom Nancy and received his PhD from Université Nice Sophia Antipolis, where he proved multithreaded programs using linear logic. His technical background includes functional programming, compilers, provers, distributed systems, and build systems.

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Frontend live-coding via ghci Why we blog →