Topiary aims to be a universal formatter engine within the Tree-sitter ecosystem. Named after the art of clipping or trimming trees into fantastic shapes, it is designed for formatter authors and formatter users:
- Authors can create a formatter for a language without having to write their own formatting engine, or even their own parser.
- Users benefit from a uniform, comparable code style across multiple languages, with the convenience of a single formatter tool.
The core of Topiary is written in Rust, with declarative formatting rules for bundled languages written in the Tree-sitter query language. In this first release, we have concentrated on formatting OCaml code, capitalising on the OCaml expertise of the Topiary team and our colleague Nicolas Jeannerod.
All development and releases happen over in the Topiary GitHub repository.
Coding style has historically been a matter of personal choice. This is inherently subjective, leading to bikeshedding over formatting choices rather than meaningful discussion during review. Prescribed style guides, linters and, ultimately, automatic formatters (popularised by gofmt, whose developers had the insight to impose “good enough” uniform formatting on a codebase) have helped solve these issues.
This motivated research into developing a formatter for our Nickel language. However, its internal parser did not provide a syntax tree that retained enough context to allow the original program to be reconstructed after parsing. After creating a Tree-sitter grammar for Nickel to support syntax highlighting, we concluded that it would be possible to leverage Tree-sitter for formatting as well.
But why stop at Nickel? Topiary generalises this approach to any language that doesn’t employ semantic whitespace (such languages require specialised formatters, such as our Haskell formatter, Ormolu) by expressing formatting style rules in the Tree-sitter query language. It thus aspires to be a “universal formatter engine” for such languages, enabling the fast development of formatters, provided a Tree-sitter grammar is available.
To that end, Topiary has been created with the following goals in mind:
- Use Tree-sitter for parsing, to avoid writing yet another engine for a formatter.
- Expect idempotency. That is, formatting of already-formatted code shouldn’t change anything.
- Ensure that bundled formatting styles are:
  - Compatible with attested formatting styles used for that language in the wild.
  - Faithful to the author’s intent: if code has been written such that it spans multiple lines, that decision is preserved.
  - Minimal in the changes they cause between commits, so that diffs focus mainly on the code that has changed rather than on superficial artefacts.
  - Well-tested and robust, so that they can be trusted on large projects.
- For end users, the formatter should run efficiently and integrate with other developer tools, such as editors and language servers.
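The idempotency goal can be phrased as a property that a test suite can check: formatting already-formatted output must return it unchanged, that is, format(format(x)) == format(x). The Rust sketch below checks that property for any function from source text to formatted text. The names toy_format and is_idempotent are hypothetical, and toy_format (which merely collapses whitespace) only stands in for a real formatter; none of this is Topiary's API.

```rust
/// Toy stand-in for a formatter (hypothetical, not Topiary): collapses every
/// run of whitespace into a single space and trims both ends.
fn toy_format(input: &str) -> String {
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Checks the idempotency property on one input: formatting the
/// formatter's own output must leave that output unchanged.
fn is_idempotent<F: Fn(&str) -> String>(formatter: F, input: &str) -> bool {
    let once = formatter(input);
    let twice = formatter(&once);
    once == twice
}
```

For instance, is_idempotent(toy_format, "let  x =\n  1") holds, whereas a "formatter" that appends a trailing space on every run fails the check, because each pass changes its own output.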
How it Works
As long as a Tree-sitter grammar is defined for a language, Tree-sitter can parse it and build a concrete syntax tree. Tree-sitter also allows us to run queries against this tree. We can make use of these to target interesting subtrees (e.g., an if block or a loop), to which we can apply formatting rules. These cohere into a declarative definition of how that language should be formatted.
For example, consider the following query:
```
(
  [
    (infix_operator)
    "if"
    ":"
  ] @append_space
  .
  (_)
)
```
This will match any node that the grammar has identified as an infix_operator, or the anonymous nodes containing "if" or ":", immediately followed by any named node (represented by the (_) wildcard pattern). The query matches on subtrees of the same shape, where the annotated node within it will be “captured” with the name @append_space, one of many formatting rules we have defined. Our formatter runs through all matches and captures, and when it processes any capture called @append_space, it appends a space after the annotated node.
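To illustrate what the query above selects, here is a deliberately simplified Rust model of its matching over one level of sibling nodes. The Node struct, its fields and append_space_captures are hypothetical names, not Tree-sitter's or Topiary's API. A real Tree-sitter query runs over the whole tree, and this sketch reduces the "." anchor to "the very next sibling".

```rust
/// A hypothetical, flattened view of one level of a syntax tree,
/// keeping only what the example query inspects.
struct Node {
    kind: &'static str, // node type; for anonymous nodes, the token itself
    named: bool,        // named nodes match `(_)`; anonymous tokens such as "if" do not
}

/// Simplified model of `([(infix_operator) "if" ":"] @append_space . (_))`:
/// returns the indices of siblings that would be captured as @append_space.
fn append_space_captures(siblings: &[Node]) -> Vec<usize> {
    siblings
        .windows(2)
        .enumerate()
        .filter(|(_, pair)| {
            let (node, next) = (&pair[0], &pair[1]);
            // The alternation: a named infix_operator, or an anonymous "if" or ":".
            let in_alternation = match (node.named, node.kind) {
                (true, "infix_operator") => true,
                (false, "if") | (false, ":") => true,
                _ => false,
            };
            // The `(_)` wildcard: the following sibling must be a named node.
            in_alternation && next.named
        })
        .map(|(i, _)| i)
        .collect()
}
```

For the siblings "if", a named identifier, "then", and another named identifier (node kinds chosen for illustration), only index 0 is captured: "then" is not one of the alternatives.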
Before rendering the output, Topiary does some post-processing, such as squashing consecutive spaces and newlines, trimming extraneous whitespace, and ordering indentation and newline instructions consistently. This means that you can, for example, prepend and append spaces to "if" and "true", and Topiary will still output "if true" with just one space between the words.
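A sketch of the squashing and trimming parts of that post-processing, under the assumption that formatting rules emit a flat stream of leaf, space and newline atoms. The Atom type and post_process function are hypothetical and not necessarily Topiary's internal representation; indentation ordering is not modelled. Each run of whitespace collapses to at most one separator, with a newline taking precedence over spaces, and whitespace at either end is dropped.

```rust
/// A hypothetical intermediate representation: formatting rules emit a
/// flat stream of atoms, which post-processing cleans up before rendering.
#[derive(Debug, Clone, PartialEq)]
enum Atom {
    Leaf(String), // source text of a syntax node
    Space,
    Newline,
}

/// Squashes each run of whitespace atoms into at most one separator
/// (a newline wins over spaces) and drops whitespace at both ends.
fn post_process(atoms: &[Atom]) -> String {
    let mut out = String::new();
    let mut pending_space = false;
    let mut pending_newline = false;
    for atom in atoms {
        match atom {
            Atom::Space => pending_space = true,
            Atom::Newline => pending_newline = true,
            Atom::Leaf(text) => {
                // Whitespace is only emitted between leaves, so leading
                // whitespace never reaches the output.
                if !out.is_empty() {
                    if pending_newline {
                        out.push('\n');
                    } else if pending_space {
                        out.push(' ');
                    }
                }
                pending_space = false;
                pending_newline = false;
                out.push_str(text);
            }
        }
    }
    // Any whitespace still pending here is trailing, so it is dropped.
    out
}
```

Mirroring the example above, the stream Space, Leaf("if"), Space, Space, Leaf("true"), Space renders as "if true".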
To make this more concrete, consider the expression 1+2. This has the following syntax tree, if it’s interpreted as OCaml, where the match described by the above query is highlighted in red:
[Figure: syntax tree of 1+2 interpreted as OCaml, with the match for the query above highlighted in red]
The @append_space capture instructs Topiary to append a space after the infix_operator node (+), rendering 1+ 2. Repeating this process for every syntactic structure we care about (making judicious generalisations wherever possible) leads us to an overall formatting style for a language.
As a formatter author, defining a style for a language is just a matter of building up these queries. End users can then apply them to their codebase with Topiary, to render their code in this style.
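The idea of a style as an accumulation of rules can be sketched in Rust as follows. Tokens stand in for the leaves of the syntax tree, and each Rule pairs a node kind with a whitespace action. The Token, Rule and Action types, the format_tokens function and the "number" node kind are all hypothetical, chosen for illustration rather than taken from Topiary's query machinery. With only an append-space rule for infix_operator, 1+2 renders as 1+ 2; adding a second rule that prepends a space gives 1 + 2.

```rust
/// A hypothetical leaf token: its node kind and source text.
struct Token {
    kind: &'static str,
    text: &'static str,
}

/// Whitespace actions a rule can request, loosely modelled on captures
/// such as @append_space.
#[derive(Clone, Copy, PartialEq)]
enum Action {
    AppendSpace,
    PrependSpace,
}

/// One formatting rule: the node kind it targets and the action to take.
struct Rule {
    kind: &'static str,
    action: Action,
}

/// Renders tokens under a style (a list of rules). An append followed by a
/// prepend yields a single space, and any trailing space is trimmed.
fn format_tokens(tokens: &[Token], style: &[Rule]) -> String {
    let mut out = String::new();
    for tok in tokens {
        // Does any rule in the style request `action` for this token's kind?
        let wants = |action: Action| {
            style.iter().any(|r| r.kind == tok.kind && r.action == action)
        };
        if wants(Action::PrependSpace) && !out.is_empty() && !out.ends_with(' ') {
            out.push(' ');
        }
        out.push_str(tok.text);
        if wants(Action::AppendSpace) {
            out.push(' ');
        }
    }
    out.trim_end().to_string()
}
```

Growing a style then amounts to adding rules to the list, which is the role the Tree-sitter queries play in Topiary.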
Topiary is not the first tool to use Tree-sitter beyond its original scope, nor is it the first tool that attempts to be a formatter for multiple languages (e.g., Prettier). This section contains some tools that we drew inspiration from, or used during the development of Topiary.
- Syntax Tree Playground: An interactive, online playground for experimenting with Tree-sitter and its query language.
- Neovim Treesitter Playground: A Tree-sitter playground plugin for Neovim.
- Difftastic: A tool that utilises Tree-sitter to perform syntactic diffing.
- treefmt: A general formatter orchestrator, which unifies formatters under a common interface.
- format-all: A formatter orchestrator for Emacs.
- null-ls.nvim: An LSP framework for Neovim that facilitates formatter orchestration.
We’re really excited about Topiary and the potential it has in this space.
This first release concentrates on formatting support for OCaml, as well as simple languages, such as JSON and TOML. Experimental formatting support is also available for Nickel, Bash, Rust, and Tree-sitter’s own query language; these are under active development or serve a pedagogical end for formatter authors.
We highly encourage you to try Topiary, and invite you to check out the Topiary GitHub repository to see for yourself. Information on installing and using Topiary can be found in this repository, where we would also welcome contributions, feature requests and bug reports.