If you’re anxious about the size of your binary, there’s a lot of useful advice on the internet to help you reduce it. In my experience, though, people are reluctant to discuss their static libraries. If they’re mentioned at all, you’ll be told not to worry about their size: dead code will be optimized away when linking the final binary, and the final binary size is what matters.
But that advice didn’t help me, because I wanted to distribute a static library
and the size was causing me problems. Specifically, I had a Rust library1
that I wanted to make available to Go developers. Both Rust and Go can interoperate
with C, so I compiled the Rust code into a C-compatible library and made a little
Go wrapper package for it.
Like most pre-compiled C libraries, mine can be distributed either as a static
or a dynamic library. Now Go developers are accustomed to static linking, which
produces self-contained binaries that are refreshingly easy to deploy. Bundling
a pre-compiled static library with our Go package allows Go developers to just
go get github.com/nickel-lang/go-nickel and get to work. Dynamic
libraries, on the other hand, require runtime dependencies, linker paths, and installation instructions.
So I really wanted to go the static route, even if it came with a slight size penalty. How large of a penalty are we talking about, anyway?
```
❯ ls -sh target/release/
132M libnickel_lang.a
 15M libnickel_lang.so
```

😳 Ok, that’s too much. Even if I were morally satisfied with 132MB of library, it’s way beyond GitHub’s 50MB file size limit.2 (Honestly, even the 15MB shared library seems large to me; we haven’t put much effort into optimizing code size yet.)
The compilation process in a nutshell
Back in the day, your compiler or assembler would turn each source file into an “object” file containing the compiled code. In order to allow for source files to call functions defined in other source files, each object file could announce the list of functions3 that it defines, and the list of functions that it very much hopes someone else will define. Then you’d run the linker, a program that takes all those object files and mashes them together into a binary, matching up the hoped-for functions with actual function definitions or yelling “undefined symbol” if it can’t. Modern compiled languages tweak this pipeline a little: Rust produces an object file per crate4 instead of one per source file. But the basics haven’t changed much.
A static library is nothing but a bundle of object files, wrapped in an ancient and never-quite-standardized archive format. No linker is involved in the creation of a static library: the linker only gets involved later, when the static library is linked into a binary. The unfortunate consequence is that a static library contains a lot of information that we don’t want. For a start, it contains all the code of all your dependencies, even if much of that code is unused. If you compiled your code with support for link-time optimization (LTO), it contains another copy (in the form of LLVM bitcode — more on that later) of all your code and the code of all your dependencies. And then, because it has so much redundant code, it contains a bunch of metadata (section headers) to make it easier for the linker to remove that redundant code later. The underlying reason for all this is that extra fluff in object files isn’t usually considered a problem: it’s removed when linking the final binary (or shared library), and that’s all that most people care about.
Re-linking with ld
I wrote above that a linker takes a bunch of object files and mashes
them together into a binary. Like everything in the previous section,
this was an oversimplification: if you pass the --relocatable flag
to your linker, it will mash your object files together but write out
the result as an object file instead of a binary.
If you also pass the --gc-sections flag, it will remove
unused code while doing so.
This gives us a first strategy for shrinking a static archive:
- unpack the archive, retrieving all the object files
- link them all together into a single large object, removing unused code. In this step we need to tell the linker which code is used, and then it will remove anything that can’t be reached from the used code.
- pack that single object back into a static library
```shell
# Unpack the archive
ar x libnickel_lang.a
# Link all the objects together, keeping only the parts reachable from our
# public API (about 50 functions' worth)
ld --relocatable --gc-sections -o merged.o *.o -u nickel_context_alloc -u nickel_context_free ...
# Pack it back up
ar rcs libsmaller_nickel_lang.a merged.o
```

This helps a bit: the archive size went from 132MB to 107MB. But there’s clearly still room for improvement.
Examining our merged object file with the size command, the largest
section by far — weighing in at 84MB — is .llvmbc. Remember I wrote
that we’d come back to the LLVM bitcode? Well, when you compile something with
LLVM (and the Rust compiler uses LLVM), it converts the original source
code into an intermediate representation, then it converts the
intermediate representation into machine code, and then5 it writes both
the intermediate representation and the machine code into an object file.
It keeps the intermediate representation around in case it has useful
information for further optimization during linking time. Even if that
information is useful, it isn’t 84MB useful.6 Out it goes:
```shell
objcopy --remove-section .llvmbc merged.o without_llvmbc.o
```

The next biggest sections contain debug information. Those might be useful, but we’ll remove them for now just to see how small we can get.
```shell
strip --strip-unneeded without_llvmbc.o -o stripped.o
```

At this point there aren’t any giant sections left. But there are more
than 48,000 small sections. It turns out that the Rust compiler puts
every single tiny function into its own little section within the object
file. It does this to help the linker remove unused code: remember the
--gc-sections argument to ld? It removes unused sections, and so if the
sections are small then unused code can be removed precisely. But
we’ve already removed unused code, and each of those 48,000 section
headers is taking up space.
To reclaim that space, we write a linker script that tells ld to merge sections together.
The meaning of the various sections isn’t important here: the point is that
we’re merging sections with names like .text._ZN11nickel_lang4Expr7to_json17h
and .text._ZN11nickel_lang4Expr7to_yaml17h into a single big .text section.
```
/* merge.ld */
SECTIONS
{
  .text :
  {
    *(.text .text.*)
  }
  .rodata :
  {
    *(.rodata .rodata.*)
  }
  /* and a couple more */
}
```

And we use it like this:
```shell
ld --relocatable --script merge.ld stripped.o -o without_tiny_sections.o
```

Let’s take a look back at what we did to our archive, and how much it helped:
| | Size |
|---|---|
| original | 132MB |
| linked with --gc-sections | 107MB |
| removed .llvmbc | 33MB |
| stripped | 25MB |
| merged sections | 19MB |
It’s probably possible to continue, but this is already a big improvement. We got rid of more than 85% of our original size!
We did lose something in the last two steps, though. Stripping the debug information might make backtraces less useful, and merging the sections removes the ability for future linking steps to remove unused code from the final binaries. In our case, our library has a relatively small and coarse API; I checked that as soon as you use any non-trivial function, less than 150KB of dead code remains. But you’ll need to decide for yourself whether these costs are worth the size reduction.
More portability with LLVM bitcode
I was reasonably pleased with the outcome of the previous section until I tried to port
it to MacOS, because it turns out that the MacOS linker doesn’t support
--gc-sections (it has a -dead_strip option, but it’s incompatible with --relocatable
because apparently no one cares about code size unless they’re building a binary).
After drafting this post but before publishing it, I found
this
nice post on shrinking MacOS static libraries using the toolchain from XCode.
I’m no MacOS expert so I’m probably using it wrong, but I only got down to
about 25MB (after stripping) using those tools. (If you know how to do better, let me know!)
But there is another way! Remember that we had two copies of all our code: the LLVM intermediate representation and the machine code.7 Last time, we chucked out the intermediate representation and used the machine code. But since I don’t know how to massage the machine code on MacOS, we can work with the intermediate representation instead.
The first step is to extract the LLVM bitcode and throw out the rest.
(The section name on MacOS is __LLVM,__bitcode instead of .llvmbc like it was on Linux.)
```shell
for obj_file in ./*.o; do
  llvm-objcopy --dump-section=__LLVM,__bitcode="$obj_file.bc" "$obj_file"
done
```

Then we combine all the little bitcode files into one gigantic one:
```shell
llvm-link -o merged.bc ./*.bc
```

And we remove the unused code by telling LLVM which functions make up the public API. We ask it to “internalize” every function that isn’t in the list, and to remove code that isn’t reachable from a public function (the “dce” in “globaldce” stands for “dead-code elimination”).
```shell
opt \
  --internalize-public-api-list=nickel_context_alloc,... \
  --passes='internalize,globaldce' \
  -o small.bc \
  merged.bc
```

Finally, we recompile the result back into an object file and pop
it into a static library. llc turns the LLVM bitcode back into
machine code, so the resulting object file can be consumed by
non-LLVM toolchains.
```shell
llc --filetype=obj --relocation-model=pic small.bc -o small.o
ar rcs libsmaller_nickel_lang.a small.o
```

The result is a 19MB static library, pretty much the same as the other workflow.
Note that we don’t need the section-merging step here, because we
didn’t ask llc to generate a section per function.
Dragonfire
Soon after drafting this post, I learned about dragonfire, a recently-released and awesomely-named tool for shrinking collections of static libs by pulling out and deduplicating object files. I don’t think this post’s techniques can be combined with theirs for extra savings, because you can’t both deduplicate and merge object files (I guess in principle you could deduplicate some and merge others, if you have very specific needs.) But it’s a great read, and I was gratified to discover that someone else shared my giant-Rust-static-library concerns.
Conclusion
We saw two ways to significantly reduce the size of a static library, one using
classic tools like ld and objcopy and another using LLVM-specific tools.
They both produced similar-sized outputs, but as with everything in life
there are some tradeoffs. The “classic” bintools approach works with both GNU bintools
and LLVM bintools, and it’s significantly faster — a few seconds, compared
to a minute or so — than the LLVM tools,
which need to recompile everything from the intermediate representation to
machine code. Moreover, the bintools approach should work with any static library,
not just one compiled with a LLVM-based toolchain.
On the other hand, the LLVM approach works on MacOS (and Linux, Windows, and probably others). For this reason alone, this is the way we’ll be building our static libraries for Nickel.
- Namely, the library API for Nickel, which is going to have a stable embedding API real soon now, including bindings for C and Go!↩
- Go expects packages with pre-compiled dependencies to check the compiled code directly into a git repository.↩
- technically “symbols”, not “functions”. But for this abbreviated discussion, the distinction doesn’t matter.↩
- Or not. To improve parallelization, Rust sometimes generates multiple object files per crate.↩
- if you’ve turned on link-time optimization↩
- Linux distributions that use LTO seem to agree that this intermediate representation should be stripped before distributing the library.↩
- We have the LLVM intermediate representation because we build with LTO.
If you aren’t using LTO then there are probably other ways to get it, like with
Rust’s --emit=llvm-ir flag.↩
Behind the scenes
Joe is a programmer and a mathematician. He loves getting lost in new topics and is a fervent believer in clear documentation.
If you enjoyed this article, you might be interested in joining the Tweag team.