If you’re anxious about the size of your binary, there’s a lot of useful advice on the internet to help you reduce it. In my experience, though, people are reluctant to discuss their static libraries. If they’re mentioned at all, you’ll be told not to worry about their size: dead code will be optimized away when linking the final binary, and the final binary size is what matters.
But that advice didn’t help me, because I wanted to distribute a static library
and the size was causing me problems. Specifically, I had a Rust library1
that I wanted to make available to Go developers. Both Rust and Go can interoperate
with C, so I compiled the Rust code into a C-compatible library and made a little
Go wrapper package for it.
Like most pre-compiled C libraries, mine can be distributed either as a static
or a dynamic library. Now Go developers are accustomed to static linking, which
produces self-contained binaries that are refreshingly easy to deploy. Bundling
a pre-compiled static library with our Go package allows Go developers to just
go get github.com/nickel-lang/go-nickel and get to work. Dynamic
libraries, on the other hand, require runtime dependencies, linker paths, and installation instructions.
So I really wanted to go the static route, even if it came with a slight size penalty. How large of a penalty are we talking about, anyway?
```
❯ ls -sh target/release/
132M libnickel_lang.a
 15M libnickel_lang.so
```

😳 Ok, that’s too much. Even if I were morally satisfied with 132MB of library, it’s way beyond GitHub’s 50MB file size limit.2 (Honestly, even the 15MB shared library seems large to me; we haven’t put much effort into optimizing code size yet.)
The compilation process in a nutshell
Back in the day, your compiler or assembler would turn each source file into an “object” file containing the compiled code. In order to allow for source files to call functions defined in other source files, each object file could announce the list of functions3 that it defines, and the list of functions that it very much hopes someone else will define. Then you’d run the linker, a program that takes all those object files and mashes them together into a binary, matching up the hoped-for functions with actual function definitions or yelling “undefined symbol” if it can’t. Modern compiled languages tweak this pipeline a little: Rust produces an object file per crate4 instead of one per source file. But the basics haven’t changed much.
A static library is nothing but a bundle of object files, wrapped in an ancient and never-quite-standardized archive format. No linker is involved in the creation of a static library: the linker only gets involved later, when the static library is linked into a binary. The unfortunate consequence is that a static library contains a lot of information that we don’t want. For a start, it contains all the code of all your dependencies, even if much of that code is unused. If you compiled your code with support for link-time optimization (LTO), it contains another copy (in the form of LLVM bitcode — more on that later) of all your code and the code of all your dependencies. And then, because it has so much redundant code, it contains a bunch of metadata (section headers) to make it easier for the linker to remove that redundant code later. The underlying reason for all this is that extra fluff in object files isn’t usually considered a problem: it’s removed when linking the final binary (or shared library), and that’s all that most people care about.
Re-linking with ld
I wrote above that a linker takes a bunch of object files and mashes
them together into a binary. Like everything in the previous section,
this was an oversimplification: if you pass the --relocatable flag
to your linker, it will mash your object files together but write out
the result as an object file instead of a binary.
If you also pass the --gc-sections flag, it will remove
unused code while doing so.
This gives us a first strategy for shrinking a static archive:
- unpack the archive, retrieving all the object files
- link them all together into a single large object, removing unused code. In this step we need to tell the linker which code is used, and then it will remove anything that can’t be reached from the used code.
- pack that single object back into a static library
```shell
# Unpack the archive
ar x libnickel_lang.a
# Link all the objects together, keeping only the parts reachable from our
# public API (about 50 functions' worth)
ld --relocatable --gc-sections -o merged.o *.o -u nickel_context_alloc -u nickel_context_free ...
# Pack it back up
ar rcs libsmaller_nickel_lang.a merged.o
```

This helps a bit: the archive size went from 132MB to 107MB. But there’s clearly still room for improvement.
Examining our merged object file with the size command, the largest
section by far — weighing in at 84MB — is .llvmbc. Remember I wrote
that we’d come back to the LLVM bitcode? Well, when you compile something with
LLVM (and the Rust compiler uses LLVM), it converts the original source
code into an intermediate representation, then it converts the
intermediate representation into machine code, and then5 it writes both
the intermediate representation and the machine code into an object file.
It keeps the intermediate representation around in case it has useful
information for further optimization during linking time. Even if that
information is useful, it isn’t 84MB useful.6 Out it goes:
```shell
objcopy --remove-section .llvmbc merged.o without_llvmbc.o
```

The next biggest sections contain debug information. Those might be useful, but we’ll remove them for now just to see how small we can get.
```shell
strip --strip-unneeded without_llvmbc.o -o stripped.o
```

At this point there aren’t any giant sections left. But there are more
than 48,000 small sections. It turns out that the Rust compiler puts
every single tiny function into its own little section within the object
file. It does this to help the linker remove unused code: remember the
--gc-sections argument to ld? It removes unused sections, and so if the
sections are small then unused code can be removed precisely. But
we’ve already removed unused code, and each of those 48,000 section
headers is taking up space.
To reclaim that space, we write a linker script that tells ld to merge sections together.
The meaning of the various sections isn’t important here: the point is that
we’re merging sections with names like .text._ZN11nickel_lang4Expr7to_json17h
and .text._ZN11nickel_lang4Expr7to_yaml17h into a single big .text section.
```
/* merge.ld */
SECTIONS
{
  .text :
  {
    *(.text .text.*)
  }
  .rodata :
  {
    *(.rodata .rodata.*)
  }
  /* and a couple more */
}
```

And we use it like this:
```shell
ld --relocatable --script merge.ld stripped.o -o without_tiny_sections.o
```

Let’s take a look back at what we did to our archive, and how much it helped:
| | Size |
|---|---|
| original | 132MB |
| linked with --gc-sections | 107MB |
| removed .llvmbc | 33MB |
| stripped | 25MB |
| merged sections | 19MB |
It’s probably possible to continue, but this is already a big improvement. We got rid of more than 85% of our original size!
We did lose something in the last two steps, though. Stripping the debug information might make backtraces less useful, and merging the sections removes the ability for future linking steps to remove unused code from the final binaries. In our case, our library has a relatively small and coarse API; I checked that as soon as you use any non-trivial function, less than 150KB of dead code remains. But you’ll need to decide for yourself whether these costs are worth the size reduction.
More portability with LLVM bitcode
I was reasonably pleased with the outcome of the previous section until I tried to port
it to MacOS, because it turns out that the MacOS linker doesn’t support
--gc-sections (it has a -dead_strip option, but it’s incompatible with --relocatable
because apparently no one cares about code size unless they’re building a binary).
After drafting this post but before publishing it, I found
this
nice post on shrinking MacOS static libraries using the toolchain from XCode.
I’m no MacOS expert so I’m probably using it wrong, but I only got down to
about 25MB (after stripping) using those tools. (If you know how to do better, let me know!)
But there is another way! Remember that we had two copies of all our code: the LLVM intermediate representation and the machine code.7 Last time, we chucked out the intermediate representation and used the machine code. But since I don’t know how to massage the machine code on MacOS, we can work with the intermediate representation instead.
The first step is to extract the LLVM bitcode and throw out the rest.
(The section name on MacOS is __LLVM,__bitcode instead of .llvmbc like it was on Linux.)
```shell
for obj_file in ./*.o; do
  llvm-objcopy --dump-section=__LLVM,__bitcode="$obj_file.bc" "$obj_file"
done
```

Then we combine all the little bitcode files into one gigantic one:
```shell
llvm-link -o merged.bc ./*.bc
```

And we remove the unused code by telling LLVM which functions make up the public API. We ask it to “internalize” every function that isn’t in the list, and to remove code that isn’t reachable from a public function (the “dce” in “globaldce” stands for “dead-code elimination”).
```shell
opt \
  --internalize-public-api-list=nickel_context_alloc,... \
  --passes='internalize,globaldce' \
  -o small.bc \
  merged.bc
```

Finally, we recompile the result back into an object file and pop
it into a static library. llc turns the LLVM bitcode back into
machine code, so the resulting object file can be consumed by
non-LLVM toolchains.
```shell
llc --filetype=obj --relocation-model=pic small.bc -o small.o
ar rcs libsmaller_nickel_lang.a small.o
```

The result is a 19MB static library, pretty much the same as the other workflow.
Note that we don’t need the section-merging step here, because we
didn’t ask llc to generate a section per function.
Dragonfire
Soon after drafting this post, I learned about dragonfire, a recently-released and awesomely-named tool for shrinking collections of static libs by pulling out and deduplicating object files. I don’t think this post’s techniques can be combined with theirs for extra savings, because you can’t both deduplicate and merge object files (I guess in principle you could deduplicate some and merge others, if you have very specific needs.) But it’s a great read, and I was gratified to discover that someone else shared my giant-Rust-static-library concerns.
Conclusion
We saw two ways to significantly reduce the size of a static library, one using
classic tools like ld and objcopy and another using LLVM-specific tools.
They both produced similar-sized outputs, but as with everything in life
there are some tradeoffs. The “classic” bintools approach works with both GNU bintools
and LLVM bintools, and it’s significantly faster — a few seconds, compared
to a minute or so — than the LLVM tools,
which need to recompile everything from the intermediate representation to
machine code. Moreover, the bintools approach should work with any static library,
not just one compiled with a LLVM-based toolchain.
On the other hand, the LLVM approach works on MacOS (and Linux, Windows, and probably others). For this reason alone, this is the way we’ll be building our static libraries for Nickel.
- Namely, the library API for Nickel, which is going to have a stable embedding API real soon now, including bindings for C and Go!↩
- Go expects packages with pre-compiled dependencies to check the compiled code directly into a git repository.↩
- technically “symbols”, not “functions”. But for this abbreviated discussion, the distinction doesn’t matter.↩
- Or not. To improve parallelization, Rust sometimes generates multiple object files per crate.↩
- if you’ve turned on link-time optimization↩
- Linux distributions that use LTO seem to agree that this intermediate representation should be stripped before distributing the library.↩
- We have the LLVM intermediate representation because we build with LTO.
If you aren’t using LTO then there are probably other ways to get it, like with
Rust’s --emit=llvm-ir flag.↩
Behind the scenes
Joe is a programmer and a mathematician. He loves getting lost in new topics and is a fervent believer in clear documentation.
If you enjoyed this article, you might be interested in joining the Tweag team.