We have previously introduced FawltyDeps, a tool to help Python projects avoid the dreaded, and seemingly unavoidable, state where dependencies declared in the configuration do not match those actually imported in the code [1]. FawltyDeps is the perfect addition to your CI, your pre-commit hooks, or your dependency management arsenal.
Curious to know how FawltyDeps works its magic? In this sequel we’ll delve into an essential component of FawltyDeps: how it matches imports and dependencies behind the scenes, and why it is important to get this matching right.
We’ve been busy working on an improved mapping strategy that combines versatility with simplicity, and we have come a long way from the quite limited version we presented in our first announcement. By the end of this post, you’ll have a solid understanding of FawltyDeps’ brand new mapping options and how to tailor them to your project’s unique context and needs.
Simply put, FawltyDeps extracts imports from your code, and dependencies declared in your project configuration, and matches them against each other:
- the imports that are not present in your declared dependencies are reported as undeclared dependencies
- the declared dependencies that are not imported in your code are reported as unused dependencies.
When matching imports and dependencies, we first assume that a dependency (specifically: the package it references) and an import have the same name. This approximation works well for many Python packages.
`numpy` is a good example: in your code, you write `import numpy`, and to install it you run `pip install numpy`, or you list `numpy` in your `requirements.txt` (or wherever you list your project dependencies).
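To make this first approximation concrete, here is a minimal sketch (not FawltyDeps' actual implementation) that collects top-level import names with Python's standard `ast` module and compares them with declared dependency names, assuming that package names and import names are identical. The function names are hypothetical, and the sketch makes no attempt to filter out standard-library or first-party imports.

```python
import ast


def extract_imports(source: str) -> set[str]:
    """Collect the top-level names of all absolute imports in the source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" counts as an import of "os"
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def compare_with_identity_mapping(
    imports: set[str], declared: set[str]
) -> tuple[set[str], set[str]]:
    """Return (undeclared, unused), assuming package name == import name."""
    undeclared = imports - declared
    unused = declared - imports
    return undeclared, unused
```

With `numpy` on both sides the comparison comes out clean; the trouble starts as soon as a package and its import name differ.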
Problem solved! So why are we even writing this post?
It turns out that, as always, things are not that simple™. Many packages provide import names that are different from the package name. For example:
- You depend on the `pyyaml` package, but you import `yaml` (as seen in Figure 1).
- You depend on the `scikit-learn` package, but you import `sklearn`.
- You depend on the `setuptools` package, but you import either `pkg_resources` or some other import, as `setuptools` exposes multiple imports.
Clearly our first approximation (hereafter referred to as the identity mapping) is not good enough. To solve this, we need a smarter mapping: a way to figure out which packages correspond to which imports. In practice, there are a few different ways to acquire these mappings, each having its advantages and limitations. Our main goal here is to lay out the mappings we support in FawltyDeps, and explain how they can be used individually or together to resolve packages into their respective imports.
Arguably, the only correct way for FawltyDeps to match packages to imports is to actually ask each package what imports it provides. FawltyDeps can do this [3], but it first needs to find where the packages are installed, and that turns out to be more complicated than one might think.
In the first versions of FawltyDeps, we had not yet properly drilled into this issue. Instead, we only looked at the Python environment in which FawltyDeps itself was already running, and we simply assumed that your project dependencies should be installed into the same environment [4]. If a dependency of your project was not found in this environment, we would fall back to the identity mapping.
This meant simply pushing the problem onto the user, however, and making FawltyDeps harder to use. What we wanted instead was for FawltyDeps to resolve the dependencies wherever they may be installed. This is where things can get very complicated: In general, there is a bewildering variety of ways to install dependencies in the Python world.
We are not going to open the entire Pandora’s box of Python packaging and dependency management in this blog post, except to note some examples of where Python packages (specifically: your project’s 3rd-party dependencies) can typically be found:
- System-wide package locations, like those found under `/usr/local/lib/python*` (whether installed by your system’s package manager or otherwise installed system-wide).
- User-specific packages, installed by tools like `pip install --user`.
- Virtual environments (from `virtualenv`, Poetry, PDM, etc.), located either within your project, or somewhere else.
- Other, less common, methods or locations [5] that resemble any of the above.
We would like to have FawltyDeps work with as many of these as possible and, where possible, to have FawltyDeps automatically discover and use them by default.
As of v0.13.0 we have come a long way towards realizing this vision: We support the kinds of Python environments mentioned above (for FawltyDeps’ purpose, a “Python environment” really means any directory in which Python packages could be installed), and the following diagram outlines how FawltyDeps determines which Python environments are used to look up the project’s dependencies:
In other words:
- The `--pyenv` option lets you point to one or more Python environments. All of these environments will be used when matching dependencies to imports.
- If `--pyenv` is not used, FawltyDeps will automatically find and use Python environments that exist within your project directories (i.e. within any directory that is passed as a positional argument to FawltyDeps, aka. “basepath”, or the current directory by default).
- If no Python environment is found by the two methods above, FawltyDeps will fall back to using the environment in which it’s running.
There is still some way to go until all the details are perfect here [6], but we believe this approach covers most common cases well.
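The three-step selection described above can be sketched roughly as follows. This is a simplified illustration, not FawltyDeps' code: the function names are hypothetical, and detecting environments by searching for a `pyvenv.cfg` file is an assumption made for brevity (it only recognizes standard virtual environments).

```python
import sys
from pathlib import Path


def find_environments(basepath: Path) -> list[Path]:
    """Find virtual environments under basepath (assumed marker: pyvenv.cfg)."""
    return sorted(cfg.parent for cfg in basepath.rglob("pyvenv.cfg"))


def select_environments(pyenv_args: list[str], basepath: Path) -> list[Path]:
    """Choose which Python environments to use for resolving dependencies."""
    # 1. Environments given explicitly by the user always win.
    if pyenv_args:
        return [Path(arg) for arg in pyenv_args]
    # 2. Otherwise, use environments discovered inside the project.
    discovered = find_environments(basepath)
    if discovered:
        return discovered
    # 3. Last resort: the environment this code is running in.
    return [Path(sys.prefix)]
```

Keeping the explicit option first means a user can always override auto-discovery when it picks up an environment they did not intend.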
There is an elephant in the room that we have not yet talked about: Sometimes you may be running FawltyDeps on a project where the project dependencies are not installed at all! Then what can you do? (Assuming that you don’t want to go through the bother of installing packages manually.) Until recently FawltyDeps would simply fall back to the identity mapping for any packages that it could not find locally, with the undeclared/unused report provided by FawltyDeps suffering as a result.
With the new `--install-deps` option introduced in v0.13.0, we are now able to provide a better alternative: with this option, FawltyDeps will not fall back to the identity mapping; instead, it will automatically use `pip install` to install the unresolved dependencies (from PyPI, by default [7]) into a temporary virtualenv [8], and it will then use this as an additional source for the dependency-to-import mapping. For dependencies that are not found locally, this allows FawltyDeps to come up with the correct mapping (and hence produce a much better undeclared/unused report) rather than relying on the imperfect identity mapping.
Since this is a potentially expensive strategy, we have chosen to hide it behind the `--install-deps` command-line option. If you want to always enable this option, you can set the corresponding `install_deps` configuration variable to `true` in the `[tool.fawltydeps]` section of your `pyproject.toml`.
Note that there is no guarantee that we’re able to resolve all dependencies with this method: For example, there could be a typo in your declared dependency that means it will never be found on PyPI, or there could be other circumstances (e.g. network issues) that prevent this strategy from working at all. What happens with such unresolved dependencies will be covered below.
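To give a feel for what such a temporary installation involves, here is a rough sketch built only on the standard library (`venv`, `tempfile`, `subprocess`). It is not how FawltyDeps implements `--install-deps`; the function names and the error handling are assumptions for illustration, and running `install_into_temporary_venv` requires network access.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the path of the Python interpreter inside a virtualenv."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, requirements: list[str]) -> list[str]:
    """Build the command that installs the requirements into the virtualenv."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *requirements]


def install_into_temporary_venv(requirements: list[str]) -> tempfile.TemporaryDirectory:
    """Create a throwaway virtualenv and install the requirements into it.

    The caller is responsible for calling .cleanup() on the returned object.
    """
    tmp = tempfile.TemporaryDirectory(prefix="deps-sketch-")
    venv.create(tmp.name, with_pip=True)
    result = subprocess.run(pip_install_command(Path(tmp.name), requirements), check=False)
    if result.returncode != 0:
        # A typo in a dependency name or a network problem ends up here.
        tmp.cleanup()
        raise RuntimeError(f"Failed to install: {requirements}")
    return tmp
```

The resulting directory can then be treated like any other Python environment when looking up which imports each package provides.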
The mappings discussed above have FawltyDeps look into packages that are actually installed (whether in an existing local environment or temporarily by FawltyDeps). But this might not always be achievable in practice. You might want to run FawltyDeps in your CI, possibly on multiple libraries, without having to either set up a local environment or access packages from outside sources (like PyPI).
A simple solution to this is to provide FawltyDeps with your own custom mapping [9]. We have chosen not to ship any database with the code as it needs to be frequently updated, with no guarantee of it covering all Python packages. Instead, we allow users to provide their own custom TOML mapping. This mapping does not have to be complete and it can be used in conjunction with the other mappings discussed in this article. We talk more about how FawltyDeps combines different mappings in the following section.
Now that we have gathered all these mappings, let’s see how to best combine them.
Overall, we have three guiding principles in this endeavor:
- Completeness: we should be able to resolve all dependencies extracted from a project into associated import names, as otherwise we cannot reach any conclusions about undeclared or unused dependencies.
- Correctness: some mappings offer a higher level of correctness than others. Identity mapping, for example, is correct for many - but certainly not all - packages. Resolving a dependency via a locally installed package offers a higher guarantee of correctness.
- Transparency: we should be able to trace back what mapping was used to resolve any given dependency. This allows users to discover where they may improve the information passed to FawltyDeps (e.g. using `--pyenv` to point at the most appropriate Python environments). It also makes it much easier for us to diagnose where FawltyDeps itself might be improved.
First, let’s start by repeating our available strategies:
- Identity mapping: The simplest strategy, but also the worst. We would like to avoid using it as much as possible.
- Looking at locally installed packages: Our best option in terms of correctness, but not always complete: sometimes we have to concede that not all dependencies are available in a local Python environment, so we still need a fallback strategy.
- Installing packages (from PyPI) into a temporary virtualenv: The ultimate fallback solution, but quite heavy-weight, and not always suitable (e.g. in a restricted CI environment). Hence, we put this behavior behind the `--install-deps` option.
- Custom/user-defined mapping: Allow the user to have the final say in how dependencies are mapped into imports. This strategy should override the other strategies, but we expect few users will want to go through the fuss of defining their own mapping, so we cannot rely on this being used commonly.
Now, we need to figure out how to combine these strategies in the best way.
We have chosen to organize them in the sequence shown in Figure 3 below. Each strategy - when given the name of a dependency - can either return a successful mapping of that dependency name (into a corresponding set of import names), or return nothing (when a dependency is not found by that strategy). Dependencies that are not resolved by a strategy are passed on to the next strategy in the sequence. Since a dependency is mapped by only one strategy, namely the first that returns something, we need to organize our strategies in order of decreasing preference. In other words:
- The user-defined mapping, when provided, should always override other mappings. It thus comes first in the sequence.
- Next, we want to look at the locally installed packages.
- Finally, if we have not been able to find the dependency in either of the above, we want to use a fallback strategy:
  - If the user has enabled `--install-deps`, we attempt to install packages (subject to `pip` configuration, but from PyPI by default). If any of these packages fail to install, we abort the entire process and raise an error, as we do not expect the user wants a further fallback to the inaccurate identity mapping.
  - Otherwise, our fallback is the identity mapping, that is, we assume any unresolved dependency points to a package (as yet unseen) that provides a single import of the same name. Although this strategy is always “successful” (in terms of mapping to an import name), it is crucially not always correct!
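The sequence above amounts to a chain of resolvers in which the first one to return a result wins. The following sketch shows that idea in isolation; the names (`Resolver`, `resolve_dependencies`, `identity`) are hypothetical and the dictionaries stand in for the real lookups. Recording which strategy resolved each dependency is what makes the result traceable.

```python
from typing import Callable, Optional

# A resolver takes a dependency name and returns the import names it provides,
# or None if it cannot resolve that dependency.
Resolver = Callable[[str], Optional[set[str]]]


def identity(dep: str) -> set[str]:
    """Fallback: assume the package provides one import of the same name."""
    return {dep}


def resolve_dependencies(
    deps: list[str], resolvers: list[tuple[str, Resolver]]
) -> dict[str, tuple[str, set[str]]]:
    """Resolve each dependency with the first resolver that knows about it.

    `resolvers` must be ordered by decreasing preference. The result maps each
    dependency to (name of the strategy used, import names provided).
    """
    resolved: dict[str, tuple[str, set[str]]] = {}
    for dep in deps:
        for strategy_name, resolver in resolvers:
            imports = resolver(dep)
            if imports is not None:
                resolved[dep] = (strategy_name, imports)
                break  # later (less preferred) strategies are not consulted
    return resolved
```

For example, with a user-defined mapping `{"scikit-learn": {"sklearn"}}` and a local environment that only knows `numpy`, passing `[("user-defined", user.get), ("local", local.get), ("identity", identity)]` resolves `pyyaml` through the identity fallback. With `--install-deps`, the last entry would instead be a resolver that installs the package and raises an error on failure.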
To bring this back into the overall context of FawltyDeps: once we have resolved the dependencies through the above mapping strategies, we now have an overall mapping of dependency names to provided import names, and this is the basis for the final report:
- Any import found in the project that is not covered by any dependency is reported as an undeclared dependency.
- Any dependency found to only provide imports that are never imported from anywhere is reported as a possibly unused dependency.
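Given the resolved mapping, both parts of the report reduce to simple set operations. The sketch below uses a hypothetical `build_report` function and is meant only to illustrate the two rules above, not to mirror FawltyDeps' code.

```python
def build_report(
    imports: set[str], resolved: dict[str, set[str]]
) -> tuple[set[str], set[str]]:
    """Return (undeclared imports, unused dependencies).

    `imports` are the import names found in the code; `resolved` maps each
    declared dependency to the import names it provides.
    """
    # Every import name provided by at least one declared dependency.
    provided = set().union(*resolved.values())
    undeclared = imports - provided
    # A dependency is unused if none of the imports it provides is ever used.
    unused = {dep for dep, names in resolved.items() if not (names & imports)}
    return undeclared, unused
```

Note how a wrong mapping produces two errors at once: if `pyyaml` is mapped to `{"pyyaml"}` instead of `{"yaml"}`, then `yaml` is reported as undeclared and `pyyaml` as unused.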
The table below provides a summary of the available mappings, sorted in the order FawltyDeps processes them, along with options to customize them.
| Priority | Mapping strategy | Options |
|---|---|---|
| 1 | User-defined mapping | Provide a custom mapping in TOML format via `--custom-mapping-file`. Default: no custom mapping. |
| 2 | Mapping from installed packages | Point to one or more environments via `--pyenv`. Default: auto-discovery of Python environments under the project’s basepath. If none are found, default to the Python environment in which FawltyDeps itself is installed. |
| 3a | Mapping via temporary installation of packages | Activated with the `--install-deps` option. |
| 3b | Identity mapping | Active by default. |
This section dives into some practical scenarios. Suppose you have a simple project that declares the following dependencies:

    numpy>=1.25.0
    scikit-learn
    pyyaml

We assume that these packages are imported in the project’s code:

    import numpy
    import sklearn
    import yaml

As we can see, our project has defined all its dependencies as it should, so FawltyDeps should ideally not report any problems. But let’s also assume that we’re running FawltyDeps in an incomplete environment - one where `pyyaml` is not installed - to see how this affects FawltyDeps.
When running with default options (that is, without any of the mapping options discussed above), FawltyDeps will run through the default sequence of mappings, as shown in Figure 4:
- No custom mapping is provided.
- FawltyDeps automatically finds local environments or defaults to its own environment. In this example it finds `numpy` in the local environment, and we can see that `scikit-learn` is correctly resolved to the `sklearn` import name.
- Identity mapping is used to resolve any dependencies not resolved via previous mappers. In this example, `pyyaml` was not found above, and was therefore incorrectly resolved by the identity mapping to the `pyyaml` import name.
The resulting output from FawltyDeps is:
    These imports appear to be undeclared dependencies:
    - 'yaml'

    These dependencies appear to be unused (i.e. not imported):
    - 'pyyaml'

    For a more verbose report re-run with the `--detailed` option.
This first example shows a common pitfall of the identity mapping.
Next, we can see how `--install-deps` improves on these situations:
Let’s now take advantage of some advanced FawltyDeps options by running the following command:
fawltydeps --custom-mapping-file my_mapping.toml --pyenv venv --install-deps
Figure 5 shows the path FawltyDeps takes through the sequence of mappings:
- We provide a partial custom mapping (e.g. via `--custom-mapping-file`). In this example, the custom mapping is defined in `my_mapping.toml`:

      scikit-learn = ["sklearn"]

- We point to a local virtual environment (with `--pyenv`) where some dependencies are installed. (In this example, only `numpy` is installed in `venv`.)
- We pass `--install-deps`, to ask FawltyDeps to temporarily install and resolve any remaining dependencies.
FawltyDeps returns the following result:
No undeclared or unused dependencies detected.
As expected, FawltyDeps now returns a better result: the `--install-deps` option downloads the `pyyaml` PyPI package and makes it available to the resolver, so it can now map the `yaml` import to the `pyyaml` dependency declaration.
These examples demonstrate two extremes and we expect most usage to fall somewhere in between.
With the `--json` flag, the resulting package-to-imports mapping is exposed in the output under the `resolved_deps` key.
Using a command like this:
fawltydeps --custom-mapping-file my_mapping.toml --pyenv venv --install-deps --json | jq .resolved_deps
you can see which mappings are used to resolve a package into a set of imports, and further iterate on the mapping options to help FawltyDeps perform its best on your codebase.
FawltyDeps has come a long way from the version we presented in our first announcement. While it was initially limited to resolving packages from its own environment and falling back to the identity mapping, it now supports arbitrary local environments and custom user mappings, and it can temporarily install and resolve packages on its own. On top of that, it can also automatically discover virtual environments inside the analyzed project [2].
We strive to provide a default behavior that makes sense for most projects, and to offer a customizable yet simple interface for advanced users that wish to take control over the mapping process. We believe the result is a powerful tool that delivers a complete, correct and transparent matching of your project’s dependencies and imports.
[1] The recent publication of “Computational reproducibility of Jupyter notebooks from biomedical publications” highlights that missing dependencies are a frequent occurrence in repositories hosting scientific computational experiments and have a detrimental effect on reproducibility.
[4] This assumption was made no matter whether FawltyDeps was installed in a virtualenv or as part of the system-wide Python installation, and we only documented that FawltyDeps had to be installed into the same environment as your project dependencies. One example of where this did not work out well is when you installed FawltyDeps with `pipx install fawltydeps`: This makes `fawltydeps` available everywhere (via your `$PATH`), but pipx installs it into its own, separate, virtualenv that is isolated from your project, meaning that FawltyDeps would almost always fall back to the identity mapping, and yield poor results.
[5] Some less common locations of Python packages:
- `__pypackages__` directories (even though PEP 582 was recently rejected, these still occur in the wild).
- Conda and other environment managers (not yet explicitly supported, although it’s on our radar).
- Nix closures containing Python packages, like those produced by poetry2nix.
[6] One open issue is that FawltyDeps currently does not look at package versions. This usually does not cause problems in practice, but there are corner cases where it might: Consider, for example, a `package_foo` that used to provide two import names, `module_a` and `module_b`, but starting from version 2, it only provides `module_a`. Now, if your project declares a dependency on `package_foo>=2`, but you still happen to `import module_b` in your code, this should be reported by FawltyDeps as an undeclared dependency (because you’re declaring a dependency on a version of `package_foo` where `module_b` no longer exists). However, if `package_foo` version 1 (not version 2) happens to be installed in your project’s environment, FawltyDeps will simply believe that `package_foo` (whichever version) provides both `module_a` and `module_b`, and the error won’t be flagged.
[9] Some tools rely on custom mappings. A notable example is the Pants build system, which relies on static mappings provided by the user. Another example is the pipreqs library, which keeps a static database mapping packages to the import names they expose.
[2] For completeness, here is an overview of the changes we’ve made to our mapping strategy over the last releases, which together realize the picture presented in this blog post:
- v0.7 introduces the `--pyenv` option to allow FawltyDeps to look up packages in a different Python environment than the one in which FawltyDeps is running.
- v0.9 adds the user-defined mapping.
- v0.10 adds support for
- v0.11 introduces support for multiple
- v0.12 revamps our project traversal, allowing Python environments to be automatically found inside the project.
- v0.13 introduces the `--install-deps` option, allowing missing project dependencies to be mapped correctly instead of using the identity mapping.