Package Managers à la Carte: a formal model of dependency resolution

(arxiv.org)

25 points by avsm 3 days ago

2 comments

Onavo 1 hour ago
I will make it simpler to understand. There is only one thing that makes or breaks package resolution: do you support diamond dependencies, and when.

A diamond dependency is when you have package A depending on packages B and C. B depends on package D@v1 while C depends on D@v2. v1 and v2 are incompatible versions of D. This is a classic dependency conflict problem, and whether you can resolve it automatically and bundle both versions of D into the final codebase/binary is the most important architectural decision of the package manager.

Package managers/ecosystems that support diamond dependencies in most circumstances: npm (as long as it's not a peer dep), Golang, Rust, Java/.NET (with shading enabled; it's not turned on by default).

With diamond dependency support, in most circumstances you can have arbitrary depth/complexity of dependency resolution.

If you don't support diamond dependencies (basically the rest of the world: Python, Ruby, Dart, Elixir, most Lisps in their default setup, statically linked C/C++ in default configurations, maybe Zig too, I am not sure about that one), your dependency tree size is severely limited and it becomes a pseudo SAT problem in some cases if you want optimal dependency resolution.

This is the core algorithmic and architectural limit on package managers. Almost everything else is just implementation and engineering details. Stuff like centralized vs non-centralized repos, package caching proxies, security hashes, chains of trust, vendoring, SLSA/SBOM etc. can all be bolted on as an afterthought, but supporting conflicting upstream dependencies simultaneously requires compliance on the bundler/transpiler/compiler level.

It's also why some languages lend themselves better to tools like Bazel that micromanage every single dependency you have while others do not.
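To make the A/B/C/D diamond above concrete, here is a minimal Python sketch. The `REGISTRY` contents and the `collect`/`resolve` helpers are hypothetical and only illustrate the idea; they do not mirror how any real package manager works. The resolver walks the graph, records every version requested for each package, and either keeps both versions of D or reports a conflict, depending on whether concurrent versions are allowed.

```python
from collections import defaultdict

# Hypothetical toy registry: (package, version) -> exact dependencies.
REGISTRY = {
    ("A", 1): [("B", 1), ("C", 1)],
    ("B", 1): [("D", 1)],
    ("C", 1): [("D", 2)],
    ("D", 1): [],
    ("D", 2): [],
}

def collect(root, registry):
    """Walk the graph from root and record every version requested per package."""
    requested = defaultdict(set)
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        name, version = node
        requested[name].add(version)
        stack.extend(registry[node])
    return requested

def resolve(root, registry, allow_concurrent):
    """Return package -> sorted versions, or raise if a diamond cannot be kept."""
    requested = collect(root, registry)
    conflicts = {n: sorted(v) for n, v in requested.items() if len(v) > 1}
    if conflicts and not allow_concurrent:
        raise ValueError(f"version conflict: {conflicts}")
    return {n: sorted(v) for n, v in requested.items()}
```

With `allow_concurrent=True` the result contains both versions of D; with `allow_concurrent=False` the same graph raises `ValueError`, which is the point where a real resolver would have to search for alternative versions.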
- ryangibb 6 minutes ago
 (author of the paper here)

 My sibling makes a great point about type errors: did you know Cargo (Rust) only supports diamond dependencies where the versions differ in major version? So you can have exactly the same problem with B depending on D@v1.1 and C depending on D@v1.2 in Cargo. I believe the reason for only supporting concurrent versions with different major versions (to use the paper's parlance) is that only those should have incompatible APIs anyway.

 > ... and it becomes a pseudo SAT problem in some cases if you want optimal dependency resolution

 A couple of clarifications: many dependency resolution algorithms are essentially SAT even if they support concurrent versions (see Cargo). Section 3.3 of the paper might be an interesting read -- it discusses the spectrum of complexity in the problem of dependency resolution, and why some ecosystems' approaches don't work for others. Also, it's generally a 'pseudo SAT problem' (i.e. NP-complete and can be reduced to SAT) to find any valid resolution, not just an optimal one.

 > This is the core algorithmic and architectural limit on package managers. Almost everything else is just implementation and engineering details.

 I agree, and that's why the paper focuses on the semantics of dependency expression and dependency resolution! But there's a lot more than concurrent versions in the semantics of how package managers express and resolve dependencies, e.g. features, formulas, peer dependencies. The point of the paper is that there's a minimal common core that we can use to translate between package management ecosystems, which we're planning on using to build useful tooling to bridge multilingual dependency resolution.
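The "one version per major version" rule described above can be sketched as a tiny brute-force search. This is a toy model under stated assumptions, not Cargo's actual resolver: `AVAILABLE`, `select`, and the `(major, minor)` tuples are hypothetical names chosen for illustration, and each dependent simply lists the exact versions of D it would accept.

```python
from itertools import product

# Hypothetical published versions of D, as (major, minor) tuples.
AVAILABLE = [(1, 1), (1, 2), (2, 0)]

def select(requirements, available):
    """Pick one version of D per dependent, using at most one version per major.

    requirements maps a dependent's name to the set of versions it accepts.
    Returns dependent -> chosen version, or None if no assignment exists.
    """
    names = sorted(requirements)
    candidates = [
        sorted(v for v in available if v in requirements[n]) for n in names
    ]
    # Exhaustive search over every combination of candidate versions.
    for choice in product(*candidates):
        by_major = {}
        # setdefault stores the first version seen for a major and returns it;
        # a different version with the same major breaks the rule.
        if all(by_major.setdefault(v[0], v) == v for v in choice):
            return dict(zip(names, choice))
    return None
```

Under this model, B wanting `(1, 1)` and C wanting `(2, 0)` resolves with two concurrent copies, while B wanting only `(1, 1)` and C wanting only `(1, 2)` returns `None`. The search visits every combination in the worst case, which gives a feel for why finding any valid resolution is hard in general.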
- jaen 53 minutes ago
 The paper does make this distinction under the "Concurrent Versions" property.

 Allowing concurrent versions, though, opens you up to either really insidious runtime bugs or impossible-to-solve static type errors.

 This happens e.g. when you receive a package.SomeType@v1, and then try to call some other package with it that expects a package.SomeType@v2. At that point you get undefined runtime behavior (JavaScript), or a static type error that can only be solved by allowing you to import two versions of the same package at the same time (and this gets real hairy real fast).

 Also, global state (if there is any) will be duplicated for the same package, which generally also leads to very hard-to-discover bugs and undefined behavior.
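Both failure modes above can be reproduced in a few lines of Python by loading the same source twice as two separate modules. This is only a simulation: `SOURCE`, `load_copy`, `package_v1`, and `package_v2` are hypothetical names, and real bundlers load concurrent versions differently.

```python
import types

# Stand-in source for a package with a type and some module-level state.
SOURCE = '''
_registry = []  # global state, one list per loaded copy

class SomeType:
    def __init__(self, value):
        self.value = value

def register(item):
    if not isinstance(item, SomeType):
        raise TypeError("expected SomeType from this copy of the package")
    _registry.append(item)
'''

def load_copy(name):
    """Execute SOURCE as a fresh module, mimicking a second bundled version."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module

pkg_v1 = load_copy("package_v1")
pkg_v2 = load_copy("package_v2")

item = pkg_v1.SomeType(42)
pkg_v1.register(item)          # accepted: same copy of the package
try:
    pkg_v2.register(item)      # v2's SomeType is a different class object
    crossed = True
except TypeError:
    crossed = False
```

After running this, `crossed` is `False` because the v1 value fails v2's `isinstance` check, and the two `_registry` lists have diverged: one entry in `pkg_v1`, none in `pkg_v2`. Without the explicit check, the value would have been accepted silently, which is closer to the JavaScript situation described above.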
 - Onavo 39 minutes ago
 Good points. Practically speaking, though, global state is rarely an issue unless it's the underlying framework (hence peer deps).

 Modern languages are mostly lexically scoped, and relying primarily on global variables for state (aside from singletons) has fallen out of favor outside of embedded, unless it's a one-off script.
- avsm 34 minutes ago
 (one of the paper coauthors here)

 While diamond dependencies are indeed one of the big complicating factors, the implementation and engineering details that remain matter a lot too. Section 4 covers the spectrum of quality-of-life features that do introduce subtleties: for example the order of resolution, peer dependencies, depopts/features. These are all important for the ergonomics of package constraint expressions, irrespective of whether diamond dependencies are present or not.

 The engineering details also flow from the practical implementation constraints: it makes a big difference if solving can be done in linear time or if there's a noticeable pause or (worse) you need a big centralised solver. The determinism also guides the implementation of chains of trust.
krbaccord 1 hour ago
Geo-tagging even deviations on Street Maps.

À la Carte, the formal way, is contingent on whether intermediate representations of dependencies are "enable[d] as translation between distinct package managers."