Every time I end up using repos with submodules, I'm stuck in the submodule unwedging dance at some point. It's just not worth it. Either it's stuck with changes that accidentally snuck into the subtree (rm -rf and submodule init/update), the commit is bad and git can't update to it somehow (maybe it's not a tag and can't be fetched? usually gets fixed with rm -rf and submodule init/update), or maybe it's just when I switched branches and it's left in a dirty state because... reasons.<p>Git is elegant in so many ways, but submodules are the broken, ugly stepchild of the beauty that git is.<p>I suspect it's not the idea of submodules, but the terrible, terrible command interface to them and how badly they work with the rest of the system.
I don't get the opposition to submodules. IMHO they're fine: there are only a few options, so there's limited room for things to go wrong.<p>Some have suggested a package manager, which IMHO is another layer of dependency. E.g. in Python, it always takes me a few extra seconds to figure out which package manager I'm supposed to use, and it's constantly evolving.<p>At the end of the day, it comes down to which one you're most familiar with.
Submodules are the feature you discover someone pulled in when things don't work, and after a bit of digging, you realize it's because the submodule wasn't initialized, and it didn't say so anywhere.<p>I suspect the opposition to submodules comes from the poor (manual) integration.<p>As someone suggested in another thread, they ought to auto-recurse by default.<p>But I don't think that's going to happen, either.<p>Maybe something like direnv where it tells you about the unloaded submodule once you enter the repository. Except that won't work for IDEs.
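A rough sketch of the direnv-style idea, in Python. It parses the output of `git submodule status --recursive`, where a leading `-` marks a submodule that has not been initialized and a leading `+` marks one whose checked-out commit differs from what the superproject records. The function names are made up for illustration, and the parser assumes submodule paths contain no newlines and no ` (` sequence.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class SubmoduleState:
    flag: str    # '-' not initialized, '+' commit differs from the pin, 'U' conflicts, ' ' OK
    commit: str  # SHA-1 printed by git for this submodule
    path: str


def parse_submodule_status(output: str) -> list[SubmoduleState]:
    """Parse the text printed by `git submodule status`."""
    states = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        commit, _, remainder = rest.partition(" ")
        # Initialized submodules may be followed by " (<describe output>)"; drop it.
        if remainder.endswith(")") and " (" in remainder:
            remainder = remainder.rpartition(" (")[0]
        states.append(SubmoduleState(flag, commit, remainder))
    return states


def submodule_warnings(repo_dir: str = ".") -> list[str]:
    """Hypothetical helper: human-readable warnings for one working tree."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "submodule", "status", "--recursive"],
        capture_output=True, text=True, check=True,
    )
    warnings = []
    for state in parse_submodule_status(result.stdout):
        if state.flag == "-":
            warnings.append(f"{state.path}: not initialized "
                            "(try: git submodule update --init --recursive)")
        elif state.flag == "+":
            warnings.append(f"{state.path}: checked out at a different commit "
                            "than the superproject records")
    return warnings
```

You could call `submodule_warnings()` from a shell `cd` hook or a pre-build step. Like the direnv idea, it would not help inside an IDE.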
My impression of this opposition (and hate) is that people don't read the docs and expect submodules to bend to their vision of how submodules should work. Because of that, we have multiple similar but not the same "submodule" implementations: git-submodule, git-subtree, git-subrepo...
I think one problem is that submodules are kind of overkill in situations where what you really want is separate repos for various reasons but one way of pulling them all in at once. You could make a script with the origins etc., but then you put that in a repo and then ...<p>I use them in my personal projects and have a sprawling mass of crazy. It "works", but it's a pain, and I have scripts to recursively crawl through and basically do 'git pull' or 'git push' everywhere.<p>So if you end up using submodules when you don't really care about <i>versioning</i> the sub-repos, I think you end up feeling a bit like an idiot, like me. But it's a pain to reverse.
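A minimal sketch of that kind of crawl script, assuming every directory containing a `.git` entry is a repository you want to update. A normal clone has a `.git` directory, while a checked-out submodule usually has a `.git` file pointing into the superproject's `.git/modules/`. The function names are hypothetical, and it defaults to a dry run that only returns the commands. Note that submodules are normally checked out on a detached HEAD, so a plain `git pull` fails in them unless a branch is checked out; for submodules specifically, git's built-in `git submodule foreach --recursive <command>` covers similar ground.

```python
import os
import subprocess
from pathlib import Path


def find_repos(root: str) -> list[Path]:
    """Every directory under root that has a .git entry (directory or file)."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames or ".git" in filenames:
            repos.append(Path(dirpath))
        # Never descend into a repository's own .git directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
    return sorted(repos)


def pull_all(root: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and, unless dry_run, run) a fast-forward-only pull per repository."""
    commands = [["git", "-C", str(repo), "pull", "--ff-only"] for repo in find_repos(root)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=False)  # keep going if one repo fails
    return commands
```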
Same here. I set them up once and had no issues personally, but only after documenting exactly how to deal with them.<p>It's just a rarely used feature, so people don't know how to deal with it, and they end up hating it. Once you figure out the 5 commands you'll need, you're golden.
Not only are they “a pain” but I see git submodules as the technical equivalent of a “Big Tent” political party trying to fit multiple divergent factions into a single entity.<p>At some point someone in a submodule repo is going to go rogue and make something incompatible with the bigger picture. Psychologically, they’ve merged to their trunk so as far as they are concerned their change is complete. Task done, “mission accomplished”. Yet until it’s also in the parent repo’s trunk, it might as well still be on a branch.<p>In a monorepo, the rogue coder would instead still be on a branch. They’re done when their branch works <i>and</i> their change is merged (er, ff’d!) onto the one true main, not before.<p>None of this applies to FOSS projects. Those <i>are</i> Big Tents where political negotiations are constantly required to keep various democratically equal projects aligned.<p>In the corporate world though I am entirely happy with a one party state running a top down, planned economy of N year plans following CEO-Thought, and with a single repository.
In trunk-based development, libraries do break product code, yes, and that is (and should be recognized as) 100% intentional. The alternative is leaf-based development, where you end up in dependency hell because you can't force product code to update its libraries.<p>Pick your poison: change management is not an easy topic.
I love submodules. It's similar to using a package manager but makes it much easier to modify the submodules' code. And as already mentioned, you need to use `git config --global submodule.recurse true`. This really should be the default.
Submodules are <i>okay</i> in theory. But in practice the actual implementation is very buggy and incomplete. It's relatively easy to get into a state where your .git directory is completely broken. Plenty of operations are unreliable to the point that they break CI. They don't work with worktrees.<p>On top of that, they are needlessly confusing. Why is there a .gitmodules file <i>and</i> hidden state inside the .git directory? Why aren't they cloned by default? Many of the UX issues are only fixed if you turn on non-default options (e.g. setting `diff.submodule` to `log` changes the display of diffs from a useless "submodules changed from commit 123 to 456" to "these commits have been added/removed").<p>All-round, they are a mediocre idea, implemented badly.
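To make the two places concrete: `.gitmodules` is a tracked file in git-config syntax that maps each submodule name to a path and URL, while `git submodule init` copies the URL into `.git/config`, and the submodule's own repository lives under `.git/modules/<name>`. The sketch below reads only the tracked half, with a deliberately simplified parser (no quoting, escapes, or includes); the function name is hypothetical. For real use, `git config --file .gitmodules --get-regexp '^submodule\.'` lets git do the parsing.

```python
import re

# Deliberately simplified git-config grammar: no quoting, escapes, or includes.
SECTION = re.compile(r'^\[submodule\s+"([^"]+)"\]$')
KEY_VALUE = re.compile(r'^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$')


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Map submodule name -> {key: value} from the tracked .gitmodules file."""
    modules: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section.group(1), {})
            continue
        pair = KEY_VALUE.match(line)
        if current is not None and pair:
            # Variable names in git config are case-insensitive.
            current[pair.group(1).lower()] = pair.group(2)
    return modules
```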
Absent-mindedly moving a submodule breaks everything. It was only relatively recently that git mv even gained support for it.
I like Git submodules very much.<p>Are there any downsides to completely skipping language-specific dependency managers and just using submodules to handle dependencies?<p>I don't mean code by third parties. I mean the projects and libraries I write myself. I'm tempted to handle my own code reuse purely through git submodules. Would I run into any problems?
> Are there any downsides to completely skipping language-specific dependency managers and just using submodules to handle dependencies?<p>Submodules suck once you encounter a diamond dependency problem.
It doesn’t even need to be a diamond to have dependency problems. Suppose I have two dependencies, A and B. Dependency A also depends on B. Now I have two copies of B, one from my submodule and one from A’s submodule, and there’s no guarantee that these are compatible versions.
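That situation can be detected mechanically. A minimal sketch, using made-up repository names and commit IDs: walk the submodule graph from the top-level repo and report any dependency that is pinned at more than one commit. It simplifies by assuming each repository has a single set of pins; in a real checkout you would read them per commit, e.g. from `git ls-tree` output.

```python
from collections import defaultdict

# Hypothetical input: repository name -> {submodule name: pinned commit}.
Pins = dict[str, dict[str, str]]


def find_conflicting_pins(root: str, pins: Pins) -> dict[str, set[str]]:
    """Return each dependency that is pinned at more than one commit somewhere below root."""
    pinned_at: defaultdict[str, set[str]] = defaultdict(set)
    stack, visited = [root], set()
    while stack:
        repo = stack.pop()
        if repo in visited:
            continue
        visited.add(repo)
        for dep, commit in pins.get(repo, {}).items():
            pinned_at[dep].add(commit)
            stack.append(dep)  # follow the dependency's own submodules
    return {dep: commits for dep, commits in pinned_at.items() if len(commits) > 1}
```

In the scenario above (the app pins A and B, and A pins a different B), `B` is reported with both commits. With submodules you would also have two separate checkouts of B on disk.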
We used them for years for dependencies, but have instead moved to a monorepo because of how we want to do releases. We never had any issues while using submodules.
I'm not sure what exactly you are trying to achieve, but take a look at Meson Wraps.
If your projects and libraries are public, your package manager will likely let you use Git references (usually commits/branches/tags) to define dependencies. I find that cleaner than submodules.
The very short version of submodules is this:<p>Git's data model has 3 (& 1/2) kinds of things that matter here:<p>1. A blob, which is analogous to a file, and is referenced by the hash of its contents.<p>2. A tree, which is analogous to a directory: it contains blobs and other trees and maps them to names. It is referenced by the hash of its contents.<p>3. A commit, which contains one (1) tree (the top level of your repo), references to its parent commit(s) (none for the very first commit, two or more for a merge), and miscellaneous metadata like the author and the commit message. It is referenced by, you guessed it, the hash of its contents! (Annotated tags are not commits: they are a separate tag object, which carries a message and points at another object, usually a commit.)<p>3.5. References, which are analogous to symlinks to commits. Branches and lightweight (non-annotated) tags are references.<p>Now, remember how a tree can contain blobs or other trees? What if (<i>gasp</i>) you put a <i>commit</i> in one!? That's essentially what a submodule is: a tree entry with the special mode 160000 (a "gitlink") that records the hash of a commit in another repository.<p>That's why a submodule is always included <i>at a particular commit</i>. That's why there's all sorts of complicated support machinery to make "a commit inside a tree" make sense.
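The hashing part can be checked directly. A minimal sketch of how Git computes object IDs in the default SHA-1 object format (repositories using the newer SHA-256 format differ): the ID is the hash of a `<type> <size>\0` header followed by the body, and a tree body is a sorted list of `<mode> <name>\0<20-byte id>` entries. A submodule is simply a tree entry with mode `160000` whose ID names a commit that lives in the other repository, not in this one. The helper names are made up for illustration.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID: hash of the header '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return object_id("blob", content)


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries are (mode, name, hex object ID) triples.

    Modes: '100644' file, '40000' subtree, '160000' gitlink (a submodule commit).
    """
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders subtrees as if their names ended with '/'.
        return name.encode() + (b"/" if mode == "40000" else b"")

    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    return object_id("tree", tree_body(entries))
```

For example, `blob_id(b"test content\n")` should match what `echo 'test content' | git hash-object --stdin` prints.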
I’ve used submodules a lot, and I choose not to use them anymore. They are just too much work. I now use package managers to accomplish pretty much the same thing.<p>One project that I wrote used nested submodules. There was a specific reason for the nesting: it was a layered system, each layer had a very specific context and functional domain, and submodules helped enforce that.<p>The problem was, it made changes a <i>huge</i> pain. If I made changes in the deeper layers, I’d need to propagate them up through the entire chain above. I wrote a few bash scripts to handle that, but they were fairly kludgy and quite brittle.<p>I ended up just folding it all into a monorepo.<p>The one feature that I’d <i>love</i> to see in git is something that Microsoft SourceSafe could do. I call it “Virtual Repos.”<p>You could make a “repo” that was actually an amalgam of files that were references to files in other repos. Their state in the virtual repo reflected their state in their “home” repo, and changes made in the virtual repo would go out to their home repo.<p>It must be a nightmare to get right, though. I can see why it hasn't been implemented, but virtual repos could be used for a lot of the same things that submodules are used for.
imho: do not use submodules. I only have to deal with them when working on someone else's project where they didn't know this rule.
Check out git subrepo [0] if you also find working with submodules cumbersome.<p>It has a different set of trade-offs, and if they fit your needs it works without any problems or changes to your workflow. (The only thing it has problems with is rebasing, under specific circumstances.)<p>[0] <a href="https://github.com/ingydotnet/git-subrepo">https://github.com/ingydotnet/git-subrepo</a>
After running the platform org at a company that used submodules: never again.<p>There was not a week that went by without me having to unstick some team that had managed to horribly screw things up because of them, or watching an engineer burn an entire day fighting with them, or watching a new hire get completely confused.<p>"Well if everyone would just ..."<p>Everyone is not going "to just". If your system relies on everyone inherently having the same understanding of the world and behaving in the same way, then it's a terrible system.
This is the article I should have read when first trying to use git submodules. The two main facts, "a submodule is another git repository" and "a submodule is always pinned to a specific commit of the other git repository", are the most important things to understand about git submodules. Somehow all the tutorials/examples I saw before showed lots of git commands and their outputs but didn't highlight those two basic facts, so they didn't actually help.
Hey, I wrote this article. Just wanted to really thank you for saying that. I also felt like "why didn't anyone tell me this" so I wanted to share it :-)
Type `man 7 gitsubmodules` into your terminal.<p>Edit: typo'ed command
I recommend trying subtree. In many cases it does what you need, and the other consumers (developers, build systems) don't need any special tooling.
Can anyone speak to use cases for submodules that aren't better served by your language’s package manager? Multi-language codebases, or languages without appropriate package management, perhaps?
Submodules are great for projects where your code depends on upstream Git repos that you don't control and don't want to vendor yourself.<p>I recently did an embedded Linux design that depended on 5 external repositories: one from Yocto, three from OpenEmbedded, and one from a CPU vendor. My own code just sat on top of this set of repos.<p>Submodules made that design very simple: one repo with all of my code in it, and submodules for all the external repos. All dependent repos were pinned because of how submodules work. Pins were easy to update when desired, and never moved on their own.
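To see those pins, you can list the gitlink entries in the superproject's tree. A minimal sketch, assuming `git ls-tree -r -z <rev>` output (NUL-terminated records of `<mode> <type> <object>\t<path>`, where submodules appear as mode `160000` with type `commit`); the function names and example paths are made up. Moving a pin is an explicit act: check out another commit inside the submodule (or run `git submodule update --remote`), then `git add` the submodule path and commit.

```python
import subprocess


def parse_gitlinks(ls_tree_z_output: str) -> dict[str, str]:
    """Map submodule path -> pinned commit from `git ls-tree -r -z <rev>` output."""
    pins = {}
    for record in ls_tree_z_output.split("\0"):
        if not record:
            continue
        meta, _, path = record.partition("\t")
        mode, obj_type, oid = meta.split()
        if mode == "160000" and obj_type == "commit":
            pins[path] = oid
    return pins


def current_pins(repo_dir: str = ".", rev: str = "HEAD") -> dict[str, str]:
    """Hypothetical helper that runs git and returns the pins recorded at rev."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "ls-tree", "-r", "-z", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gitlinks(out)
```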
We use a submodule in <a href="https://github.com/uber/h3-py">https://github.com/uber/h3-py</a> to wrap the core H3 library, which is written in C. Submodules seemed like a reasonable way to handle the dependency, and, at least for this use case, the approach hasn't given me any problems.
The latter had been an issue for me in the past with some projects that just weren't packaged for, e.g., Python and had to be imported directly. Submodules can also be helpful for non-packaged assets that are held in a separate git repository.
> Multi-language codebases<p>This is where I use them. I have some Rust bindings to C++ code, and that C++ code lives in my repo as a submodule. Everyone seems to hate submodules I guess because of the surprising behavior described in the post, but for my use case they've been completely fine.
Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable. The default pack file behavior of git is completely unsuitable for a rapidly releasing company with a monorepo. You <i>don’t</i> want or need the entire history — you just want a few recent commits. You <i>do</i> want some visibility into what your coworkers are up to so you can prevent merge conflicts before they happen (centralization is good!). You <i>probably</i> only need part of the tree, not the entire thing.<p>If you go back and watch Linus’ talk at Google regarding git, he’s basically describing (unknowingly) why Google needs to <i>not</i> use git for its day to day. Even on a smaller scale, Android (AOSP) had to create a meta tool for git called git-repo to handle its source tree. Git submodule failed there.
> Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable.<p>Where did you get that from? Sources?<p>Google rolled their own VCS, because Google is older than Git, and they needed something that works. Their custom VCS is a hacked-up version of Perforce.<p>By the time Git came around, Google was already pretty much committed to their in-house custom tool; too many things relied on it.
Git submodules aren't really intended for that use case in the first place. They're not really intended to model a monorepo at all, more a relationship between repositories that have their own histories.<p>The main things that have been developed in git to allow very large repos are shallow clones, partial clones, and sparse checkout (which limit, respectively, the history you fetch, the objects you fetch, and the parts of the tree you check out). This model works well enough within git's logic, but it's just historically not been focused on until fairly recently (and I don't really know what the state of play is there; I think there's still a limit at a certain scale where simply finding the state of a large checkout becomes a bottleneck, and you start to want a persistent daemon to use FS notifications to keep track of what's changed instead of stat()ing every file in the tree)<p>(I've often pondered if it would be possible to make a DVCS where there's no firm repo boundary at all, i.e. you could construct a checkout from any combination of trees and commits stored in different locations, and have it work seamlessly. There's probably more than a few thorny issues in there, but it would be an interesting concept)
Don't all your examples predate stable git-submodules?
Every "language package manager" with a lock file format and requirements file, is an inferior, ad hoc, informally-specified, error-prone, incompatible reimplementation of half of Git.<p>Almost every use case for a package manager is better served by Git, whether you choose to use submodules or not. If you want to do version control, use the version control system, and stop trying to do an end-run around the way it works.<p>Previously:<p>> <i>I'm happy to criticize NPM the tool. The whole thing is designed as a second, crummier version control system that lives in disharmony with and on top of your base-level version control system (so it can subvert it). It's a terrible design.</i><p><<a href="https://news.ycombinator.com/item?id=37604551">https://news.ycombinator.com/item?id=37604551</a>>
I have used them for OpenAPI specs shared between frontend and backend, and database schemas together with values for a test database in the pipeline.<p>Yes, you can also solve this with a monorepo...
Git - 100+ flags and commands when you only need 3-5.
Actually, it's the other way around. Git gives you the ability to manage and navigate the commit graph at a fairly low level using just a pair of commands, checkout and reset, and with them you can fulfill any wild desire. In other VCSs, it's a separate command per case.<p>Note: I think Git's UX is horrible and takes years of practice to get comfortable with.
Most of the time, you only need a handful of commands, but there's a long tail of niche situations, especially if you are using git for maintaining a large project like the Linux kernel.<p>Remember, git was designed and written for the kernel first and foremost.
But it's used widely in projects that are not the Linux kernel. So what should they do, fork Git and create a LITE version? :)
Also, do you have examples of niche situations?
I am running simulations with a rapidly evolving codebase. I have a separate repo with all the simulation code in it. I want to tie each simulation to the git commit (of the main repo) at which it was run. Are git submodules the correct solution to this in any way?
For C++, I've started using FetchContent as a kind of distributed package manager. When I think about it, why is that better than git submodules?
understanding submodules has not caused me to stop wishing that something in the vein of nix (in the sense of being able to provide a "lockfile" that transcends language-level package managers) becomes sufficiently commonplace that people would feel silly doing anything other than using whatever that turns out to be, or just directly vendoring if all else fails
speaking of submodules, anyone here have experience with git subrepo ?
Does Meta's Sapling Git wrapper make working with submodules easier?