Every time I end up using repos with submodules, I'm stuck in the submodule unwedging dance at some point. It's just not worth it. Either it's stuck with changes that accidentally snuck into the subtree (rm -rf and submodule init/update), the commit is bad and git can't update to it somehow (maybe it's not a tag and can't be fetched? usually gets fixed with rm -rf and submodule init/update), or maybe it's just when I switched branches and it's left in a dirty state because... reasons.<p>Git is elegant in so many ways, but submodules are the broken, ugly stepchild of the beauty that git is.<p>I suspect it's not the idea of submodules, but the terrible, terrible command interface to them and how badly they work with the rest of the system.
Not only are they “a pain” but I see git submodules as the technical equivalent of a “Big Tent” political party trying to fit multiple divergent factions into a single entity.<p>At some point someone in a submodule repo is going to go rogue and make something incompatible with the bigger picture. Psychologically, they’ve merged to their trunk so as far as they are concerned their change is complete. Task done, “mission accomplished”. Yet until it’s also in the parent repo’s trunk, it might as well still be on a branch.<p>In a monorepo, the rogue coder would instead still be on a branch. They’re done when their branch works <i>and</i> their change is merged (er, ff’d!) onto the one true main, not before.<p>None of this applies to FOSS projects. Those <i>are</i> Big Tents where political negotiations are constantly required to keep various democratically equal projects aligned.<p>In the corporate world though I am entirely happy with a one party state running a top down, planned economy of N year plans following CEO-Thought, and with a single repository.
In trunk based development libraries do break product code, yes, which is (and should be recognized this way) 100% intentional. The other alternative is leaf based development where you have dependency hell because you can't force product code to update libraries.<p>Pick your poison - change management is not an easy topic.
I don't get the opposition to submodules. imho they're fine: there are limited options, so there's limited chance for things to go wrong.<p>Some suggested a package manager, which imho is another layer of dependency. E.g. in Python, it always takes me some extra seconds to figure out which package manager I'm supposed to use, and that's constantly evolving.<p>At the end of the day it's up to which one you are most familiar with.
My impression of this opposition (and hate) is that people don't read the docs and expect submodules to bend to their vision of how submodules should work. Because of that, we have multiple similar but not the same "submodule" implementations: git-submodule, git-subtree, git-subrepo...
I think one problem is that submodules are kind of overkill in situations where what you really want is to have separate repos for various reasons but one way of pulling them all in at once. You could make a script with the origins etc but then you put that in a repo and then ...<p>I use them in my personal projects and have a sprawling mass of crazy. It "works" but it's a pain and I have scripts to recursively crawl through and basically do 'git pull' everywhere or 'git push'.<p>So if you end up using submodules when you don't really care about <i>versioning</i> the sub-repos, I think you end up feeling a bit of an idiot like me. But it's a pain to reverse.
Submodules are the feature you discover someone pulled in when things don't work, and after a bit of digging, you realize it's because the submodule wasn't initialized, and it didn't say so anywhere.<p>I suspect the opposition to submodules comes from the poor (manual) integration.<p>As someone suggested in another thread, they ought to auto-recurse by default.<p>But I don't think that's going to happen, either.<p>Maybe something like direnv where it tells you about the unloaded submodule once you enter the repository. Except that won't work for IDEs.
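For what it's worth, the "nobody told me it wasn't initialized" check can be approximated with a few lines of script. This is only a rough sketch, not anything git ships: the function names are made up, the `.gitmodules` parsing is deliberately naive (it only looks for `path = ...` lines), and it assumes that an initialized submodule checkout always contains a `.git` file or directory.

```python
import re
from pathlib import Path

# Matches lines such as "path = vendor/lib" inside .gitmodules.
# Naive on purpose: quoting and other config-syntax corner cases are ignored.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def declared_submodule_paths(repo_root: Path) -> list[str]:
    """Read the submodule paths declared in <repo_root>/.gitmodules."""
    gitmodules = repo_root / ".gitmodules"
    if not gitmodules.is_file():
        return []
    paths = []
    for line in gitmodules.read_text().splitlines():
        match = PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def uninitialized_submodules(repo_root: Path) -> list[str]:
    """Return declared submodule paths whose checkout has no .git entry.

    Assumption: an initialized submodule has a .git file or directory
    inside its working directory; an uninitialized one does not.
    """
    return [
        path
        for path in declared_submodule_paths(repo_root)
        if not (repo_root / path / ".git").exists()
    ]


if __name__ == "__main__":
    for path in uninitialized_submodules(Path(".")):
        print(f"warning: submodule '{path}' is not initialized; "
              "try: git submodule update --init --recursive")
```

Something like this could run from a shell hook or a build script, so the missing checkout is reported up front instead of surfacing as a confusing build failure.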
Same here. I used it and set it up once and had no issues personally, but only after documenting how to deal with it exactly.<p>It's just a rare feature so people don't know how to deal with it, so they just hate it. Once you figure out the 5 commands you will need you're golden.
I love submodules. It's similar to using a package manager but allows you to modify the submodules' code much easier. And as already mentioned you need to use `git config --global submodule.recurse true`. This really should be the default.
Submodules are <i>okay</i> in theory. But in practice the actual implementation is very buggy and incomplete. It's relatively easy to get into a state where your .git directory is completely broken. Plenty of operations are unreliable to the point that they break CI. They don't work with worktrees.<p>On top of that they are needlessly confusing. Why is there a .gitmodules file <i>and</i> hidden state inside the .git directory? Why aren't they cloned by default? Many of the UX issues have only been fixed if you turn non-default options on (e.g. the display of diffs can be changed from useless "submodules changed from commit 123 to 456" to "these commits have been added/removed").<p>Just all-round they are a mediocre idea, implemented badly.
Absent-mindedly moving a submodule breaks everything. Only relatively recently was there even support for it via git mv.
After running the platform org at a company that used submodules: never again.<p>There was not a week that went by without me having to unstick some team that had horribly managed to screw things up because of them or watch an engineer burn an entire day fighting with them or watch a new hire completely confused.<p>"Well if everyone would just ..."<p>Everyone is not going "to just". If your system relies on everyone inherently having the same understanding of the world and behaving in the same way, then it's a terrible system.
I like Git Submodules very much.<p>Are there any downsides to completely skip dependency managers of specific languages and just use submodules to handle dependencies?<p>I don't mean code by 3rd parties. I mean the projects and libraries I write myself. I am tempted to try and handle my own code-reuse purely through git submodules. Would I encounter any problems?
> Are there any downsides to completely skip dependency managers of specific languages and just use submodules to handle dependencies?<p>Submodules suck once you encounter a diamond dependency problem.
It doesn’t even need to be a diamond to have dependency problems. Suppose I have two dependencies, A and B. Dependency A also depends on B. Now I have two copies of B, one from my submodule and one from A’s submodule, and there’s no guarantee that these are compatible versions.
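To make the "two copies of B" situation concrete, here is a small sketch that walks a nested submodule layout and reports any repository pinned at more than one commit. Everything in it is hypothetical: the URLs, the short commit ids and the `PROJECT` structure are invented for illustration, and in a real setup the data would have to come from the actual checkouts.

```python
from collections import defaultdict

# Hypothetical layout: each submodule path maps to
# (repository URL, pinned commit, nested submodules of that repository).
PROJECT = {
    "deps/A": ("https://example.com/A.git", "aaaa111", {
        "deps/B": ("https://example.com/B.git", "bbbb111", {}),
    }),
    "deps/B": ("https://example.com/B.git", "bbbb222", {}),
}


def collect_pins(submodules, prefix=""):
    """Walk the nested layout and yield (url, full path, pinned commit)."""
    for path, (url, commit, nested) in submodules.items():
        full_path = prefix + path
        yield url, full_path, commit
        yield from collect_pins(nested, full_path + "/")


def conflicting_pins(submodules):
    """Return {url: {commit: [paths]}} for repos pinned at several commits."""
    by_url = defaultdict(lambda: defaultdict(list))
    for url, path, commit in collect_pins(submodules):
        by_url[url][commit].append(path)
    return {
        url: dict(commits)
        for url, commits in by_url.items()
        if len(commits) > 1
    }


for url, commits in conflicting_pins(PROJECT).items():
    print(url, "is checked out at", len(commits), "different commits:")
    for commit, paths in commits.items():
        print("  ", commit, "at", ", ".join(paths))
```

A package manager with a resolver would try to pick one version of B for the whole build; with submodules, nothing prevents both copies from ending up in the tree side by side.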
If your projects and libraries are public, your package manager will likely let you use Git references (usually commits/branches/tags) to define dependencies. I find that cleaner than submodules.
We used them for years for dependencies, but have instead moved to a monorepo because of how we want to do releases. We never had any issues while using submodules.
I am not sure what exactly you are trying to achieve, but take a look at Meson Wraps.
This is the article I should have read when first trying to use git submodules. The two main facts "a submodule is another git repository" and "a submodule is always pinned to a specific commit of the other git repository" are the most important things to understand about git submodules. Somehow all the tutorials/examples that I saw before show lots of git commands and their outputs but do not highlight these two basic facts, so they do not actually help.
Hey, I wrote this article. Just wanted to really thank you for saying that. I also felt like "why didn't anyone tell me this" so I wanted to share it :-)
Type `man 7 gitsubmodules` into your terminal.<p>Edit: typo'ed command
For C++, I've started using FetchContent as a kind of distributed package manager. When I think about it, why is that better than git submodules?
I’ve used submodules a lot, and choose not to use them, anymore. They are just too much work. I now use package managers to accomplish pretty much the same thing.<p>One project that I wrote, used nested submodules. There was a specific reason for the nesting, as it was a layered system, and each layer had a very specific context and functional domain, and submodules helped enforce that.<p>The problem was, it made changes a <i>huge</i> pain. If I made changes in the deeper layers, I’d need to propagate the changes throughout the entire chain, above. I wrote a few bash scripts to handle that, but it was fairly kludgy, and quite brittle.<p>I ended up just folding it all into a monorepo.<p>The one feature that I’d <i>love</i> to see in git, is something that Microsoft SourceSafe could do. I call it “Virtual Repos.”<p>You could make a “repo” that was actually an amalgam of files that were references to files in other repos. Their state in the virtual repo reflected their state in their “home” repo, and changes made in the virtual repo would go out to their home repo.<p>It must be a nightmare to get right, though. I can see why it would not be implemented, but these could be used for a lot of the same things that submodules are used for.
Check out git subrepo [0] if you also find working with submodules cumbersome.<p>It has a different set of trade-offs and, if they fit, works without any problems or changes to your workflow. (The only thing it has problems with is rebasing, under specific circumstances.)<p>[0] <a href="https://github.com/ingydotnet/git-subrepo">https://github.com/ingydotnet/git-subrepo</a>
imho: do not use submodules. Only need to deal with them when working on someone else's project where they didn't know this rule.
The very short of submodules is this:<p>Git's data system has 3 (& 1/2) types of objects:<p>1. A blob, which is analogous to a file, and is referenced by the hash of its contents<p>2. A tree, which is analogous to a directory, which contains blobs and other trees, and maps them to names. It is referenced by the hash of its contents.<p>3. A commit, which contains One (1) tree (the top-level of your repo), a reference to one (or more, for a merge) parent commit, and miscellaneous metadata like the author and the commit message. It is referenced by, you guessed it, the hash of its contents! (annotated tags are stored as their own tag objects that point at a commit)<p>3.5. References, which are analogous to symlinks to commits. Branches and lightweight (non-annotated) tags are References.<p>Now, remember how a tree can contain blobs or other trees? What if (<i>gasp</i>) you put a <i>commit</i> object in them!? That's essentially what a submodule is.<p>That's why a submodule is always included _at a particular commit of it_. That's why there's all sorts of complicated support machinery to make "a commit object inside a tree object" make sense.
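The "commit inside a tree" idea can be reproduced without git at all, since object ids are just SHA-1 hashes over a short header plus the object body. The sketch below builds a blob id and then a tree containing one regular file and one submodule entry (mode 160000, usually called a gitlink). The pinned commit id and the file names are made up, and this assumes a classic SHA-1 repository rather than a SHA-256 one.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 id Git assigns to an object of this kind and body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """Encode one tree entry: '<mode> <name>', a NUL, then the raw 20-byte id."""
    return f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(object_id)


# A blob is addressed by the hash of its contents.
readme_id = git_object_id("blob", b"hello world\n")

# Hypothetical commit id in some other repository (made up for illustration).
pinned_commit = "1234567890abcdef1234567890abcdef12345678"

# A tree with a regular file and a submodule, entries sorted by name.
# Mode 100644 marks a blob; mode 160000 marks a gitlink, i.e. an entry
# whose id names a commit instead of a blob or another tree.
tree_body = (
    tree_entry("100644", "README", readme_id)
    + tree_entry("160000", "vendor", pinned_commit)
)
tree_id = git_object_id("tree", tree_body)

print("blob:", readme_id)
print("tree:", tree_id)
```

Note that the tree only records the 40-hex commit id for `vendor`. Where that commit can be fetched from lives outside the object store, in `.gitmodules`, which is part of the support machinery mentioned above.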
Git - 100+ flags and commands when you only need 3-5.
Actually it's backwards. Git gives you the ability to manage and navigate the commit graph at quite a low level using a pair of commands, checkout and reset, and to fulfill any wild desire. In other VCSs it's a separate command per case.<p>Note: I think Git's UX is horrible and requires multiple years of practice to be comfortable with.
Most of the time, you only need a handful of commands, but there's a long tail of niche situations, especially if you are using git for maintaining a large project like the Linux kernel.<p>Remember, git was designed and written for the kernel first and foremost.
Can anyone speak to use cases for submodules that aren’t better served by your language’s package manager? Multi-language codebases, languages without appropriate package management perhaps?
Submodules are great for projects where your code depends on upstream Git repos that you don't control and don't want to vendor yourself.<p>I recently did an embedded Linux design that depended on 5 external repositories: one from Yocto, three from OpenEmbedded, and one from a CPU vendor. My own code just sat on the top of this set of repos.<p>Submodules made that design very simple. One repo with all of my code in it, and submodules for all external repos. All dependent repos were pinned because of how submodules work. Pins were easy to update when desired, and never moved on their own.
We use a submodule in <a href="https://github.com/uber/h3-py">https://github.com/uber/h3-py</a> to wrap the core H3 library, which is written in C. Submodules seemed like a reasonable way to handle the dependency, and, at least for this use case, the approach hasn't given me any problems.
The latter has been an issue for me in the past with some projects that just weren't packaged for, e.g., Python and had to be imported directly. It can also be helpful for non-packaged assets that are held in a separate git repository.
> Multi-language codebases<p>This is where I use them. I have some Rust bindings to C++ code, and that C++ code lives in my repo as a submodule. Everyone seems to hate submodules I guess because of the surprising behavior described in the post, but for my use case they've been completely fine.
I have used them for OpenAPI specs shared between frontend and backend, and database schemas together with values for a test database in the pipeline.<p>Yes, you can also solve this with a monorepo...
Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable. The default pack file behavior of git is completely unsuitable for a rapidly releasing company with a monorepo. You <i>don’t</i> want or need the entire history — you just want a few recent commits. You <i>do</i> want some visibility into what your coworkers are up to so you can prevent merge conflicts before they happen (centralization is good!). You <i>probably</i> only need part of the tree, not the entire thing.<p>If you go back and watch Linus’ talk at Google regarding git, he’s basically describing (unknowingly) why Google needs to <i>not</i> use git for its day to day. Even on a smaller scale, Android (AOSP) had to create a meta tool for git called git-repo to handle its source tree. Git submodule failed there.
> Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable.<p>Where did you get that from? Sources?<p>Google rolled their own VCS, because Google is older than Git, and they needed something that works. Their custom VCS is a hacked up version of Perforce.<p>By the time Git came around, Google was already pretty much committed to their in-house custom tool, too many things relied on it.
Don't all your examples predate stable git-submodules?
Every "language package manager" with a lock file format and requirements file is an inferior, ad hoc, informally-specified, error-prone, incompatible reimplementation of half of Git.<p>Almost every use case for a package manager is better served by Git, whether you choose to use submodules or not. If you want to do version control, use the version control system, and stop trying to do an end-run around the way it works.<p>Previously:<p>> <i>I'm happy to criticize NPM the tool. The whole thing is designed as a second, crummier version control system that lives in disharmony with and on top of your base-level version control system (so it can subvert it). It's a terrible design.</i><p><<a href="https://news.ycombinator.com/item?id=37604551">https://news.ycombinator.com/item?id=37604551</a>>
I am running simulations with a rapidly evolving codebase. I have a separate repo with all the simulation code in it. I want to tie each simulation to the git commit (of the main repo) at which it was run. Are git submodules the correct solution to this in any way?
Does Meta's Sapling git wrapper facilitate working with submodules?
understanding submodules has not caused me to stop wishing that something in the vein of nix (in the sense of being able to provide a "lockfile" that transcends language-level package managers) becomes sufficiently commonplace that people would feel silly doing anything other than using whatever that turns out to be, or just directly vendoring if all else fails
Speaking of submodules, does anyone here have experience with git subrepo?
I recommend trying subtree. For many cases it does what you need, and the other consumers (developers, build systems) don't need any special tooling.