Demystifying Git Submodules

(cyberdemon.org)

127 points by signa11 212 days ago

21 comments

mmastrac 212 days ago
Every time I end up using repos with submodules, I'm stuck in the submodule unwedging dance at some point. It's just not worth it. Either it's stuck with changes that accidentally snuck into the subtree (rm -rf and submodule init/update), the commit is bad and git can't update to it somehow (maybe it's not a tag and can't be fetched? usually gets fixed with rm -rf and submodule init/update), or maybe it's just when I switched branches and it's left in a dirty state because... reasons. Git is elegant in so many ways, but submodules are the broken, ugly stepchild of the beauty that git is. I suspect it's not the idea of submodules, but the terrible, terrible command interface to them and how badly they work with the rest of the system.
- crossroadsguy 212 days ago
 Having used a few “lesser” VCSes, I am not so sure I’d call Git elegant. It just became famous and the de facto VCS of the Internet - a lot of that credit must go to GitHub (which also was and is used a lot by companies and teams). Something like Markdown: one day everyone and their kittens were just using it, which is not necessarily not a compliment. I had created 2-3 char git submodule related aliases. Git submodule exists because a better alternative isn’t available (which gives us the same behaviour or close to it).
 - nine_k 211 days ago
 As a side note: git won because at the time all other VCSes were either functionally worse (like RCS or Subversion), or were good but required a paid license (like Perforce or BitKeeper), or were too slow for larger projects like Linux (Mercurial). More advanced things were created since then, like Fossil or Pijul. But the network effects make git predominant now.
 - cassianoleal 211 days ago
 > But the network effects make git predominant now.
 That and the fact that it's... well, it's a good system - if perhaps not with the best UX. It works well for pretty much everyone who cares to learn the basics, and then you can evolve from there with more practice. Which is probably true of any system.
 - nine_k 211 days ago
 Indeed, git is pretty good internally, despite the clunkiness of the CLI. But, say, Mercurial is also pretty good in many aspects. It used to be rather popular, but its popularity is waning, and not because of some kind of technical inferiority.
 cassianoleal 211 days ago
 According to the post I originally responded to:
 > or were too slow for larger projects like Linux (Mercurial).
 So it seems like there were technical reasons for Git vs. Mercurial. I don't really know and, having never used Mercurial, I couldn't comment on how good it is or how it compares with Git. From what I read around, it's mostly the UX that's marginally better on the Mercurial side. This is the point where the network effect certainly has weight. If one offering is not better enough than the one people are used to, there is no compelling reason to learn something anew and move all existing projects over. When there is a great technical reason, people will move, though, and the network effect will start moving across. See the previous systems that were popular before Git: Subversion, CVS, ClearCase, etc. Those have mostly been phased out completely, except maybe for older projects that have them ingrained into their processes and technology.
 - Vilian 211 days ago
 Perforce wasn't good either. Yes, BitKeeper was better, but git still managed to make things better, at least for Linux kernel development.
 - petepete 211 days ago
 Git won because GitHub.
 - nine_k 211 days ago
 I've been on a number of projects which used git without GitHub. But of course GitHub as "the default" repository of open-source projects has done a lot to make git the default VCS.
 - hiatus 211 days ago
 Have you forgotten sourceforge?
 rileymat2 211 days ago
 Sourceforge was terrible by the time Github was ascending.
 petepete 210 days ago
 No, unfortunately. It was terrible by comparison. There's no wonder GitHub drew people in. Its interface and ease of building a community around a repo made it so much more accessible than anything else around at the time.
 - guappa 211 days ago
 github was a paid service only in the beginning. Which means few were using it.
jimmydoe 212 days ago
I don't get the opposition to submodule. imho it's fine, there are limited options with it, so limited chance it may go wrong. Some suggested a package manager, which imho is another layer of dependency, e.g. in Python, it always takes me some extra seconds to figure out which package manager I'm supposed to use, and it's constantly evolving. At the end of the day it's up to which one you are most familiar with.
- sshine 211 days ago
 Submodules are the feature you discover someone pulled in when things don't work, and after a bit of digging, you realize it's because the submodule wasn't initialized, and it didn't say so anywhere. I suspect the opposition to submodules comes from the poor (manual) integration. As someone suggested in another thread, they ought to auto-recurse by default. But I don't think that's going to happen, either. Maybe something like direnv where it tells you about the unloaded submodule once you enter the repository. Except that won't work for IDEs.
- dig1 211 days ago
 My impression of this opposition (and hate) is that people don't read the docs and expect submodules to bend to their vision of how submodules should work. Because of that, we have multiple similar but not the same "submodule" implementations: git-submodule, git-subtree, git-subrepo...
- nobodywillobsrv 211 days ago
 I think one problem is that submodules are kind of overkill in situations where what you really want is to have separate repos for various reasons but one way of pulling them all in at once. You could make a script with the origins etc. but then you put that in a repo and then ... I use them in my personal projects and have a sprawling mass of crazy. It "works" but it's a pain and I have scripts to recursively crawl through and basically do 'git pull' everywhere or 'git push'. So if you end up using submodules when you don't really care about versioning the sub-repos, I think you end up feeling a bit of an idiot like me. But it's a pain to reverse.
- sureIy 211 days ago
 Same here. I used it and set it up once and had no issues personally, but only after documenting how to deal with it exactly. It's just a rare feature so people don't know how to deal with it, so they just hate it. Once you figure out the 5 commands you will need you're golden.
gorgoiler 211 days ago
Not only are they “a pain” but I see git submodules as the technical equivalent of a “Big Tent” political party trying to fit multiple divergent factions into a single entity. At some point someone in a submodule repo is going to go rogue and make something incompatible with the bigger picture. Psychologically, they’ve merged to their trunk so as far as they are concerned their change is complete. Task done, “mission accomplished”. Yet until it’s also in the parent repo’s trunk, it might as well still be on a branch. In a monorepo, the rogue coder would instead still be on a branch. They’re done when their branch works and their change is merged (er, ff’d!) onto the one true main, not before. None of this applies to FOSS projects. Those are Big Tents where political negotiations are constantly required to keep various democratically equal projects aligned. In the corporate world though I am entirely happy with a one party state running a top down, planned economy of N year plans following CEO-Thought, and with a single repository.
- friendzis 211 days ago
 In trunk based development libraries do break product code, yes, which is (and should be recognized this way) 100% intentional. The other alternative is leaf based development where you have dependency hell because you can't force product code to update libraries. Pick your poison - change management is not an easy topic.
impure 211 days ago
I love submodules. It's similar to using a package manager but allows you to modify the submodules' code much more easily. And as already mentioned you need to use `git config --global submodule.recurse true`. This really should be the default.
- IshKebab 211 days ago
 Submodules are okay in theory. But in practice the actual implementation is very buggy and incomplete. It's relatively easy to get into a state where your .git directory is completely broken. Plenty of operations are unreliable to the point that they break CI. They don't work with worktrees. On top of that they are needlessly confusing. Why is there a .gitmodules file and hidden state inside the .git directory? Why aren't they cloned by default? Many of the UX issues have only been fixed if you turn non-default options on (e.g. the display of diffs can be changed from useless "submodules changed from commit 123 to 456" to "these commits have been added/removed"). Just all-round they are a mediocre idea, implemented badly.
- silasdavis 211 days ago
 Absent-mindedly moving a submodule breaks everything. It was only relatively recently that there was even support via git mv.
TekMol 211 days ago
I like Git Submodules very much. Are there any downsides to completely skip dependency managers of specific languages and just use submodules to handle dependencies? I don't mean code by 3rd parties. I mean the projects and libraries I write myself. I am tempted to try and handle my own code-reuse purely through git submodules. Would I encounter any problems?
- leni536 211 days ago
 > Are there any downsides to completely skip dependency managers of specific languages and just use submodules to handle dependencies?
 Submodules suck once you encounter a diamond dependency problem.
 - MereInterest 211 days ago
 It doesn’t even need to be a diamond to have dependency problems. Suppose I have two dependencies, A and B. Dependency A also depends on B. Now I have two copies of B, one from my submodule and one from A’s submodule, and there’s no guarantee that these are compatible versions.
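 The duplicated-B situation described above can be sketched as a small check. This is only an illustration under stated assumptions: the `SUBMODULES` mapping, the repo names and the short commit ids are all made up, and a real check would have to read the pins out of each repository rather than from a hard-coded dict.

```python
from collections import defaultdict

# Hypothetical layout: repo name -> {submodule name: pinned commit}.
SUBMODULES = {
    "app": {"A": "a1f3", "B": "b200"},
    "A": {"B": "b1ff"},
    "B": {},
}

def find_conflicting_pins(root, submodules):
    """Return {name: {commit: [paths]}} for submodules pinned at more than one commit."""
    pins = defaultdict(lambda: defaultdict(list))

    def walk(repo, path):
        for name, commit in submodules.get(repo, {}).items():
            child_path = path + (name,)
            pins[name][commit].append("/".join(child_path))
            if name not in path:  # guard against cycles in the hypothetical data
                walk(name, child_path)

    walk(root, (root,))
    return {
        name: {commit: paths for commit, paths in by_commit.items()}
        for name, by_commit in pins.items()
        if len(by_commit) > 1
    }

conflicts = find_conflicting_pins("app", SUBMODULES)
print(conflicts)
```

 Here B shows up twice, once pinned directly by the app and once through A, at different commits. Nothing in the pins themselves says whether those two commits are compatible.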
- skinner927 211 days ago
 We used them for years for dependencies, but have instead moved to a monorepo because of how we want to do releases. We never had any issues while using submodules.
- imp0cat 211 days ago
 I am not sure what exactly you are trying to achieve, but take a look at Meson Wraps.
- maleldil 211 days ago
 If your projects and libraries are public, your package manager will likely let you use Git references (usually commits/branches/tags) to define dependencies. I find that cleaner than submodules.
AceJohnny2 212 days ago
The very short of submodules is this:
Git's data system has 3 (& 1/2) types of objects:
1. A blob, which is analogous to a file, and is referenced by the hash of its contents
2. A tree, which is analogous to a directory, which contains blobs and other trees, and maps them to names. It is referenced by the hash of its contents.
3. A commit, which contains One (1) tree (the top-level of your repo), a reference to one (or more, for a merge) parent commit, and miscellaneous metadata like the author and the commit message. It is referenced by, you guessed it, the hash of its contents! (annotated tags are commit objects)
3.5. References, which are analogous to symlinks to commits. Branches and lightweight (non-annotated) tags are References.
Now, remember how a tree can contain blobs or other trees? What if (gasp) you put a commit object in them!? That's essentially what a submodule is.
That's why a submodule is always included _at a particular commit of it_. That's why there's all sorts of complicated support machinery to make "a commit object inside a tree object" make sense.
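The "commit inside a tree" idea above can be sketched in a few lines. This is a minimal illustration, assuming SHA-1 object ids; the helper names `git_object_id` and `tree_entry` and the file names are made up for the example, and the pinned commit id is a placeholder rather than a real commit.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way git does: '<kind> <size>' header, a NUL byte, then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """Encode one tree entry: '<mode> <name>', a NUL byte, then the raw 20-byte id."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(object_id)

# A blob is referenced by the hash of its contents.
readme_id = git_object_id("blob", b"hello world\n")

# Placeholder id of a commit that lives in the *other* repository (hypothetical value).
pinned_commit = "0123456789abcdef0123456789abcdef01234567"

entries = [
    tree_entry("100644", "README", readme_id),          # regular file -> blob
    tree_entry("160000", "vendor-lib", pinned_commit),  # submodule -> commit ("gitlink")
]

# The tree is in turn referenced by the hash of its (name-sorted) entries.
tree_id = git_object_id("tree", b"".join(entries))
print(readme_id)
print(tree_id)
```

The only thing the parent tree stores for the submodule is mode `160000` plus a commit id, which is why the pin is always a specific commit and why changing it shows up as a change to the parent's tree.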
- IshKebab 211 days ago
 > a submodule is always included at a particular commit
 Actually you can make a submodule track a branch instead of a specific commit. I've never seen anyone actually do that though and it seems like a bad idea. Though I did work for a while for a company that had written a custom tool that worked like that and we never ran into any problems due to it.
 - AceJohnny2 211 days ago
 I don't think it is possible to put a branch name into a tree object, not without deep modifications to git, so I suspect your previous company developed a significantly different tool around it.
 - IshKebab 211 days ago
 My previous company wasn't using native Git submodules. I think you're right, actually, about the submodules. You can associate a submodule with a specific branch, but it still records the hash like normal and you still have to manually update it.
 - eru 211 days ago
 What's the hash of a branch?
 - IshKebab 211 days ago
 It uses the branch name instead of the hash.
ChrisMarshallNY 211 days ago
I’ve used submodules a lot, and choose not to use them, anymore. They are just too much work. I now use package managers to accomplish pretty much the same thing. One project that I wrote, used nested submodules. There was a specific reason for the nesting, as it was a layered system, and each layer had a very specific context and functional domain, and submodules helped enforce that. The problem was, it made changes a huge pain. If I made changes in the deeper layers, I’d need to propagate the changes throughout the entire chain, above. I wrote a few bash scripts to handle that, but it was fairly kludgy, and quite brittle. I ended up just folding it all into a monorepo. The one feature that I’d love to see in git, is something that Microsoft SourceSafe could do. I call it “Virtual Repos.” You could make a “repo” that was actually an amalgam of files that were references to files in other repos. Their state in the virtual repo reflected their state in their “home” repo, and changes made in the virtual repo would go out to their home repo. It must be a nightmare to get right, though. I can see why it would not be implemented, but these could be used for a lot of the same things that submodules are used for.
- arccy 211 days ago
 maybe sparse checkouts would be analogous to virtual repos?
 - ChrisMarshallNY 211 days ago
 They are similar. I remember cataloging the differences, once, but that was a long time ago. I don’t think sparse checkouts let you “mix and match,” the way virtual repos did. Think of the files in the virtual repo as “symlinks” to their originals, in other repos. Other than that, SourceSafe was kind of a nightmare, and I don’t miss it.
dboreham 212 days ago
imho: do not use submodules. Only need to deal with them when working on someone else's project where they didn't know this rule.
- dheera 212 days ago
 other than the command line UX of submodules being crap, i think they are actually a good system.
 - m463 212 days ago
 I think that's the nature of git. Goes double for LFS. Git reminds me of C - important language, used everywhere, but has never evolved with respect to usability. EDIT: hmm... actually "git switch" was more usable to me
 - Xelynega 212 days ago
 What is bad about them on the command line? If you understand git the only extra things you need are `git submodule update --init` and `git submodule add ...`
 - GauntletWizard 212 days ago
 The UX that sucks is around what happens if they are unclean/how to update them. A checkout doesn't recursively checkout the relevant submodules. This is the biggest pain point for most orgs I've worked at. It's an easy setting to set (`git config --global submodule.recurse true`) but the fact that it's not default hurts. Most engineers have a poor understanding of Git. My university had a great history of version control course right at the dawn of the git era (In 2006! RIT really speedran it, standardizing on Git by the end of the year, but also including tutorials on RCS, CVS, and SVN and a brief foray into Perforce). Still, a ton of my classmates just didn't get it. The other major blocker is what to do with an unclean submodule repo; I honestly don't remember what git does by default, because it's bad. And most projects get unclean real quick. Makefile hygiene is not common, and for most of time most projects became unclean from a simple `make`. It's better now, but not great.
 - Xelynega 211 days ago
 > Makefile hygiene is not common, and for most of time most projects became unclean from a simple `make`.
 If you're checking generated files into git, submodules aren't the problem imo. I can see the frustration of modifying submodules files and trying to commit the main repo, but if you have to do that then it wasn't supposed to be a submodule. That's like complaining that modifying node_modules files doesn't apply upstream to your dependencies.
 - glandium 211 days ago
 There are bad interactions with rebase too.
 usr1106 211 days ago
 I have gotten rid of those by the rule: If a commit updates a submodule, it must not update anything else. (Yes, this can violate the general rule that nothing needs to be added to a commit to be complete. But updating submodules has been worth the exception in my experience.)
 PokestarFan 211 days ago
 I'm working on a project where pushing a commit to a submodule runs a CI job which updates the reference on the parent repository. This seems to lead to very few issues.
 - Izkata 212 days ago
 Add in "--remote --recursive" to the "update" command and you have the same thing as unpinned svn externals.
 - dheera 210 days ago
 I can't remember whether it's
 <pre><code>git update submodule --init
git init --submodules --update
git init --recursive --submodule --update</code></pre>
 etc. Git checkout should do that automatically. Checkout a repo, all the submodules automatically checkout. There is no good reason not to.
hakanderyal 211 days ago
Check out git subrepo [0] if you also find working with submodules cumbersome. It has a different set of trade-offs and works without any problems or changes to your workflow if they fit. (The only thing it has problems with is rebasing, under specific circumstances.)
[0] https://github.com/ingydotnet/git-subrepo
- IshKebab 211 days ago
 It would be nice if they could say how it differs from git subtree. Any idea?
 - hakanderyal 211 days ago
 Just checked subtree and while they aim to provide the same thing, they are using different ways.
 - You can mix subrepo commits and main repo commits freely in a single commit, it’ll take care of submitting only the relevant changes when pushing upstream.
 - Publishing changes from a subrepo is just a single command.
 - Subrepo adds a .gitrepo file to the subdirectory for metadata
 The readme on repo does a good job of explaining things.
 - IshKebab 211 days ago
 I haven't ever actually used git subtree to push changes, but I'm pretty sure that all of those are true for it too, and git subtree doesn't need a `.gitrepo` file so that seems like a point in its favour. I'm sure there are advantages to git subrepo, but I am still not sure what they are.
 - hakanderyal 210 days ago
 The downsides from the subtree website [0] don’t apply to subrepo. For me, I just use git normally as I would, and do a subrepo push when it’s time. Subtree would make me change my workflow which is a big one for me.
 [0] https://www.atlassian.com/git/tutorials/git-subtree
CSMastermind 211 days ago
After running the platform org at a company that used submodules: never again. There was not a week that went by without me having to unstick some team that had horribly managed to screw things up because of them or watch an engineer burn an entire day fighting with them or watch a new hire completely confused. "Well if everyone would just ..." Everyone is not going "to just". If your system relies on everyone inherently having the same understanding of the world and behaving in the same way, then it's a terrible system.
cjfd 211 days ago
This is the article I should have read when first trying to use git submodules. The two main facts "a submodule is another git repository" and "a submodule is always pinned to a specific commit of the other git repository" are the most important things to understand git submodules. Somehow all tutorials/examples that I saw before show lots of git commands and their outputs but do not highlight the two basic facts so they actually do not help.
- dmazin 211 days ago
 Hey, I wrote this article. Just wanted to really thank you for saying that. I also felt like "why didn't anyone tell me this" so I wanted to share it :-)
- blueflow 211 days ago
 Type `man 7 gitsubmodules` into your terminal. Edit: typo'ed command
 - cjfd 211 days ago
 <pre><code>[cjfd@cjfdpc ~]$ man 7 gitmodules
No manual entry for gitmodules in section 7</code></pre>
tonymet 211 days ago
I recommend trying subtree. For many cases it does what you need. And the other consumers (developers, build systems) don't need any special tooling.
- eru 211 days ago
 Yes, that's my default suggestion as well. Git subtrees also let you move back and forth between multiple repos vs mono-repo without losing history. So you don't need to solve that particular debate in your team.
 - tonymet 208 days ago
 Great Tip!
vermilingua 212 days ago
Can anyone speak to use cases for submodules that aren't better served by your language’s package manager? Multi-language codebases, languages without appropriate package management perhaps?
- nrclark 211 days ago
 Submodules are great for projects where your code depends on upstream Git repos that you don't control and don't want to vendor yourself. I recently did an embedded Linux design that depended on 5 external repositories: one from Yocto, three from OpenEmbedded, and one from a CPU vendor. My own code just sat on the top of this set of repos. Submodules made that design very simple. One repo with all of my code in it, and submodules for all external repos. All dependent repos were pinned because of how submodules work. Pins were easy to update when desired, and never move on their own.
 - maleldil 211 days ago
 Isn't that because you didn't have a good package manager to handle these? If you have a package manager that allows you to add packages from Git repos, why would you use submodules?
 - nrclark 211 days ago
 Is there a specific package manager that you're thinking of? I'm always open for recommendations.
- ajfriend 212 days ago
 We use a submodule in https://github.com/uber/h3-py to wrap the core H3 library, which is written in C. Submodules seemed like a reasonable way to handle the dependency, and, at least for this use case, the approach hasn't given me any problems.
- c0balt 212 days ago
 The latter had been an issue for me in the past with some projects that just weren't packaged for, e.g., Python and had to be imported directly. It can also be helpful for non-packaged assets that are held in a separate git repository.
- bschwindHN 212 days ago
 > Multi-language codebases
 This is where I use them. I have some Rust bindings to C++ code, and that C++ code lives in my repo as a submodule. Everyone seems to hate submodules I guess because of the surprising behavior described in the post, but for my use case they've been completely fine.
- foooorsyth 211 days ago
 Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable. The default pack file behavior of git is completely unsuitable for a rapidly releasing company with a monorepo. You don’t want or need the entire history; you just want a few recent commits. You do want some visibility into what your coworkers are up to so you can prevent merge conflicts before they happen (centralization is good!). You probably only need part of the tree, not the entire thing. If you go back and watch Linus’ talk at Google regarding git, he’s basically describing (unknowingly) why Google needs to not use git for its day to day. Even on a smaller scale, Android (AOSP) had to create a meta tool for git called git-repo to handle its source tree. Git submodule failed there.
 - eru 211 days ago
 > Two of the largest tech companies in the world, Google and Meta, had to roll custom VCS for their day to day engineering operations because git and git submodule were so unsuitable.
 Where did you get that from? Sources? Google rolled their own VCS, because Google is older than Git, and they needed something that works. Their custom VCS is a hacked up version of Perforce. By the time Git came around, Google was already pretty much committed to their in-house custom tool, too many things relied on it.
 - rcxdude 211 days ago
 Git submodules aren't really intended for that use-case in the first place. They're not really intended to model a mono-repo at all, more a relationship between repositories that have their own histories. The main thing that has been developed in git to allow very large repos is shallow clones (both in terms of history and slices of the repo). This model works well enough within git's logic, but it's just historically not been focused on until fairly recently (and I don't really know what the state of play is there - I think there's still a limit at a certain scale where simply finding the state of play of a large checkout becomes a bottleneck, and you start to want a persistent daemon to use FS notifications to keep track of what's changed instead of stat()ing every file in the tree). (I've often pondered if it would be possible to make a DVCS where there's no firm repo boundary at all, i.e. you could construct a checkout from any combination of trees and commits stored in different locations, and have it work seamlessly. There's probably more than a few thorny issues in there, but it would be an interesting concept.)
 - jayd16 211 days ago
 Don't all your examples predate stable git-submodules?
 - staunton 211 days ago
 Since when would you say git-submodules have been stable?
- cxr 211 days ago
 Every "language package manager" with a lock file format and requirements file, is an inferior, ad hoc, formally-specified, error-prone, incompatible reimplementation of half of Git. Almost every use case for a package manager is better served by Git, whether you choose to use submodules or not. If you want to do version control, use the version control system, and stop trying to do an end-run around the way it works.
 Previously:
 > I'm happy to criticize NPM the tool. The whole thing is designed as a second, crummier version control system that lives in disharmony with and on top of your base-level version control system (so it can subvert it). It's a terrible design.
 <https://news.ycombinator.com/item?id=37604551>
 - eru 211 days ago
 Git knows nothing about e.g. semantic versioning, or how to resolve different requirements from different libraries that you want to use together.
 - cxr 211 days ago
 Right.
 - eru 210 days ago
 So it might be a decent replacement for a lock file, only.
 cxr 210 days ago
 What?
 eru 209 days ago
 Git only replaces the lock file aspect of package managers, not the version requirement resolution part. (Or the part that deals with e.g. Rust's feature selection, or different build instructions for different operating systems or versions of the language etc.)
 cxr 209 days ago
 > Git only replaces the lock file aspect of package managers
 Nope, Git is pretty good about downloading stuff over the network, too. In fact, it's so good at it that many people using a language package manager insist you use Git at some point even when (before) using the package managers. Indeed, there's been a lot of trepidation and gnashing of teeth about whether the places where language package managers download packages from are as reliable/trustworthy as the server where the Git repo for the software project is hosted.
 > nothing about eg semantic versioning, or how to resolve different requirements from different libraries
 "[…] incompatible reimplementation of _half_ of Git."
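 The "version requirement resolution" part mentioned in this exchange can be sketched as follows. It is a deliberately simplified illustration, not how any particular package manager works: the function names are made up, only caret-style ranges on plain `major.minor.patch` versions are handled, and the special rules for `0.x` versions and pre-release tags are ignored.

```python
def parse(version):
    """Turn '1.4.1' into (1, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version, requirement):
    """Caret-style range: same major version, and at least the required version."""
    wanted = parse(requirement.lstrip("^"))
    have = parse(version)
    return have[0] == wanted[0] and have >= wanted

def resolve(available, requirements):
    """Pick the highest available version accepted by every requirement, or None."""
    candidates = [v for v in available if all(satisfies(v, r) for r in requirements)]
    return max(candidates, key=parse, default=None)

available = ["1.2.0", "1.4.1", "1.10.0", "2.0.0"]
print(resolve(available, ["^1.2.0", "^1.4.0"]))  # two libraries that can agree
print(resolve(available, ["^1.2.0", "^2.0.0"]))  # two libraries that cannot
```

 A pinned commit records the outcome of a choice like this but carries no range to choose from, which is the lock-file comparison being made in the thread.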
- dirkt 211 days ago
 I have used them for OpenAPI specs shared between frontend and backend, and database schemas together with values for a test database in the pipeline. Yes, you can also solve this with a monorepo...
 - eru 211 days ago
 You can use git subtree to convert between mono-repo and separate repos, without losing your history. You can even keep both styles up to date concurrently.
megamix 211 days ago
Git - 100+ flags and commands when you only need 3-5.
- bvrmn 211 days ago
 Actually it's backwards. Git gives the ability to manage and navigate the commit graph on quite a low level using a pair of commands, checkout and reset, and fulfill any wild desires. While in other VCSs it's a separate command per case. Note: I think GIT UX is horrible and requires multiple years of practice to be comfortable with.
- eru 211 days ago
 Most of the time, you only need a handful of commands, but there's a long tail of niche situations, especially if you are using git for maintaining a large project like the Linux kernel. Remember, git was designed and written for the kernel first and foremost.
 - megamix 206 days ago
 But it's used widely in projects that are not the Linux kernel. So what should they do, fork Git and create a LITE version? :)
 - megamix 206 days ago
 Also, do you have examples of niche situations?
abdullahkhalids 211 days ago
I am running simulations with a rapidly evolving codebase. I have a separate repo with all the simulation code in it. I want to tie each simulation with the git commit (of the main repo) at which it was run. Are git submodules the correct solution to this in any way?
- banditelol 211 days ago
  I think you want something along the lines of dvc (github.com/iterative/dvc)
- swinglock 211 days ago
  Perhaps. Tags might be fitting instead.
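  One lightweight way to read the question above is to store the commit hash next to each run's output. The sketch below assumes `git` is on the PATH and that the code is run from inside the main repo; the function names and the metadata layout are made up for illustration and are not taken from dvc or any other tool mentioned here.

```python
import json
import subprocess

def current_commit(repo_dir="."):
    """Return the full hash of HEAD in repo_dir."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()

def is_dirty(repo_dir="."):
    """True if the working tree has uncommitted changes."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return bool(status.strip())

def run_metadata(commit, dirty, params):
    """Build the record stored next to a simulation's output (hypothetical layout)."""
    return {"code_commit": commit, "code_dirty": dirty, "params": params}

def write_metadata(path, metadata):
    """Write the record as JSON so it can be inspected or diffed later."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metadata, fh, indent=2, sort_keys=True)
```

  A run would then call something like `write_metadata("run-001.json", run_metadata(current_commit(), is_dirty(), {"seed": 1}))`, where the file name and parameters are placeholders. Recording the dirty flag matters because a hash alone says nothing about uncommitted edits.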
yig 211 days ago
For C++, I've started using FetchContent as a kind of distributed package manager. When I think about it, why is that better than git submodules?
0x69420 212 days ago
understanding submodules has not caused me to stop wishing that something in the vein of nix (in the sense of being able to provide a "lockfile" that transcends language-level package managers) becomes sufficiently commonplace that people would feel silly doing anything other than using whatever that turns out to be, or just directly vendoring if all else fails
vsskanth 212 days ago
speaking of submodules, anyone here have experience with git subrepo?
- metadat 212 days ago
 Subrepos are non-standard and don't offer any significant benefit compared to submodules. Even still, submodules are also best avoided.
the_clarence 211 days ago
Does Meta's Sapling git wrapper facilitate working with submodules?