Comments

  • SilverElfin · 57 minutes ago
    Is this actually innovative? I respect that there’s a lot of work in making it a reality, specifically adapting the algorithms for AI training. But doing portions of the work in clusters that are far apart and combining the results has been done many times before for non-AI workloads, right? Or so I would think.
    • Centigonal · 3 minutes ago
      The MapReduce pattern is very old, but AI training tends to be difficult to parallelize, doubly so across nodes with hundreds of milliseconds or seconds of latency. That limitation is what necessitates expensive, dense supercomputing datacenters. The OP seems to address that problem directly.

      In other words, this isn't a paper about whether we should parallelize AI training; everyone already knows that. It's a paper about removing a constraint that makes parallelizing AI training inefficient.
    • philipkglass · 44 minutes ago
      Generically speaking, yes, this has been done before. But it can take a lot of work to transform software that works with shared memory or other low-latency interprocess communication mechanisms so that it's practical to run across wide area networks. Sometimes that's not possible at all, which is why certain problems still require "high performance computing" architectures with all of their compute nodes in the same building, connected by high-bandwidth, low-latency links.
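The constraint Centigonal describes, and the general shape of the fix, can be sketched. A common technique for latency-tolerant training (often called "local SGD"; this is an illustrative toy on a quadratic objective, not the specific method in the linked paper) has each worker take several local gradient steps between synchronizations, so cross-node communication happens far less often:

```python
import random

def local_sgd(workers=4, local_steps=8, rounds=5, lr=0.1, seed=0):
    """Toy local SGD: minimize (w_i - target_i)^2 on each worker,
    averaging the workers' parameters only once per round."""
    rng = random.Random(seed)
    targets = [rng.gauss(0.0, 1.0) for _ in range(workers)]
    w = [0.0] * workers          # each worker's local parameter copy
    syncs = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            # local gradient step: grad of (w_i - t_i)^2 is 2*(w_i - t_i);
            # no network traffic happens here
            w = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, targets)]
        avg = sum(w) / workers   # one all-reduce per round, not per step
        w = [avg] * workers
        syncs += 1
    return avg, syncs

final_w, num_syncs = local_sgd()
# 5 synchronizations instead of the 5 * 8 = 40 that per-step averaging
# would require, yet the workers still converge to a shared parameter.
```

The trade-off is staleness: workers drift apart between averages, which can hurt convergence, so real systems tune how often (and how much) to synchronize.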
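philipkglass's point about why HPC clusters keep their nodes in one building can be made concrete with back-of-envelope arithmetic (the latency figures and step count below are illustrative assumptions, not numbers from the thread): if every optimizer step waits on a gradient exchange, the link's latency is paid on every one of those steps.

```python
steps = 100_000            # assumed optimizer steps in a training run
lan_latency = 10e-6        # ~10 microseconds within one building
wan_latency = 200e-3       # ~200 ms across wide-area links

lan_overhead = steps * lan_latency   # total seconds lost to LAN latency
wan_overhead = steps * wan_latency   # total seconds lost to WAN latency
print(lan_overhead, wan_overhead)    # 1.0 vs 20000.0 (about 5.5 hours)
```

A 20,000x difference in pure waiting time is why either the nodes must sit on low-latency links or the algorithm must be restructured to communicate far less often.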