A while ago, much of the discussion about overparametrization was about explaining "double descent": the observation that test error doesn't descend monotonically as model size grows, and actually hits a local maximum around the point where the model has just enough parameters to interpolate the data. My favorite article about double descent looks at this in terms of splines [1]. If I can try to summarize that article: when you are designing a parametrized model to fit to data, you have a choice. You can either:<p>1. Avoid overparametrization by design. Manually create or choose a space of functions that has limited degrees of freedom by construction.<p>2. Accept overparametrization and regularize.<p>The latter tends to be more robust, because of the bitter lesson: general methods that lean on computation tend to beat hand-designed structure. It's not practical to manually design an ideal, just-right, limited-parameter model on demand for every dataset we are presented with. The best way to approach that ideal, it turns out, is to let the computer figure it out via regularized search over a high-dimensional, overparametrized space.<p>Statisticians started moving in favor of overparametrization long before deep learning got off the ground. This trend dates back at least to the machine learning bible, The Elements of Statistical Learning (2001).<p>[1] <a href="https://mlu-explain.github.io/double-descent2/" rel="nofollow">https://mlu-explain.github.io/double-descent2/</a>
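<p>A minimal sketch of the two regimes discussed above, assuming synthetic sine data and random cosine features with a ridge penalty. None of this comes from the linked article: the helper names (features, fit, run), the frequency scale, and lam=1.0 are all made up for illustration. It sweeps the parameter count p through the interpolation threshold (p = n = 30) with no regularization, then refits the largest model with ridge:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression problem (hypothetical, chosen only for illustration).
n = 30
x_train = np.linspace(-1.0, 1.0, n)
y_train = np.sin(3.0 * x_train) + 0.3 * rng.normal(size=n)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(3.0 * x_test)


def features(x, freqs, phases):
    """Random cosine features: column j is cos(freqs[j] * x + phases[j])."""
    return np.cos(np.outer(x, freqs) + phases)


def fit(Phi, y, lam):
    """Least-squares weights; lam=0 gives the minimum-norm solution, lam>0 gives ridge."""
    if lam == 0.0:
        return np.linalg.pinv(Phi) @ y
    # Dual form of ridge: w = Phi^T (Phi Phi^T + lam I)^{-1} y
    return Phi.T @ np.linalg.solve(Phi @ Phi.T + lam * np.eye(len(y)), y)


def mse(Phi, w, y):
    """Mean squared error of the linear model Phi @ w against targets y."""
    return float(np.mean((Phi @ w - y) ** 2))


def run(p, lam):
    """Fit a p-feature model and return (train MSE, test MSE, weight norm)."""
    feat_rng = np.random.default_rng(p)  # separate seed so each p is reproducible
    freqs = feat_rng.normal(scale=20.0, size=p)
    phases = feat_rng.uniform(0.0, 2.0 * np.pi, size=p)
    Phi_train = features(x_train, freqs, phases)
    Phi_test = features(x_test, freqs, phases)
    w = fit(Phi_train, y_train, lam)
    return mse(Phi_train, w, y_train), mse(Phi_test, w, y_test), float(np.linalg.norm(w))


# Sweep the parameter count through the interpolation threshold (p == n).
for p in (5, 15, 30, 60, 300):
    train, test, norm = run(p, lam=0.0)
    print(f"p={p:4d}  lam=0    train={train:.3e}  test={test:.3e}  |w|={norm:.3e}")

# Option 2 from the list above: stay overparametrized, add a ridge penalty.
train, test, norm = run(300, lam=1.0)
print(f"p= 300  lam=1.0  train={train:.3e}  test={test:.3e}  |w|={norm:.3e}")
```

<p>With lam=0 and p > n the fit interpolates the training data; the ridge penalty gives up some training error in exchange for a smaller weight norm. How clearly the test-error peak near p = n shows up will depend on the noise level and the seed.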