Heiss, J. M. (2019). Implicit regularization for artificial neural networks [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2019.69320
The main result is a rigorous proof that artificial neural networks without explicit regularization, when trained by gradient descent, implicitly regularize the integral of the squared second derivative \(\int (f''(x))^2\,dx\): under certain conditions they solve very precisely the smoothing spline regression problem
\[
\hat f \;:=\; \operatorname*{arg\,min}_{f \in C^2} \left( \sum_{i=1}^{N} \bigl(f(x_i^{\mathrm{train}}) - y_i^{\mathrm{train}}\bigr)^2 \;+\; \lambda \int \bigl(f''(x)\bigr)^2 \, dx \right).
\]
Artificial neural networks are often used in machine learning to estimate an unknown function \(f^{\mathrm{true}}\) from only finitely many observed data points. There are many methods that guarantee the convergence of the estimated function to the true function \(f^{\mathrm{true}}\) as the number of samples tends to infinity, but in practice almost always only a finite number \(N\) of samples is available. Given a finite number of data points, there are infinitely many functions that fit perfectly through the \(N\) data points yet generalize arbitrarily badly. Therefore one needs some regularization to find a suitable function. With the help of the main theorem one can resolve the paradox of why training neural networks without explicit regularization works surprisingly well under certain conditions (in the case of 1-dimensional wide ReLU randomized shallow neural networks).
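The training setup described in the abstract can be sketched numerically: a wide randomized shallow ReLU network (hidden-layer weights drawn once at random and frozen, only the output weights trained) is fitted to 1-dimensional data by plain gradient descent on the unregularized squared loss. This is a minimal sketch under assumed data and hyperparameters (the sine data, the width of 500, the step-size rule are all illustrative, not taken from the thesis); it only demonstrates the training procedure, not the thesis's convergence proof or the comparison with the smoothing spline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D training data (N = 8 points); purely illustrative.
x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(3.0 * x_train)
N = len(x_train)

# Randomized shallow ReLU network: hidden slopes and biases are drawn
# once at random and frozen; only the output weights are trained.
n_hidden = 500
a = rng.normal(size=n_hidden)            # fixed hidden slopes
b = rng.uniform(-1.5, 1.5, n_hidden)     # fixed hidden biases (kink positions)

def features(x):
    # ReLU feature map: phi_k(x) = max(a_k * x + b_k, 0)
    return np.maximum(np.outer(x, a) + b, 0.0)

Phi = features(x_train)                  # (N, n_hidden) design matrix
w = np.zeros(n_hidden)                   # trainable output weights, init at 0

# Plain gradient descent on the unregularized squared loss.  The step size
# is bounded via trace(Phi Phi^T), which dominates the largest eigenvalue,
# so the iteration cannot diverge.
lr = 0.5 * N / np.sum(Phi**2)
for _ in range(100_000):
    resid = Phi @ w - y_train
    w -= lr * (2.0 / N) * (Phi.T @ resid)

mse = float(np.mean((Phi @ w - y_train) ** 2))
print(mse)  # training loss is driven close to zero despite no penalty term
```

Under the conditions of the main theorem, the function learned this way would approximately also minimize the smoothing spline objective above; checking that would require comparing against an explicit smoothing spline fit, which is beyond this sketch.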