Module: MAT870 Zurich Colloquium in Applied and Computational Mathematics

## Some Recent Progresses in the Mathematics of Neural Training

Talk by Dr. Anirbit Mukherjee

**Date:** 31.03.21 **Time:** 16.15 - 17.45 **Room:** Online ZHACM

One of the most intriguing mathematical mysteries of our time is explaining the phenomenon of deep learning. Neural nets can be made to paint in imitation of classical art styles, or to play chess better than any machine or human ever has, and they seem to be the closest we have ever come to achieving "artificial intelligence". But trying to reason about these successes quickly lands us in a plethora of extremely challenging mathematical questions, typically about discrete stochastic processes. Some of these questions remain unsolved even for the smallest neural nets!

In this talk we will describe two of the most recent themes of our work in this direction. First, we will explain how, under mild distributional conditions, we can construct iterative algorithms that train a ReLU gate in the realizable setting in linear time while also keeping track of mini-batching. We will show how this algorithm achieves approximate training when there is a data-poisoning attack on the training labels. Such convergence proofs remain unknown for SGD, but we will show via experiments that our algorithm very closely mimics the behaviour of SGD.

Second, we will review the very new concept of "local elasticity" of a learning process and demonstrate how it appears to reveal certain universal phase changes during neural training. We will then introduce a mathematical model that reproduces some of these key properties in a semi-analytic way. We will end by delineating various open questions in this theme of macroscopic phenomenology with neural nets.

This is joint work with Weijie Su (Wharton, Statistics), Sayar Karmakar (U Florida, Statistics) and Phani Deep (Amazon, India).
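To make the first theme concrete, here is a minimal sketch of what iterative, mini-batched training of a single ReLU gate in the realizable setting can look like. This is not the speaker's exact algorithm; it is an illustrative GLM-tron-style update (residual-weighted, without differentiating through the ReLU), and all names, dimensions, and step sizes below are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Realizable setting: labels are generated by a ground-truth ReLU gate.
d, n = 5, 1000
w_star = rng.normal(size=d)          # hidden true weights
X = rng.normal(size=(n, d))          # Gaussian covariates ("mild" distribution)
y = np.maximum(X @ w_star, 0.0)      # noiseless ReLU labels

def train_relu_gate(X, y, eta=0.2, batch=64, steps=2000, seed=1):
    """Iteratively fit a single ReLU gate with mini-batches.

    GLM-tron-style step: move w along the residual-weighted inputs.
    In the realizable case the residual vanishes at w = w_star, so the
    stochastic update is exactly zero at the optimum.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        resid = yb - np.maximum(Xb @ w, 0.0)
        w = w + (eta / batch) * (Xb.T @ resid)
    return w

w_hat = train_relu_gate(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_star))
```

A data-poisoning attack of the kind mentioned in the abstract could be simulated by additively perturbing a fraction of the entries of `y` before training; the same update then yields only approximate recovery of `w_star`.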