A group of professors and researchers at the Technical University of Berlin, the University of Vienna, and ETH Zurich have recently been working on understanding deep neural networks (computer systems loosely modelled on the human brain) in “a mathematically sound way,” as Dr. Philipp Petersen puts it.
Although the official paper for this exciting research, “Optimal Approximation with Sparse Deep Neural Networks,” will not be published until next week, Professor Gitta Kutyniok graciously presented a preview of their work to CDS’ Math & Data Seminar group this past Thursday.
Mathematically, a neural network represents a function. For these researchers, the main goal is to determine how well a deep neural network with sparse connectivity can approximate a given function.
Dr. Petersen likens the network to a tree. A deep neural network is composed of multiple layers of nodes, or neurons, where computation occurs; the layers are connected by weighted edges. A network is sparsely connected if it has few non-zero weights, that is, few edges. The focus, then, is on the connections between neurons and on how many of them a network actually needs.
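To make the terms concrete, here is a minimal sketch (a hypothetical toy network, not the authors’ construction) of a two-layer ReLU network whose weight matrices are mostly zero. Its “connectivity” is simply the count of non-zero weights, which is the quantity the theory reasons about.

```python
import numpy as np

# ReLU activation: computation that happens at each neuron.
relu = lambda x: np.maximum(x, 0.0)

# Layer weights: most entries are zero, so the network is sparsely connected.
W1 = np.array([[1.0, 0.0],
               [0.0, -2.0],
               [0.0, 0.0]])        # 3 hidden neurons, 2 inputs, 2 non-zero weights
b1 = np.array([0.0, 1.0, 0.0])
W2 = np.array([[0.5, 0.0, 0.0]])   # 1 output neuron, 1 non-zero weight
b2 = np.array([0.0])

def forward(x):
    """Forward pass through the sparsely connected network."""
    return W2 @ relu(W1 @ x + b1) + b2

# Connectivity = total number of non-zero weights (edges) across all layers.
connectivity = np.count_nonzero(W1) + np.count_nonzero(W2)
print(connectivity)  # 3
```

A densely connected version of the same architecture would have 2·3 + 3·1 = 9 edges; this one gets by with 3.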
Some real-world applications of neural networks include Siri, Apple Inc.’s voice assistant; ImageNet, the image database used to train and benchmark them; and AlphaGo, an artificial intelligence program developed to play the board game Go.
Usually, deep neural networks are trained on particular data such as sound, images, text, or video. The research in this paper, however, concerns the underlying mathematics. Rather than picking a single network and training it to approximate a function, the authors established a theorem that holds for all networks. There is no data in the background, only a fundamental theorem about networks themselves.
Thus far, their research has contributed two new results to the field. First, they provide a new optimality criterion not tied to any specific function class: a universal lower bound that connects the size of a network, measured by its number of non-zero weights, to the approximation quality it can achieve. Second, for one specific function class, they give an explicit construction of a neural network that attains this optimal rate.
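The size-versus-accuracy tradeoff behind the first result can be illustrated with a toy experiment (a hypothetical example, not the paper’s construction): a one-hidden-layer ReLU network that reproduces the piecewise-linear interpolant of f(x) = x² on [0, 1]. Adding neurons (one per interior knot) shrinks the worst-case error, so approximation quality improves as network size grows.

```python
import numpy as np

def relu_interpolant(f, knots):
    """Build a one-hidden-layer ReLU network
    g(x) = f(t0) + s0*(x - t0) + sum_k dk * relu(x - tk)
    matching the piecewise-linear interpolant of f at the given knots."""
    t = np.asarray(knots)
    y = f(t)
    slopes = np.diff(y) / np.diff(t)   # slope on each linear segment
    deltas = np.diff(slopes)           # slope change at each interior knot
    def g(x):
        out = y[0] + slopes[0] * (x - t[0])
        for tk, dk in zip(t[1:-1], deltas):
            out = out + dk * np.maximum(x - tk, 0.0)  # one ReLU neuron per knot
        return out
    return g

f = lambda x: x ** 2
xs = np.linspace(0.0, 1.0, 1001)

for n in (4, 8, 16):  # number of linear segments, i.e. roughly network size
    g = relu_interpolant(f, np.linspace(0.0, 1.0, n + 1))
    err = np.max(np.abs(f(xs) - g(xs)))
    print(n, err)  # the error shrinks roughly 4x each time n doubles
```

The roughly fourfold error drop per doubling reflects the classical O(1/n²) rate of piecewise-linear approximation of a smooth function; the paper’s theorems characterize when rates like this are the best any sparsely connected network can do.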
Don’t forget to catch the Math & Data seminars again after Spring Break! The full schedule can be found here.