at the Faculty of Mathematics of the University of Vienna.

Faculty of Mathematics,

University of Vienna,

Oskar-Morgenstern-Platz 1,

1090 Wien,

OMP 07.132

CV (last updated on 10 July 2019)

- Approximation theory and structural properties of neural networks
- Application of deep learning in numerical analysis
- Applied harmonic analysis, in particular, multiscale systems (wavelets, shearlets, and generalisations)

I uploaded two preprints. In the first, "Exponential ReLU Neural Network Approximation Rates for Point and Edge Singularities", Carlo Marcati, Joost A. A. Opschoor, Christoph Schwab, and I study the extent to which deep neural networks can emulate higher-dimensional hp-FEM. By re-approximating hp-FEM, we show that for a wide range of functions on bounded, but quite general, domains neural networks achieve exponentially fast approximation. In the second preprint, "Neural network approximation and estimation of classifiers with classification boundary in a Barron class", written with Andrei Caragea and Felix Voigtlaender, we study the approximation and estimation of high-dimensional functions that have structured singularities. In essence, we show that if the singularities are locally of Barron type, then one can approximate and estimate at rates independent of the underlying dimension.
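The flavour of such exponential ReLU approximation rates can be sketched with the classical "sawtooth" construction for the function x², which is a standard building block in this literature, not the construction of the preprint itself: a depth-m ReLU network reproduces the piecewise-linear interpolant of x² on a dyadic grid, with uniform error 4^-(m+1), i.e. exponential in the depth.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # The "tooth" g(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1),
    # a hat function mapping [0, 1] onto [0, 1].
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def approx_square(x, m):
    """Depth-m ReLU approximation of x**2 on [0, 1] (sawtooth construction).

    f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4**s, where g_s is the s-fold
    composition of the hat function; the uniform error is 4**-(m+1)."""
    g = np.array(x, dtype=float, copy=True)
    out = np.array(x, dtype=float, copy=True)
    for s in range(1, m + 1):
        g = hat(g)
        out -= g / 4.0 ** s
    return out

x = np.linspace(0.0, 1.0, 10001)
for m in (2, 4, 6):
    err = np.max(np.abs(approx_square(x, m) - x ** 2))
    print(m, err)  # error roughly 4**-(m+1): exponential decay in the depth
```

Each additional "tooth" composition adds a constant number of layers, so the error decays exponentially in the network depth.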

Simon, Surbhi, Chao, Song, Matthew, Stephan and I created an online seminar series on the Mathematics of Machine Learning.

I moved to the University of Vienna for an assistant professorship in machine learning.

I recently finished a preprint with Gitta Kutyniok, Mones Raslan, and Reinhold Schneider on the approximation of parametric maps by deep neural networks. We demonstrate that, under some technical conditions, the size of neural networks approximating a discretised parametric map does not, or only weakly, depend on the size of the discretisation. Instead, the size of these networks is determined by the size of a reduced basis. In this regard, our results constitute an approximation result in which the curse of dimension is overcome, and they show that deep learning techniques can efficiently implement model order reduction techniques.

Together with the participants of the Oberwolfach Seminar: Mathematics of Deep Learning, I wrote a (not entirely serious) paper called "The Oracle of DLPhi" proving that deep learning techniques can perform accurate classification on test data that is entirely uncorrelated with the training data. This, however, requires a couple of non-standard assumptions, such as uncountably many data points and the axiom of choice. In a sense, this shows that mathematical results on machine learning need to be approached with a bit of scepticism.

Felix and I submitted our preprint "Equivalence of approximation by convolutional neural networks and fully-connected networks" to the arXiv. In this note, we establish approximation-theoretic results, i.e., lower and upper bounds on the approximation fidelity in terms of the number of parameters, for convolutional neural networks. In practice, convolutional neural networks are used to a much greater extent than standard neural networks, while, traditionally, mathematical analysis has mostly dealt with standard networks. We show that all classical approximation results for standard neural networks imply very similar approximation results for convolutional neural networks.
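One direction of such a translation is elementary and can be sketched directly; this toy identity is my illustration, not the construction in the preprint. A fully-connected layer is a convolution whose kernel covers the entire input:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                              # input length
W = rng.standard_normal((4, n))    # dense layer with 4 output neurons
x = rng.standard_normal(n)

# Dense layer: y = W @ x.
dense = W @ x

# The same map as a "valid" convolution: each output channel uses one
# full-size kernel, namely the corresponding row of W reversed
# (np.convolve flips its second argument).
conv = np.array([np.convolve(x, w[::-1], mode="valid")[0] for w in W])

print(np.allclose(dense, conv))  # True
```

The mathematically interesting direction, controlling the parameter count when going back and forth between the two architectures, is what the note actually analyses.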

I will give a mini-lecture on applied harmonic analysis at the PDE-CDT Summer School 2018 at Ripon College. The lecture notes can be found here.

Felix, Mones, and I just uploaded a preprint on topological properties of sets of functions that are representable by neural networks of fixed size.

In this work, we analyse simple topological properties, such as *convexity, closedness, or density*, of the set of networks with a fixed architecture. Quite surprisingly, we found that the topology of this set is **not** particularly convenient for optimisation.
Indeed, for all commonly-used activation functions, the sets of networks of fixed size are non-convex (not even weakly), nowhere dense, cannot be stably parametrised, and are not closed with respect to L^p norms. For all commonly-used activation functions except the parametric ReLU,
the non-closedness extends to the uniform norm.
In fact, for the parametric ReLU, the associated spaces are closed with respect to the supremum norm if the architecture has only one hidden layer.

When training a network, these properties can lead to *many local minima of the minimisation problem*, *exploding coefficients*, and *very slow convergence*.
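A hypothetical one-dimensional toy example in the spirit of these results (my own illustration, not one from the paper) shows both effects at once: the two-neuron ReLU networks f_n(x) = n·ReLU(x + 1/n) − n·ReLU(x) converge in L¹ to the discontinuous Heaviside function, which no ReLU network realises, while the weights blow up like n.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

x = np.linspace(-1.0, 1.0, 200001)
heaviside = (x >= 0).astype(float)

for n in (10, 100, 1000):
    # Two-neuron ReLU network whose weights have size n:
    f_n = n * relu(x + 1.0 / n) - n * relu(x)
    # L^1 distance on [-1, 1], approximated on the grid:
    l1 = np.mean(np.abs(f_n - heaviside)) * 2.0
    print(n, l1)  # distance ~ 1/(2n), yet the limit is discontinuous,
                  # hence outside the set; the coefficients grow like n
```

So a minimising sequence can approach a target outside the (non-closed) set of fixed-size networks only at the price of exploding coefficients, which is exactly the optimisation pathology described above.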

I created this webpage after I moved from TU Berlin to U Oxford.