## About

**The Math and Data (MaD) group at CDS, in collaboration with the Courant Institute of Mathematical Sciences, focuses on building the mathematical and statistical foundations of data science.** With expertise spanning signal processing, machine learning, deep learning, and high-dimensional statistics, the group tackles some of the field’s most critical challenges, from understanding neural networks to improving climate modeling.

Launched in 2017 by CDS Professor Joan Bruna, CDS Associate Professor Carlos Fernandez-Granda, and former colleague Afonso Bandeira, the group is known for its influential MaD Seminar, which serves as a hub for in-depth discussion on the theoretical foundations of machine learning and data science. The group’s research aims to make AI systems more interpretable and reliable by uncovering the mathematical principles underlying complex algorithms.

Whether developing mathematical frameworks to expedite climate simulations or exploring the optimal transport problem’s modern applications, MaD researchers balance theoretical rigor with practical impact. Their work, driven by a deep commitment to both pure math and real-world applications, positions them at the forefront of data science research.

## Seminar Series

### MaD and MIC

**The Math and Data (MaD) Seminar Series at CDS serves as a forum to explore the mathematical foundations of data science, bringing together researchers from various disciplines to discuss topics ranging from classical statistics to modern machine learning.** Founded in 2016 by CDS Associate Professor Carlos Fernandez-Granda, Professor Joan Bruna, and former NYU Assistant Professor Afonso Bandeira (now at ETH Zurich), the seminar reflects the diverse research interests of the MaD group’s faculty and has grown to become one of the longest-running and most impactful series at CDS.

The MaD Seminar has become a cornerstone of the CDS community, fostering a space where faculty, postdocs, and students can engage with groundbreaking research and cultivate new ideas. Regular speakers include both seasoned experts and promising new voices, making the series a launching pad for rising stars in the field. By bringing together these diverse perspectives, the MaD Seminar is not just a venue for presenting research but a catalyst for collaboration and innovation in data science.

**The Mathematics, Information and Computation (MIC) Seminar runs at irregular intervals and covers specific aspects at the interface of applied maths, information theory and theory of computation.**

#### Fall 2024 Seminars

##### MaD Seminar with Nikita Zhivotovskiy (UC Berkeley): Mean and covariance estimation of anisotropic distributions in the presence of adversarial outliers

**October 10, 2:00pm EST, Auditorium Hall 150, Center for Data Science, NYU, 60 5th Ave**

**Abstract:** Suppose we are observing a sample of independent random vectors with unknown general covariance structure, knowing that the original distribution was contaminated, so that a fraction of observations came from a different distribution. How to estimate the mean and the covariance matrix of the original distribution in this case? In this talk, we discuss some recent estimators that achieve the optimal non-asymptotic, dimension-free rate of convergence under the model where the adversary can corrupt a fraction of the samples arbitrarily. The discussion will cover a wide range of distributions including heavy-tailed, sub-Gaussian, and specifically Gaussian distributions.

**Bio:** Nikita Zhivotovskiy is an Assistant Professor in the Department of Statistics at the University of California, Berkeley. He previously held postdoctoral positions at ETH Zürich in the Department of Mathematics, hosted by Afonso Bandeira, and at Google Research, Zürich, hosted by Olivier Bousquet. He also spent time at the Technion I.I.T. mathematics department, hosted by Shahar Mendelson. Nikita completed his thesis at the Moscow Institute of Physics and Technology under the guidance of Vladimir Spokoiny and Konstantin Vorontsov.

##### MIC Seminar with Anya Katsevich (MIT): Laplace asymptotics in high-dimensional Bayesian inference

**September 30, 12:00pm EST, Room 650, Center for Data Science, NYU, 60 5th Ave**

**Abstract:** Computing integrals against a high-dimensional posterior distribution is the major computational bottleneck in Bayesian inference. A popular technique to make this computation cheaper is to use the Laplace approximation (LA), a Gaussian distribution, in place of the true posterior. Yet the accuracy of this approximation is not fully understood in high dimensions. We derive a new, leading order asymptotic decomposition of the LA error in high dimensions. This leads to lower bounds which resolve the question of the dimension dependence of the LA. It also leads to a simple modification to the LA which yields a higher-order accurate posterior approximation. Finally, we derive the high-dimensional analogue of the classical asymptotic expansion of Laplace-type integrals. This opens the door to approximating the partition function (aka the posterior normalizing constant), of use in high-dimensional model selection and many other applications beyond statistics.
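For readers unfamiliar with the Laplace approximation mentioned in the abstract, the following is a minimal one-dimensional sketch of the classical version, not the high-dimensional results of the talk. It approximates the log of an integral of `exp(-f(x))` by a Gaussian fitted at the minimizer of `f`. The function name `laplace_log_integral` and the test integrand (whose exact value is `log(n!)`) are illustrative assumptions.

```python
import math


def laplace_log_integral(f, grad, hess, x0, steps=50):
    """Classical 1-D Laplace approximation of log ∫ exp(-f(x)) dx.

    Assumes f has a single interior minimum and that Newton's method
    started at x0 converges to it (an assumption of this sketch).
    """
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)  # Newton step toward the mode
    # Gaussian integral around the mode: exp(-f(x*)) * sqrt(2*pi / f''(x*))
    return -f(x) + 0.5 * math.log(2 * math.pi / hess(x)), x


# Illustrative integrand: n! = ∫_0^∞ exp(-(t - n log t)) dt
n = 10
f = lambda t: t - n * math.log(t)
grad = lambda t: 1 - n / t
hess = lambda t: n / t ** 2

approx, mode = laplace_log_integral(f, grad, hess, x0=1.0)
exact = math.lgamma(n + 1)  # log(n!)
print(f"mode={mode:.4f}  laplace={approx:.5f}  exact={exact:.5f}")
```

In this toy case the approximation reproduces Stirling's formula, and the gap to the exact value shrinks as `n` grows. How such errors behave when the dimension grows is the question the abstract addresses.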

##### MaD Seminar with Cyril Letrouit (Orsay): Stability of optimal transport: old and new

**September 19, 2:00pm EST, Auditorium Hall 150, Center for Data Science, NYU, 60 5th Ave**

**Abstract:** Optimal transport consists in sending a given source probability measure to a given target probability measure, in a way which is optimal with respect to some cost. On bounded subsets of R^d, if the cost is given by the squared Euclidean distance and the source measure is absolutely continuous, a unique optimal transport map exists.

The question we will discuss is the following: how does this optimal transport map change if we perturb the target measure? For instance, if instead of the target measure we only have access to samples of it, how much does the optimal transport map change? This question, motivated by numerical aspects of optimal transport, has started to receive partial answers only recently, under quite restrictive assumptions on the source measure. We will review these answers and show how to handle much more general cases.

This is a joint work with Quentin Mérigot.
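As a small illustration of the setting, here is a sketch of optimal transport in the simplest possible case: two one-dimensional point clouds of equal size with uniform weights and squared-distance cost. In that case the optimal map is the monotone (sorted) matching. This discrete 1-D setup and the helper names `ot_map_1d` and `transport_cost` are assumptions made for illustration; the talk concerns absolutely continuous source measures on bounded subsets of R^d.

```python
def ot_map_1d(source, target):
    """Send the i-th smallest source point to the i-th smallest target point.

    For equally weighted points on the line and squared-distance cost,
    this monotone matching is an optimal transport map.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same size")
    order = sorted(range(len(source)), key=lambda i: source[i])
    target_sorted = sorted(target)
    image = [0.0] * len(source)
    for rank, i in enumerate(order):
        image[i] = target_sorted[rank]
    return image


def transport_cost(source, image):
    """Average squared displacement of the map source[i] -> image[i]."""
    return sum((x - y) ** 2 for x, y in zip(source, image)) / len(source)


source = [0.0, 2.0, 1.0]
target = [5.0, 3.0, 4.0]
image = ot_map_1d(source, target)
print(image, transport_cost(source, image))

# Perturbing the target and recomputing shows how the map responds.
shifted = ot_map_1d(source, [t + 0.1 for t in target])
print(max(abs(a - b) for a, b in zip(image, shifted)))
```

In this toy example a uniform shift of the target moves the map by the same amount. Quantifying that kind of dependence in general dimension is the stability question posed in the abstract.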

##### MaD Seminar with Joel A. Tropp (Caltech): Randomly pivoted Cholesky

**September 12, 2:00pm EST, Auditorium Hall 150, Center for Data Science, NYU, 60 5th Ave**

**Abstract:** André-Louis Cholesky entered École Polytechnique as a student in 1895. Before 1910, during his work as a surveyor for the French army, Cholesky invented a technique for solving positive-definite systems of linear equations. Cholesky’s method can also be used to approximate a positive-semidefinite (psd) matrix using a small number of columns, called “pivots”. A longstanding question is how to choose the pivot columns to achieve the best possible approximation.

This talk describes a simple but powerful randomized procedure for adaptively picking the pivot columns. This algorithm, randomly pivoted Cholesky (RPC), provably achieves near-optimal approximation guarantees. Moreover, in experiments, RPC matches or improves on the performance of alternative algorithms for low-rank psd approximation.

Cholesky died in 1918 from wounds suffered in battle. In 1924, Cholesky’s colleague, Commandant Benoit, published his manuscript. One century later, a modern adaptation of Cholesky’s method still yields state-of-the-art performance for problems in scientific machine learning.

Joint work (*Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations*) with Yifan Chen, Ethan Epperly, and Rob Webber.
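The procedure described in the abstract can be sketched in a few lines. The following is a minimal pure-Python illustration, assuming the common formulation in which each pivot is sampled with probability proportional to the diagonal of the current residual matrix. The function name `rp_cholesky`, the stopping tolerance, and the toy Gram matrix are assumptions of this sketch, not the authors' reference implementation.

```python
import math
import random


def rp_cholesky(A, k, seed=0):
    """Return (F, pivots) with A ≈ F Fᵀ, where F has k columns.

    A is a psd matrix given as a list of rows. Pivots are drawn at random
    with probability proportional to the residual diagonal.
    """
    rng = random.Random(seed)
    n = len(A)
    F = [[0.0] * k for _ in range(n)]
    d = [A[i][i] for i in range(n)]  # diagonal of the residual A - F Fᵀ
    trace0 = sum(d)
    pivots = []
    for j in range(k):
        if sum(d) <= 1e-12 * trace0:  # residual is numerically zero
            break
        s = rng.choices(range(n), weights=d)[0]
        # Column s of the residual, using only entries from column s of A.
        g = [A[i][s] - sum(F[i][t] * F[s][t] for t in range(j))
             for i in range(n)]
        if g[s] <= 0.0:
            break
        scale = math.sqrt(g[s])
        for i in range(n):
            F[i][j] = g[i] / scale
            d[i] = max(d[i] - F[i][j] ** 2, 0.0)
        d[s] = 0.0  # a chosen pivot has zero residual
        pivots.append(s)
    return F, pivots


# Toy example: Gram matrix of five points in the plane (rank 2).
pts = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
A = [[px * qx + py * qy for (qx, qy) in pts] for (px, py) in pts]
F, pivots = rp_cholesky(A, k=2)
err = max(abs(A[i][j] - sum(F[i][t] * F[j][t] for t in range(2)))
          for i in range(5) for j in range(5))
print(pivots, err)
```

Because the toy matrix has rank 2, two pivots already reproduce it up to rounding. For a general kernel matrix the same loop reads only `k` columns of `A` plus its diagonal, which is the "few entry evaluations" property named in the paper title.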

## People

### Core Faculty

### Affiliated Faculty

### PhD / MSc Students

### Postdocs and Fellows

## Sponsors

**We are thankful for the generous financial support of the following institutions:**

- National Science Foundation
- National Institutes of Health
- Alfred P. Sloan Foundation
- Army Research Office (ARO)
- VESRI, Schmidt Futures
- Samsung Research
- Samsung Electronics
- Capital One