
The CDS Seminar Series showcases the latest developments from all areas of data science, featuring local speakers, special interdisciplinary events, and the flagship CDS Colloquium with worldwide leaders in the field. It takes place every Friday from 2:00pm–4:00pm, with talks from 2:00pm–3:30pm followed by a reception from 3:30pm–4:00pm.
Series Overview
Launched to highlight cutting-edge research, the CDS Seminar Series brings together experts from various disciplines within data science. The series aims to be accessible to the entire CDS community, fostering collaboration and sparking new ideas across different areas of the center. By featuring a diverse range of speakers and topics, the series provides a platform for knowledge exchange and interdisciplinary dialogue.
Key Focus Areas
The seminar covers a wide range of topics in data science, including but not limited to:
- Machine Learning
- Artificial Intelligence
- Natural Language Processing
- Computer Vision
- Statistical Modeling
- Data Visualization
- Big Data Analytics
- Neural Networks
- Reinforcement Learning
- Causal Inference
- Computational Biology
- Data Ethics and Fairness
- Deep Learning
- Bayesian Methods
By spanning these areas, the seminar series serves as a bridge between different parts of CDS, encouraging cross-pollination of ideas and new collaborations within the center.
Spring 2025 Events
- Date: February 21, 2025
- Speaker(s): Emily Black (CDS)
- Title: TBA
- Overview: TBA
- Date: February 14, 2025
- Speaker(s): Xi Chen (CDS)
- Title: TBA
- Overview: TBA
- Date: January 24, 2025
- Speaker(s): Eunsol Choi (CDS)
- Title: Equipping LLMs for Complex Knowledge Scenarios: Interaction and Retrieval
- Overview: Language models are increasingly used as an interface to gather information. Yet trusting the answers generated by LMs is risky, as they often contain incorrect or misleading information. Why is this happening? We identify two key issues: (1) ambiguous and underspecified user questions and (2) imperfect knowledge in LMs, especially for long-tail or recent events. To address the first issue, we propose a system that can interact with users to clarify their intent before answering. By simulating the expected outcomes of clarifying questions in future turns, we reward LMs for asking them rather than answering immediately. In the second part of the talk, I will discuss the state of retrieval augmentation, which is often lauded as the path to providing up-to-date, relevant knowledge to LMs. While retrieval augmentation succeeds in scenarios where a single gold document exists, incorporating information from a diverse set of documents remains challenging for both retrieval systems and LMs. Together, the talk highlights key research directions for building reliable LMs to answer information-seeking questions.
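The two failure modes the abstract describes, underspecified questions and imperfect parametric knowledge, are easy to see in a toy pipeline. The sketch below is purely illustrative and is not Choi's system: a hand-written ambiguity heuristic and a bag-of-words retriever stand in for the learned components the talk is about.

```python
# Purely illustrative sketch (not the speaker's system): a toy QA pipeline
# that (1) asks a clarifying question when the query is underspecified and
# (2) grounds its answer in retrieved documents.
from collections import Counter

DOCS = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "The 2020 Summer Olympics were held in Tokyo, Japan, in 2021.",
]

def is_ambiguous(question: str) -> bool:
    # Toy heuristic: "the Olympics" with no year is underspecified.
    return "olympics" in question.lower() and not any(ch.isdigit() for ch in question)

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Bag-of-words overlap as a stand-in for a learned dense retriever.
    q_words = Counter(question.lower().split())
    return sorted(docs, key=lambda d: -sum(q_words[w] for w in d.lower().split()))[:k]

def respond(question: str) -> str:
    if is_ambiguous(question):
        # Interact before answering rather than guessing the intent.
        return "Clarifying question: which year's Olympics do you mean?"
    context = " ".join(retrieve(question, DOCS))
    return f"Answer grounded in retrieved text: {context}"

print(respond("Where were the Olympics held?"))
print(respond("Where were the 2024 Olympics held?"))
```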
Fall 2024 Events
- Date: November 8, 2024
- Speaker(s): Tim O’Donnell (McGill University)
- Title: Syntactic And Semantic Control Of Large Language Models Via Sequential Monte Carlo
- Overview: A wide range of LLM applications require generating text that conforms to syntactic or semantic constraints. Imposing such constraints nontrivially alters the distribution over sequences, usually making exact sampling intractable. In this work, building on the Language Model Probabilistic Programming framework of Lew et al. (2023), we develop an approach to approximate inference for controlled LLM generation based on sequential Monte Carlo (SMC). Our SMC framework allows us to flexibly incorporate domain- and problem-specific constraints at inference time, and efficiently reallocate computation in light of new information during the course of generation. We demonstrate that our approach improves downstream performance on four challenging domains—Python code generation for data science, text-to-SQL, goal inference, and molecule synthesis. We compare to a number of alternative and ablated approaches, showing that our accuracy improvements are driven by better approximation to the full Bayesian posterior.
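As rough intuition for the resampling idea, here is a minimal, self-contained sketch of SMC-style constrained decoding. A uniform distribution over two tokens stands in for the language model, and the constraint (the output must be a balanced parenthesis string) is hard rather than soft; the framework described in the abstract is considerably more general.

```python
# Minimal illustrative sketch of SMC-style constrained generation.
# A uniform two-token "language model" and a hard syntactic constraint
# stand in for the real LLM and soft weights used in the work above.
import random

VOCAB = ["(", ")"]

def extendable(prefix: str, length: int) -> bool:
    # Can `prefix` still be completed to a balanced string of `length` tokens?
    depth = 0
    for ch in prefix:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    remaining = length - len(prefix)
    return depth <= remaining and (depth + remaining) % 2 == 0

def smc(n_particles: int = 200, length: int = 6) -> list[str]:
    particles = [""] * n_particles
    for _ in range(length):
        # Propose one-token extensions from the base model.
        proposals = [p + random.choice(VOCAB) for p in particles]
        # Weight each particle by constraint satisfaction (0/1 here).
        weights = [1.0 if extendable(p, length) else 0.0 for p in proposals]
        # Resample: reallocate computation toward promising prefixes.
        particles = random.choices(proposals, weights=weights, k=n_particles)
    return particles

random.seed(0)
print(sorted(set(smc())))  # only balanced strings, e.g. '(()())', '()()()'
```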
- Date: November 1, 2024
- Speaker(s): Grace Lindsay (NYU Center for Data Science) and David Bau (Northeastern University)
- Title: Finding Facts and Functions, and a Fabric (joint seminar with Minds, Brains, and Machines)
- Overview: In this talk we discuss recent work on interpreting and understanding the explicit structure of learned computations within large deep network models. We examine the localization of factual knowledge within transformer LMs and discuss how these insights can be used to edit the behavior of LLMs and multimodal diffusion models. Then we discuss recent findings on the structure of the computations underlying in-context learning, and how these lead to insights about the representation and composition of functions within LLMs. Finally, time permitting, we discuss the technical challenges of doing interpretability research in a world where the most powerful models are only available via API, and we describe a National Deep Inference Fabric that will offer an API standard enabling transparent scientific research on large-scale AI.
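For readers unfamiliar with the methodology, fact-localization work of this kind typically relies on activation patching: cache a hidden state from a "clean" run, splice it into a "corrupted" run, and measure how much of the clean output is restored. Below is a deliberately tiny numpy caricature, with a random one-layer network in place of a transformer and entirely fabricated quantities.

```python
# Toy caricature of activation patching for localization. A random
# one-layer network replaces the transformer; the "fact logit" is fabricated.
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(8, 8))
w_out = rng.normal(size=8)

def forward(x, patch_idx=None, patch_val=None):
    h = np.tanh(W_in @ x)            # hidden state
    if patch_idx is not None:        # splice in a saved hidden unit
        h = h.copy()
        h[patch_idx] = patch_val
    return float(w_out @ h)          # scalar "fact logit"

clean = rng.normal(size=8)                  # input where the "fact" is intact
corrupt = clean + 2.0 * rng.normal(size=8)  # corrupted input
h_clean = np.tanh(W_in @ clean)             # cached clean hidden state

target, base = forward(clean), forward(corrupt)
for i in range(8):
    patched = forward(corrupt, patch_idx=i, patch_val=h_clean[i])
    # Recovery near 1.0 means patching unit i restores the clean output,
    # i.e. that unit causally carries the information the corruption destroyed.
    print(f"unit {i}: recovery {(patched - base) / (target - base):+.2f}")
```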
- Date: October 25, 2024
- Speaker(s): Eric Oermann, Krzysztof Geras, Narges Razavian
- Title: AI + Medicine
- Overview: Our NYU Langone affiliates Dr. Eric Oermann, Dr. Krzysztof Geras, and Dr. Narges Razavian will each give a brief talk, followed by a lively panel discussion and Q&A moderated by Sumit Chopra.
- Date: October 4, 2024
- Speaker(s): Cédric Gerbelot-Barrillon and Jonathan Colner
- Title: High-dimensional optimization for multi-spiked tensor PCA (Cédric Gerbelot-Barrillon) and Leveraging foundation models to extract local administrative data (Jonathan Colner)
- Date: September 27, 2024
- Speaker(s): Byung-Doh Oh, Aahlad Puli, and Yuzhou Gu
- Titles: What can linguistic data tell us about the predictions of (large) language models? (Byung-Doh Oh), Explanations that reveal all through the definition of encoding (Aahlad Puli), Community detection in the hypergraph stochastic block model (Yuzhou Gu)
- Date: September 20, 2024
- Speaker(s): Nick Seaver, Kyunghyun Cho, Grace Lindsay (moderator: Leif Weatherby)
- Title: Oral History of Machine Learning – “Why We Call It ‘Attention’”
- Overview: Attention has rapidly become an essential technique in AI. This conversation between Nick Seaver (Tufts, Anthropology), Kyunghyun Cho (NYU Center for Data Science), and Grace Lindsay (NYU Center for Data Science) looks back to a moment before attention became “all you need,” when attention mechanisms were first imagined and introduced.
Organizers
The CDS Seminar Series is organized by Jonathan Niles-Weed (Associate Professor of Mathematics and Data Science), Carlos Fernandez-Granda (Associate Professor of Mathematics and Data Science), Tal Linzen (Associate Professor of Linguistics and Data Science), and Yanjun Han (Assistant Professor of Mathematics and Data Science).
Attendance Information
The CDS Seminar Series takes place every Friday from 2:00pm–4:00pm. Talks are scheduled from 2:00pm–3:30pm, followed by a reception for attendees from 3:30pm–4:00pm. All members of the CDS community are encouraged to attend these sessions featuring cutting-edge research and insights from leading experts in the field of data science.