NLP Seminar and Text-as-Data Speaker Series

On this page: Series Overview • Spring 2025 Events • Fall 2024 Events • Mailing List • Spring 2024 Seminar Info & Recordings • Archive of Past Events

The NYU NLP and Text-as-Data Speaker Series takes place on Thursdays at 60 5th Avenue, at 11:00am EST; specific room locations for each event can be found below. This interdisciplinary series brings together experts from a wide range of fields, including computer science, linguistics, and social sciences, to explore cutting-edge research in Natural Language Processing (NLP) and text-as-data analysis.

Series Overview

Originally focused on text data applied to social science applications, the series has expanded to incorporate the growing interest in NLP from various disciplines. It provides an opportunity for attendees to engage with groundbreaking work that pushes the boundaries of what’s possible in language processing and analysis. The seminar is organized by Sam Bowman, He He, João Sedoc, Eunsol Choi, and Tal Linzen, and is open to anyone who is affiliated with NYU and wishes to attend.

See below for our schedule of speakers and sign up for our mailing list for details about live-streaming of talks and updates to the schedule.

All talks are currently scheduled to be in-person. The schedule listed below is tentative. Live talk attendance is limited, but recordings will be posted on this page and may be accessed by anyone.

Spring 2025 Events

Upcoming

Aviral Kumar (CMU)

Date & Time: February 27, 11:00am

Title: TBA

Location: Room 150, 60 5th Avenue

Jonathan Berant (Tel-Aviv University)

Date & Time: April 24, 11:00am

Title: TBA

Location: 7th Floor Open Space, 60 5th Avenue

Past

Daphne Ippolito (CMU) Seminar Recording (captions coming soon)

Date & Time: February 20, 11:00am

Title: Troubles with Training Data for Large Language Models

Fall 2024 Past Events

Michael Hahn (Saarland University) Seminar Recording

Date and Time: December 5, 4:00pm

Title: Understanding Language Models via Theory and Interpretability

Jennifer Hu (Johns Hopkins / Harvard)

Date and Time: November 21, 4:00pm

Title: How to Know What Language Models Know

Swabha Swayamdipta (USC) Seminar Recording

Date and Time: November 7, 4:00pm

Title: Ensuring Safety and Accountability in LLMs, Pre- and Post Training (Slides)

Noah Goodman (Stanford) Seminar Recording

Date and Time: October 17, 2024, 4:00pm

Title: Learning To Reason In Language Models

Tim O’Donnell (McGill University)

Date and Time: October 1, 2024, 4:00pm

Title: Linguistic Compositionality and Incremental Processing

Never miss a seminar by joining our mailing list!

Subscribe to NLP/Text-as-Data Seminar Mailing List

Spring 2024 Seminar Info & Recordings

Paradoxes in Transformer Language Models: Positional Encodings

Siva Reddy, McGill University / MILA

How large language models can contribute to cognitive science

Roger Levy, Massachusetts Institute of Technology

Practical AI Systems: From General-Purpose AI to (the Right) Specific Use Cases

Sherry Wu, Carnegie Mellon University

LLMs under Microscope: Illuminating the Bling Spots and Improving the Reliability of Language Models

Yulia Tsvetkov, Paul G. Allen School of Computer Science & Engineering

Archive of Past Events

Fall 2023

Date: September 7

Speaker: Lucie Flek (University of Bonn/University of Marburg)
Title: Human-centered NLP in the LLM Era: Robustness and efficiency through socially and personally contextualised understanding

Slides: Lucie Flek Slides

Date: October 19

Speaker: Jacob Steinhardt (UC Berkeley)
Title: Using Large Language Models to Understand Large Language Models
Recording: Jacob Steinhardt (UC Berkeley) Lecture Recording

Date: November 9

Speaker: Yoon Kim (MIT)
Slides: Yoon Kim TaD Lecture Slides
Title: Large Language Models & Symbolic Structures

Date: November 16

Speaker: Eunsol Choi (UT Austin)
Title: Knowledge Augmentation for Language Models
Recording: Eunsol Choi Lecture Recording
Slides: Eunsol Choi Lecture Slides

Date: November 30

Speaker: Daniel Khashabi (Johns Hopkins University)
Title: Large Language Models: Revisiting Few Mysteries
Recording: Daniel Khashabi’s Lecture Recording
Slides: Daniel Khashabi’s Lecture Slides

Spring 2023

Date: March 30
Speaker: Sasha (Alexander) Rush (Cornell)
Title: “Pretraining without Attention”
Date: April 6
Speaker: Ellie Pavlick (Brown University)
Title: “Mechanisms for Compositionality in Neural Networks”
Recording: Ellie Pavlick Lecture Recording

Date: April 13

Speaker: Mohit Iyyer (UMass)
Title: On using large language models to translate literary works, and detecting when they’ve been used
Recording: Mohit Iyyer Lecture Recording

Date: April 20

Speaker: Asli Celikyilmaz (Meta)
Title: Exploring Machine Thinking
Slides: Asli Celikyilmaz Lectures Slides

Date: April 27

Speaker: Jacob Andreas (MIT)
Title: Language Models as World Models
Recording: Jacob Andreas Lecture Recording
Slides: Jacob Andreas Lecture Slides

Fall 2022

Date: 8 September

Date: 22 September

Date: 13 October

Alan Ritter (Georgia Tech)
Recording: Alan Ritter Lecture Recording
Slides: Alan Ritter Lecture Slides

Date: 20 October

Speaker: Chris Manning (Stanford)
Recording: Chris Manning Lecture Recording
Slides: Chris Manning Lecture Slides

Date: 17 November

Speaker: Emma Strubell (CMU)
Recording: Emma Strubell Lecture Recording
Slides: Emma Strubell Lecture Slides

Spring 2022

Date: 10 March
Speaker: Chenhao Tan (UChicago)
Title: Towards Human-Centered Explanations of AI Predictions
Recording: Chenhao Tan Recording
Slides: Chenhao Tan Slides

Date: 24 March

Speaker: Sameer Singh (UC Irvine)
Title: Lipstick on a Pig: Using Language Models as Few-Shot Learners
Recording: Sameeer Singh Recording
Slides: Sameer Singh Slides

Date: 31 March

Speaker: Sebastian Schuster (NYU)
Title: How contextual are contextual language models?
Recording: Sebastian Schuster Recording
Slides: Sebastian Schuster Slides

Date: 14 April

Speaker: Greg Durrett (UT Austin)
Title: Why natural language is the right vehicle for complex reasoning?
Recording: Greg Durrett Recording
Slides: Greg Durrett Slides

Date: 21 April

Speaker: Douwe Kiela (HuggingFace)
Title: Progress in Dynamic Adversarial Data Collection & Adventures in Multimodal Machine Learning
Recording: Recording
Slides: Douwe Kiela Slides

Fall 2021

23 Sept: Ankur Parikh (Google) – Towards High Precision Text Generation
21 Oct: Alex Warstadt (NYU) – Testing the Learnability of Grammar for Humans and Machines: Investigations with Artificial Neural Networks. [Alex Warstadt Slides]
4 Nov: Marianna Apidianaki (UPenn) – Lexical Polysemy and Intensity in Contextualized Representations [Marianna Apidianaki Slides]
18 Nov: Danqi Chen (Princeton) – Contrastive Representation Learning in Text – [Danqi Chen Slides]

Spring 2021

4 Feb: Byron Wallace (Northeastern) — What does the evidence say? Language technologies to help make sense of biomedical texts [Byron Wallace Lecture Video Recording]
4 March: Nanyun Peng (UCLA) — Controllable Text Generation Beyond Auto-regressive Models [Nanyun Peng Lecture Video Recording]
18 March: Karl Stratos (Rutgers) — Maximal Mutual Information Predictive Coding for Natural Language Processing [Karl Stratos Video Lecture Recording]
1 April: Su Lin Blodgett (Microsoft) — Language and Justice: Reconsidering Harms in NLP Systems and Practices [NYU community only: Sun Lin Blodgett Lecture Video Recording]
15 April: Allyson Ettinger (University of Chicago) — “Understanding” and prediction: Controlled examinations of meaning sensitivity in pre-trained models [Allyson Ettinger Lecture Video Recording]
29 April: Wei Xu (Georgia Tech) — Importance of Data and Linguistics in Neural Language Generation [Wei Xu Lecture Video Recording]

Fall 2020

17 Sept: Anna Rogers (Copenhagen) — When BERT plays the lottery, all tickets are winning
24 Sept: Matt Gardner (AI2) — Contrastive pairs are better than independent samples, for both learning and evaluation [Matt Gardner Lecture Video Recording]
8 Oct: Yonatan Belinkov (Technion) — Causal Mediation Analysis for Interpreting NLP Models: The Case of Gender Bias [Yonatan Belinkov Lecture Video Recording]
22 Oct: Tatsunori Hashimoto (Stanford) — Robustness based approaches for improving natural language generation and understanding [Tatsunori Hashimoto Video Lecture Recording]
29 Oct: Dan Weld (University of Washington) — Semantic Scholar – Advanced NLP to Accelerate Scientific Research [Dan Weld Lecture Video Recording]
5 Nov: Dani Yogatama (DeepMind) — Semiparametric Language Models [Dani Yogatama Lecture Video Recording]
3 Dec: Angeliki Lazaridou (DeepMind) — Towards multi-agent emergent communication as a building block of human-centric AI [Angeliki Lazaridou Video Lecture Recording]

Spring 2020

6 Feb: Ellie Pavlick (Brown) — What do (and should) language models know about language? [Ellie Pavlick Lecture Video Recording]

13 Feb: David Lazer (Northeastern) — Fake news on Twitter during the 2016 U.S. presidential election [David Lazer Lecture Recording]

Fall 2019

5 Sept Eunsol Choi (Google / UT Austin) — Learning to Understand Entities In Text
12 Sept Tom Kwiatkowski ( Google) — New Challenges in Question Answering: Natural Questions and Going Beyond Word Matching
19 Sept Zack Lipton (CMU) — Deep (Inter-)Active Learning for NLP: Cure-all or Catastrophe?
26 Sept Diyi Yang (Georgia Tech) — Building Language Technologies for Better Online Communities
3 Oct Robin Jia (Stanford) — Building Adversarially Robust Natural Language Processing Systems
10 Oct Ceren Budak (U Mich) — News Producers, Politically Engaged Citizens, and Social Movement Organizations Online
17 Oct Edward Grefenstette (Facebook AI) — Teaching Artificial Agents to Understand Language by Modelling Reward
24 Oct Alexis Conneau (Facebook AI) — Learning cross-lingual text representations
31 Oct Sebastian Ruder (DeepMind) — Unsupervised cross-lingual representation learning
7 Nov **NO MEETING**
14 Nov Mohit Iyyer (U Mass) — Rethinking Transformers for machine translation and story generation
21 Nov Jennifer Pan (Stanford) — Uncovering Hidden Political Activity with Data Science Tools and Social Science Approaches
5 Dec Jonathan Berant (Tel-Aviv U) — Understanding Complex Questions

Spring 2019

9 May Rashida Richardson (AI Now Institute) — Dirty Data, Bad Predictions: How Civil Rights Violation Impact Police Data, Predictive Policing Systems, and Justice
7 Feb Percy Liang (Stanford) — Can Language Robustify Learning?
14 Feb – 21 Mar ** No meeting**
28 Mar Adji Bousso Dieng (Columbia) — Deep Bayesian Learning as a Paradigm for Text Modeling
4 Apr Jacob Devlin (Google) — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
11 Apr Yulia Tsvetkov (CMU) — Towards a Computational Analysis of the Language of Veiled Manipulation
18 Apr Zachary Steinert-Threlkeld (UCLA) — The Effect of Violence, Cleavages, and Free-riding on Protest Size
25 Apr Rico Sennrich (Edinburgh) —Document-level Machine Translation: Recent Progress and The Crux of Evaluation
2 May Hannaneh Hajishirzi (UW)

Fall 2018

13 Sep Erin Hengel (U Liverpool) — “Publishing while female: Are women held to higher standards? Evidence from peer review”
20 Sep *No meeting: New Directions in Analyzing Text as Data conference at UW*
21 Sep (Friday) Dan Roth (UPenn) (special NYC NLP talk at Cornell Tech campus)
27 Sep *No meeting*
4 Oct Emily Gade (UW/Moore-Sloan) — What Counts as Terrorism? An Examination of Terrorist Designations among U.S. Mass Shootings
11 Oct Marine Carpuat (U Maryland) — Semantic and Style Divergences in Machine Translation
18 Oct Bryce Dietrich (U Iowa/Harvard) — Do Representatives Emphasize Some Groups More Than Others?
25 Oct Taylor Berg-Kirkpatrick (UCSD) — Unsupervised Models for Unlocking Language Data
1 Nov Laila Wahedi (Georgetown) — Constructing Networks From Social Media Text: How to Do It and When You Should, From Trolls to Journalists
8 Nov Jacob Andreas (Microsoft/MIT) — Learning by Narrating
15 Nov Sasha Rush (Harvard) — Controllable Text Generation with Deep Latent-Variable Models
22 Nov *No meeting: Thanksgiving*
29 Nov Kyle Gorman (CUNY) — Grammar engineering in text-to-speech synthesis
6 Dec Walter R. Mebane, Jr. (U Michigan) — What You Say You See is Who You Are: Observing Election Incidents in the United States via Twitter
13 Dec Philip Resnik (U Maryland) — Mental Health as an Application Area for Computational Linguistics: Prospects and Challenges

Spring 2018

25-Jan Omer Levy (UW) — Towards Understanding Deep Learning for Natural Language Processing
1-Feb Kevin Knight (USC) — What are Neural Sequence Models Doing?
8-Feb Bruno Gonçalves (NYU CDS / Aix-Marseille Université) — Spatio temporal analysis of Language use
15-Feb * No meeting *
22-Feb * No meeting *
1-Mar Maja Rudolph (Columbia) — Structured Embedding Models for Language Variation
8-Mar Justine Zhang (Cornell) — Unsupervised Models of Conversational Dynamics
15-Mar *No meeting: spring break*
22-Mar Elliott Ash (U of Warwick / ETH Zurich) — Proportional Representation Increases Party Politics: Evidence from New Zealand Parliament using a Supervised Topic Model
29-Mar Luke Zettlemoyer (UW) — End-to-end Learning for Broad Coverage Semantics
5-Apr Marie-Catherine de Marneffe (Ohio State) — Computational pragmatics: a case study of “speaker commitment”
12-Apr Graham Neubig (CMU) — What Can Neural Networks Teach us about Language?
19-Apr Ray Mooney (UT Austin) — Ensembles and Explanation for Visual Question Answering
26-Apr Sarah Bouchat (Northwestern) — Making a Long Story Short: Eliciting Prior Information from Previously Published Research
3-May Ben Lauderdale (LSE) — Unsupervised Methods for Extracting Political Positions from Text

Fall 2017

21-Sep Dean Knox (Microsoft Research/Princeton) and Christopher Lucas (Harvard) — Measuring Speaker Affect in Audio Data: Dynamics of Supreme Court Oral Arguments
28-Sep Rich Nielsen (MIT) — Text Analysis of Internet Islam
5-Oct Jordan Boyd-Graber (UMD) — Cooperative and Competitive Machine Learning through Question Answering
12-Oct **No meeting: Text as Data 2017 Conference at Princeton**
19-Oct Claire Cardie (Cornell) — Structured Prediction for Opinions and Arguments
26-Oct David Weiss (Google) — Parsimonious Representation Learning for NLP
2-Nov Emily Bender (UW) — Articulating How Our Data and Systems Do and Don’t Represent the World
9-Nov Hal Daumé III (UMD) — Learning Language Through Interaction
16-Nov Gerard De Melo (Rutgers) — Learning Semantics and Commonsense Knowledge from Heterogeneous Data
23-Nov **No meeting: Thanksgiving**
30-Nov Jenn Wortman Vaughan (Microsoft Research) — The Human Components of Machine Learning
7-Dec Damon Centola (UPenn) — The Emergence of Linguistic Norms: An Experimental Study of Cultural Evolution
14-Dec Fernando Diaz (Spotify) — Local Natural Language Processing in Information Retrieval

Summer 2017

26-Jul Yejin Choi (UW) — From Naive Physics to Connotation: Learning about the World from Language

Spring 2017

27-Apr Kosuke Imai (Princeton) Brendan T. O’Connor (UMass Amherst)
2-Feb Chris Callison-Burch (Penn) — The promise of crowdsourcing for natural language processing, public health, and other data sciences
9-Feb Vinod Prabhakaran (Stanford) — NLP and Society: Understanding Social Context from Language Use
16-Feb Matthew Denny (Penn State) and Arthur Spirling (NYU) —Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It
23-Feb **No Meeting**
2-Mar **No Meeting**
9-Mar Jason Eisner (JHU)
16-Mar **No Meeting: Spring Break**
23-Mar Amber Boydstun (UC Davis)
30-Mar Hong Yu (UMass Medical)
6-Apr **No Meeting**
13-Apr **Meeting Cancelled**
20-Apr Yoav Artzi (Cornell Tech)

Fall 2016

22-Sep David Bamman (Berkeley) — Beyond Bags of Words: Linguistic Structure in the Analysis of Text as Data
29-Sep Regina Barzilay (MIT) — How Can NLP Help Cure Cancer?
6-Oct Justin Grimmer (Stanford) — Exploratory and Confirmatory Causal Inference for High Dimensional Interventions
13-Oct **No Meeting: Text-as-Data Conference**
20-Oct Erin Baggott Carter (USC) — Propaganda and Protest: Evidence from Post-Cold War Africa (coauthored with Brett Carter)
27-Oct Matt Taddy (Chicago) — Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech
3-Nov Ken Benoit (LSE) — Measuring and Explaining Political Sophistication Through Textual Complexity
10-Nov Edouard Grave (Facebook) — Large scale learning for natural language processing
17-Nov Lillian Lee (Cornell) — Can language change minds?
24-Nov **No Meeting: Thanksgiving**
1-Dec Gary King (Harvard) — An Improved Method of Automated Nonparametric Content Analysis for Social Science
8-Dec Sam Bowman (NYU) — Learning neural networks for sentence understanding with the Stanford NLI corpus

Spring 2016

5-May Mark Dredze (JHU/Bloomberg) — Topic Models for Identifying Public Health Trends
4-Feb Marc Ratkovic (Princeton) — Estimating Common and Idiosyncratic Factors from Multiple Datasets
11-Feb David Mimno (Cornell) — Topic models without the randomness: new perspectives on deterministic algorithms
18-Feb Pablo Barberá (NYU) — Text vs Networks: Inferring Sociodemographic Traits of Social Media Users
25-Feb Jacob Eisenstein (GA Tech) — Sociolinguistic Structure Induction
3-Mar Slav Petrov (Google NY) — Towards Universal Syntactic Processing of Natural Language
10-Mar Laura Nelson (Northwestern, Kellogg) — Measuring Collective Cognitive Structures via Collectively Produced Text
17-Mar *Spring Break*
24-Mar Cristian Danescu-Niculescu-Mizi (Cornell) — Language and Social Dynamics
31-Mar Jason Weston (Facebook) — Evaluating Prerequisite Qualities for End-to-End Dialog Systems
7-Apr *MPSA Conference*
14-Apr Sven-Oliver Proksch (McGill) — Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Parliamentary Speeches
21-Apr Noémie Elhadad (Columbia) — Summarizing the Patient Record
28-Apr Molly Roberts (UCSD) — Matching Methods for High-Dimensional Data with Applications to Text

Fall 2015

3-Dec ******TALK CANCELLED******

10-Dec Bruno Goncalves (NYU)

10-Sep Brandon Stewart (Princeton) — Text Analysis with Document Context: the Structural Topic Model

17-Sep Yacine Jernite (NYU) — Semi-supervised methods of text processing, and an application to medical concept extraction.

24-Sep Andrew Peterson (NYU) — Legislative Text and Regulatory Authority

1-Oct John Henderson (Yale) — Crowdsourcing Experiments to Estimate an Ideological Dimension in Text

8-Oct Ken Benoit (LSE) — Mining Multiword Expressions to Improve Bag of Words Models in Political Science Text Analysis

15-Oct Noah Smith (U of Washington) — Learning Political Embeddings from Text

19-Oct — Intro to Text Analysis Using R, a one-day workshop led by Ken Benoit (LSE)

22-Oct David Blei (Columbia)— Probabilistic Topic Models and User Behavior

29-Oct Zubin Jelveh (NYU) Suresh Naidu (Columbia) –Political Language in Economics

5-Nov Hanna Wallach (Microsoft Research) — The Bayesian Echo Chamber: Modeling Influence in Conversations

12-Nov Jacob Montgomery (WashU) — Funneling the Wisdom of Crowds: The SentimentIt Platform for Human Computation Text Analysis

19-Nov Michael Colaresi (Michigan State) — Learning Human Rights, Lefts, Ups and Downs: Using Lexical and Syntactic Features to Understand Evolving Human Rights Standards