On this page: Series Overview • Fall 2024 Events • Mailing List • Spring 2024 Seminar Info & Recordings • Archive of Past Events
The NYU NLP and Text-as-Data Speaker Series takes place on Thursdays, 4:00pm to 5:30pm at 60 5th Avenue (7th floor common area). This interdisciplinary series brings together experts from a wide range of fields, including computer science, linguistics, and social sciences, to explore cutting-edge research in Natural Language Processing (NLP) and text-as-data analysis.
Series Overview
Originally focused on applications of text data in the social sciences, the series has expanded to reflect the growing interest in NLP across disciplines. It provides an opportunity for attendees to engage with groundbreaking work that pushes the boundaries of what’s possible in language processing and analysis. The seminar is organized by Sam Bowman, He He, João Sedoc, Eunsol Choi, and Tal Linzen, and is open to anyone affiliated with NYU who wishes to attend.
See below for our schedule of speakers and sign up for our mailing list for details about live-streaming of talks and updates to the schedule.
All talks are currently scheduled to be in-person. The schedule listed below is tentative. Live talk attendance is limited, but recordings will be posted on this page and may be accessed by anyone.
Fall 2024 Events
Upcoming
There are currently no upcoming seminars.
Past
Michael Hahn (Saarland University) Lecture Recording (captions coming soon)
Date and Time: December 5, 4:00pm
Title: Understanding Language Models via Theory and Interpretability
Speaker: Michael Hahn
Abstract: Recent progress in LLMs has rapidly outpaced our ability to understand their inner workings. This talk describes our recent work addressing this challenge. First, we develop rigorous mathematical theory describing the abilities (and limitations) of transformers in performing computations foundational to reasoning. We also examine differences and similarities with state-space models such as Mamba. Second, we propose a theoretical framework for understanding success and failure of length generalization in transformers. Third, we propose a method for reading out information from activations inside neural networks, and apply it to mechanistically interpret transformers performing various tasks. I will close with directions for future research.
Bio: Michael holds the Chair for Language, Computation, and Cognition at Saarland University. He received his PhD from Stanford University in 2022, advised by Judith Degen and Dan Jurafsky. He is interested in language, machine learning, and computational cognitive science.
Lecture Slides: Michael Hahn Lecture Slides
Jennifer Hu (Johns Hopkins / Harvard)
Date and Time: November 21, 4:00pm
Title: How to Know What Language Models Know
Speaker: Jennifer Hu
Abstract: As language models (LMs) become more sophisticated, there is growing interest in their cognitive abilities such as reasoning or grammatical generalization. However, we only have access to evaluations, which can only indirectly measure these latent constructs. In this talk, I take inspiration from the concept of “task demands” in cognitive science to design and understand LM evaluations. I will first describe how task demands can affect LMs’ behaviors. I will then present case studies showing how evaluations with different task demands can lead to vastly different conclusions about LMs’ abilities. Specifically, prompt-based evaluations (e.g., “Is the following sentence grammatical? [sentence]“) yield systematically lower performance than string probability comparisons, and smaller LMs are more sensitive to task demands than LMs with more parameters or training, mirroring findings in developmental psychology. These results underscore the importance of specifying the assumptions behind our evaluation design choices before we draw conclusions about LMs’ capabilities.
Bio: Jennifer Hu is an incoming Assistant Professor of Cognitive Science at Johns Hopkins University, where she will direct the Group for Language and Intelligence. Currently, she is a Research Fellow at the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. Her research aims to understand the computational and cognitive principles that underlie human language.
Swabha Swayamdipta (USC) Seminar Recording
Date and Time: November 7, 4:00pm
Title: Ensuring Safety and Accountability in LLMs, Pre- and Post-Training (Slides)
Speaker: Swabha Swayamdipta
Abstract: As large language models have become ubiquitous, it has proven increasingly challenging to enforce their accountability and safe deployment. In this talk, I will discuss the importance of ensuring the safety, responsibility, and accountability of Large Language Models (LLMs) throughout all stages of their development: pre-training, post-training evaluation, and deployment. First, I will present the idea of a unique LLM signature that can identify the model to ensure accountability. Next, I will present our recent work on reliably evaluating LLMs through our novel formulation of generation separability, and how this could lead to more reliable generation. Finally, I will present some ongoing work that demonstrates LLMs’ ability to understand but not generate unsafe or untrustworthy content.
Bio: Swabha Swayamdipta is an Assistant Professor of Computer Science and a Gabilan Assistant Professor at the University of Southern California. Her research interests lie in natural language processing and machine learning, with a primary focus on the evaluation of generative models of language, understanding the behavior of language models, and designing language technologies for societal good. At USC, Swabha leads the Data, Interpretability, Language, and Learning (DILL) Lab. She received her PhD from Carnegie Mellon University, followed by a postdoctoral position at the Allen Institute for AI. Her work has received awards at ICML, NeurIPS, and ACL. Her research is supported by awards from the National Science Foundation, the Allen Institute for AI, and a Rising Star Award from Intel Labs.
Noah Goodman (Stanford) Seminar Recording
Date and Time: October 17, 2024, 4:00pm
Title: Learning To Reason In Language Models
Speaker: Noah Goodman
Bio: Noah D. Goodman is Associate Professor of Psychology, Linguistics (by courtesy), and Computer Science (by courtesy) at Stanford University. He studies the computational basis of human thought, merging behavioral experiments with formal methods from statistics and logic. Specific projects vary from concept learning and language understanding to inference algorithms for probabilistic programming languages. He received his Ph.D. in mathematics from the University of Texas at Austin in 2003. In 2005 he entered cognitive science, working as Postdoc and Research Scientist at MIT. In 2010 he moved to Stanford where he runs the Computation and Cognition Lab.
Tim O’Donnell (McGill University)
Date and Time: October 1, 2024, 4:00pm
Title: Linguistic Compositionality and Incremental Processing
Speaker: Tim O’Donnell
Abstract: In this talk, I will present recent projects focusing on two key properties of natural language. First, I will discuss the problem of incremental processing, presenting modeling and empirical work on the nature of the algorithms that underlie the human sentence processor and discuss information theoretic tools that quantify processing difficulty. Second, I will discuss recent work on developing related information theoretic tools for defining and measuring the degree of compositionality in a system.
Bio: Tim O’Donnell is an associate professor and William Dawson Scholar in the Department of Linguistics at McGill University and a CIFAR Canada AI Chair at Mila, the Quebec AI institute. His research focuses on developing mathematical and computational models of how people learn to represent, process, and generalize language and music. His work draws on techniques from computational linguistics, machine learning, and artificial intelligence, integrating concepts from theoretical linguistics and methods from experimental psychology and looking at problems from all these domains.
Our Mailing List
Join our mailing list so you never miss a seminar!
Spring 2024 Seminar Info & Recordings
Paradoxes in Transformer Language Models: Positional Encodings
Siva Reddy, McGill University / Mila
Speaker: Siva Reddy
Date: February 2, 2024
Abstract: The defining features of Transformer Language Models, such as causal masking, positional encodings, and their monolithic architecture (i.e., the absence of a specific routing mechanism), are paradoxically the same features that hinder their generalization capabilities, and removing them makes them better at generalization. I will present evidence of these paradoxes on various generalizations, including length generalization, instruction following, and multi-task learning.
Bio: Siva Reddy is an Assistant Professor in the School of Computer Science and Linguistics at McGill University. He is also a Facebook CIFAR AI Chair, a core faculty member of Mila Quebec AI Institute and a research scientist at ServiceNow Research. His research focuses on representation learning for language that facilitates reasoning, conversational modeling and safety. He received the 2020 VentureBeat AI Innovation Award in NLP, and the best paper award at EMNLP 2021. Before McGill, he was a postdoctoral researcher at Stanford University and a Google PhD fellow at the University of Edinburgh.
How large language models can contribute to cognitive science
Roger Levy, Massachusetts Institute of Technology
Speaker: Roger Levy
Date: February 22, 2024
Abstract: Large language models (LLMs) are the first human-created artifacts whose text processing and generation capabilities seem to approach our own. But the hardware they run on is vastly different than ours, and the software implementing them probably is too. How, then, can we use LLMs to advance the science of language in the human mind? In this talk I present a set of case studies that exemplify three answers to this question: LLMs can help us place lower bounds on the learnability of linguistic generalizations; they can help us reverse-engineer human language processing mechanisms; and they can help us develop hypotheses for the interface between language and other cognitive mechanisms. The case studies include controlled tests of grammatical generalizations in LLMs; computational models of how adults understand what young children say; psychometric benchmarking of multimodal LLMs; and neurosymbolic models of reasoning in logical problems posed in natural language.
This talk covers joint work with Elika Bergelson, Ruth Foushee, Alex Gu, Jennifer Hu, Anna Ivanova, Benjamin Lipkin, Gary Lupyan, Kyle Mahowald, Stephan Meylan, Theo Olausson, Subha Nawer Pushpita, Armando Solar-Lezama, Joshua Tenenbaum, Ethan Wilcox, Nicole Wong, and Cedegao Zhang.
Bio: Roger Levy joined the Department of Brain and Cognitive Sciences in 2016. Levy received his BS in mathematics from the University of Arizona in 1996, followed by a year as a Fulbright Fellow at the Inter-University Program for Chinese Language Study, Taipei, Taiwan and a year as a research student in biological anthropology at the University of Tokyo. In 2005, he completed his doctoral work at Stanford University under the direction of Christopher Manning, and then spent a year as a UK Economic and Social Research Council Postdoctoral Fellow at the University of Edinburgh. Before his appointment at MIT he was faculty in the Department of Linguistics at the University of California, San Diego. Levy’s awards include the Alfred P. Sloan Research Fellowship, the NSF Faculty Early Career Development (CAREER) Award, and a Fellowship at the Center for Advanced Study in the Behavioral Sciences.
Practical AI Systems: From General-Purpose AI to (the Right) Specific Use Cases
Sherry Wu, Carnegie Mellon University
Speaker: Sherry Wu
Date: March 7, 2024
Abstract: AI research has made great strides in developing general-purpose models (e.g., LLMs) that can excel across a wide range of tasks, enabling users to explore AI applications tailored to their unique needs without the complexities of custom model training. However, with the opportunities come the challenges: general-purpose models prioritize overall performance, but this can neglect specific user needs. How can we make these models practically usable? In this talk, I will present our recent work on assessing and tailoring general-purpose models for specific use cases. I will first cover methods for evaluating and mapping LLMs to specific usage scenarios, then reflect on the importance of identifying the right tasks for LLMs by comparing how humans and LLMs may perform the same tasks differently. In my final remarks, I will discuss the potential of training humans and LLMs with complementary skill sets.
Bio: Sherry Tongshuang Wu is an Assistant Professor in the Human-Computer Interaction Institute at Carnegie Mellon University. Her research lies at the intersection of Human-Computer Interaction and Natural Language Processing, and primarily focuses on how humans (AI experts, lay users, domain experts) can practically interact with (debug, audit, and collaborate with) AI systems. To this end, she has worked on assessing NLP model capabilities, supporting human-in-the-loop NLP model debugging and correction, as well as facilitating human-AI collaboration. She has authored award-winning papers in top-tier NLP, HCI and Visualization conferences and journals such as ACL, CHI, TOCHI, TVCG, etc. Before joining CMU, Sherry received her Ph.D. degree from the University of Washington and her bachelor degree from the Hong Kong University of Science and Technology, and has interned at Microsoft Research, Google Research, and Apple.
LLMs under the Microscope: Illuminating the Blind Spots and Improving the Reliability of Language Models
Yulia Tsvetkov, Paul G. Allen School of Computer Science & Engineering
Speaker: Yulia Tsvetkov
Date: April 4, 2024
Abstract: Large language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. In this talk, I’ll present our recent work proposing new methods to (1) measure media biases in LMs trained on such corpora, along the social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. In this study, we find that pretrained LMs do have political leanings which reinforce the polarization present in pretraining corpora, propagating social biases into social-oriented tasks such as hate speech and misinformation detection. In the second part of my talk, I’ll discuss ideas on mitigating LMs’ unfairness. Rather than debiasing models—which, our work shows, is impossible—we propose to understand, calibrate, and better control for their social impacts using modular methods in which diverse LMs collaborate at inference time.
Bio: Yulia Tsvetkov is an associate professor at the Paul G. Allen School of Computer Science & Engineering at University of Washington. Her research group works on fundamental advancements to large language models, multilingual NLP, and AI ethics. This research is motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby making NLP tools available to all users. Prior to joining UW, Yulia was an assistant professor at Carnegie Mellon University and a postdoc at Stanford. Yulia is a recipient of NSF CAREER, Sloan Fellowship, Okawa Research award, and several paper awards and runner-ups at NLP, ML, and CSS conferences.
Archive of Past Events
Fall 2023
Date: September 7
Speaker: Lucie Flek (University of Bonn/University of Marburg)
Title: Human-centered NLP in the LLM Era: Robustness and efficiency through socially and personally contextualised understanding
Slides: Lucie Flek Slides
Date: October 19
Speaker: Jacob Steinhardt (UC Berkeley)
Title: Using Large Language Models to Understand Large Language Models
Recording: Jacob Steinhardt (UC Berkeley) Lecture Recording
Date: November 9
Speaker: Yoon Kim (MIT)
Slides: Yoon Kim TaD Lecture Slides
Title: Large Language Models & Symbolic Structures
Date: November 16
Speaker: Eunsol Choi (UT Austin)
Title: Knowledge Augmentation for Language Models
Recording: Eunsol Choi Lecture Recording
Slides: Eunsol Choi Lecture Slides
Date: November 30
Speaker: Daniel Khashabi (Johns Hopkins University)
Title: Large Language Models: Revisiting Few Mysteries
Recording: Daniel Khashabi’s Lecture Recording
Slides: Daniel Khashabi’s Lecture Slides
Spring 2023
Date: March 30
Speaker: Sasha (Alexander) Rush (Cornell)
Title: “Pretraining without Attention”
Date: April 6
Speaker: Ellie Pavlick (Brown University)
Title: “Mechanisms for Compositionality in Neural Networks”
Recording: Ellie Pavlick Lecture Recording
Date: April 13
Speaker: Mohit Iyyer (UMass)
Title: On using large language models to translate literary works, and detecting when they’ve been used
Recording: Mohit Iyyer Lecture Recording
Date: April 20
Speaker: Asli Celikyilmaz (Meta)
Title: Exploring Machine Thinking
Slides: Asli Celikyilmaz Lectures Slides
Date: April 27
Speaker: Jacob Andreas (MIT)
Title: Language Models as World Models
Recording: Jacob Andreas Lecture Recording
Slides: Jacob Andreas Lecture Slides
Fall 2022
Date: 8 September
Speaker: Dan Jurafsky (Stanford)
Recording: Dan Jurafsky Video Recording
Slides: Dan Jurafsky Lecture Slides
Date: 22 September
Speaker: Emily Pitler (Google)
Recording: Emily Pitler Video Recording
Slides: Emily Pitler Lecture Slides
Date: 13 October
Speaker: Alan Ritter (Georgia Tech)
Recording: Alan Ritter Lecture Recording
Slides: Alan Ritter Lecture Slides
Date: 20 October
Speaker: Chris Manning (Stanford)
Recording: Chris Manning Lecture Recording
Slides: Chris Manning Lecture Slides
Date: 17 November
Speaker: Emma Strubell (CMU)
Recording: Emma Strubell Lecture Recording
Slides: Emma Strubell Lecture Slides
Spring 2022
Date: 10 March
Speaker: Chenhao Tan (UChicago)
Title: Towards Human-Centered Explanations of AI Predictions
Recording: Chenhao Tan Recording
Slides: Chenhao Tan Slides
Date: 24 March
Speaker: Sameer Singh (UC Irvine)
Title: Lipstick on a Pig: Using Language Models as Few-Shot Learners
Recording: Sameer Singh Recording
Slides: Sameer Singh Slides
Date: 31 March
Speaker: Sebastian Schuster (NYU)
Title: How contextual are contextual language models?
Recording: Sebastian Schuster Recording
Slides: Sebastian Schuster Slides
Date: 14 April
Speaker: Greg Durrett (UT Austin)
Title: Why natural language is the right vehicle for complex reasoning
Recording: Greg Durrett Recording
Slides: Greg Durrett Slides
Date: 21 April
Speaker: Douwe Kiela (HuggingFace)
Title: Progress in Dynamic Adversarial Data Collection & Adventures in Multimodal Machine Learning
Recording: Recording
Slides: Douwe Kiela Slides
Fall 2021
- 23 Sept: Ankur Parikh (Google) – Towards High Precision Text Generation
- 21 Oct: Alex Warstadt (NYU) – Testing the Learnability of Grammar for Humans and Machines: Investigations with Artificial Neural Networks. [Alex Warstadt Slides]
- 4 Nov: Marianna Apidianaki (UPenn) – Lexical Polysemy and Intensity in Contextualized Representations [Marianna Apidianaki Slides]
- 18 Nov: Danqi Chen (Princeton) – Contrastive Representation Learning in Text – [Danqi Chen Slides]
Spring 2021
- 4 Feb: Byron Wallace (Northeastern) — What does the evidence say? Language technologies to help make sense of biomedical texts [Byron Wallace Lecture Video Recording]
- 4 March: Nanyun Peng (UCLA) — Controllable Text Generation Beyond Auto-regressive Models [Nanyun Peng Lecture Video Recording]
- 18 March: Karl Stratos (Rutgers) — Maximal Mutual Information Predictive Coding for Natural Language Processing [Karl Stratos Video Lecture Recording]
- 1 April: Su Lin Blodgett (Microsoft) — Language and Justice: Reconsidering Harms in NLP Systems and Practices [NYU community only: Su Lin Blodgett Lecture Video Recording]
- 15 April: Allyson Ettinger (University of Chicago) — “Understanding” and prediction: Controlled examinations of meaning sensitivity in pre-trained models [Allyson Ettinger Lecture Video Recording]
- 29 April: Wei Xu (Georgia Tech) — Importance of Data and Linguistics in Neural Language Generation [Wei Xu Lecture Video Recording]
Fall 2020
- 17 Sept: Anna Rogers (Copenhagen) — When BERT plays the lottery, all tickets are winning
- 24 Sept: Matt Gardner (AI2) — Contrastive pairs are better than independent samples, for both learning and evaluation [Matt Gardner Lecture Video Recording]
- 8 Oct: Yonatan Belinkov (Technion) — Causal Mediation Analysis for Interpreting NLP Models: The Case of Gender Bias [Yonatan Belinkov Lecture Video Recording]
- 22 Oct: Tatsunori Hashimoto (Stanford) — Robustness based approaches for improving natural language generation and understanding [Tatsunori Hashimoto Video Lecture Recording]
- 29 Oct: Dan Weld (University of Washington) — Semantic Scholar – Advanced NLP to Accelerate Scientific Research [Dan Weld Lecture Video Recording]
- 5 Nov: Dani Yogatama (DeepMind) — Semiparametric Language Models [Dani Yogatama Lecture Video Recording]
- 3 Dec: Angeliki Lazaridou (DeepMind) — Towards multi-agent emergent communication as a building block of human-centric AI [Angeliki Lazaridou Video Lecture Recording]
Spring 2020
- 13 Feb: David Lazer (Northeastern) — Fake news on Twitter during the 2016 U.S. presidential election [David Lazer Lecture Recording]
- 6 Feb: Ellie Pavlick (Brown) — What do (and should) language models know about language? [Ellie Pavlick Lecture Video Recording]
Fall 2019
- 5 Sept Eunsol Choi (Google / UT Austin) — Learning to Understand Entities In Text
- 12 Sept Tom Kwiatkowski (Google) — New Challenges in Question Answering: Natural Questions and Going Beyond Word Matching
- 19 Sept Zack Lipton (CMU) — Deep (Inter-)Active Learning for NLP: Cure-all or Catastrophe?
- 26 Sept Diyi Yang (Georgia Tech) — Building Language Technologies for Better Online Communities
- 3 Oct Robin Jia (Stanford) — Building Adversarially Robust Natural Language Processing Systems
- 10 Oct Ceren Budak (U Mich) — News Producers, Politically Engaged Citizens, and Social Movement Organizations Online
- 17 Oct Edward Grefenstette (Facebook AI) — Teaching Artificial Agents to Understand Language by Modelling Reward
- 24 Oct Alexis Conneau (Facebook AI) — Learning cross-lingual text representations
- 31 Oct Sebastian Ruder (DeepMind) — Unsupervised cross-lingual representation learning
- 7 Nov **NO MEETING**
- 14 Nov Mohit Iyyer (U Mass) — Rethinking Transformers for machine translation and story generation
- 21 Nov Jennifer Pan (Stanford) — Uncovering Hidden Political Activity with Data Science Tools and Social Science Approaches
- 5 Dec Jonathan Berant (Tel-Aviv U) — Understanding Complex Questions
Spring 2019
- 9 May Rashida Richardson (AI Now Institute) — Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice
- 7 Feb Percy Liang (Stanford) — Can Language Robustify Learning?
- 14 Feb – 21 Mar **No meeting**
- 28 Mar Adji Bousso Dieng (Columbia) — Deep Bayesian Learning as a Paradigm for Text Modeling
- 4 Apr Jacob Devlin (Google) — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 11 Apr Yulia Tsvetkov (CMU) — Towards a Computational Analysis of the Language of Veiled Manipulation
- 18 Apr Zachary Steinert-Threlkeld (UCLA) — The Effect of Violence, Cleavages, and Free-riding on Protest Size
- 25-Apr Rico Sennrich (Edinburgh) — Document-level Machine Translation: Recent Progress and The Crux of Evaluation
- 2 May Hannaneh Hajishirzi (UW)
Fall 2018
- 13 Sep Erin Hengel (U Liverpool) — “Publishing while female: Are women held to higher standards? Evidence from peer review”
- 20 Sep *No meeting: New Directions in Analyzing Text as Data conference at UW*
- 21 Sep (Friday) Dan Roth (UPenn) (special NYC NLP talk at Cornell Tech campus)
- 27 Sep *No meeting*
- 4 Oct Emily Gade (UW/Moore-Sloan) — What Counts as Terrorism? An Examination of Terrorist Designations among U.S. Mass Shootings
- 11 Oct Marine Carpuat (U Maryland) — Semantic and Style Divergences in Machine Translation
- 18 Oct Bryce Dietrich (U Iowa/Harvard) — Do Representatives Emphasize Some Groups More Than Others?
- 25 Oct Taylor Berg-Kirkpatrick (UCSD) — Unsupervised Models for Unlocking Language Data
- 1 Nov Laila Wahedi (Georgetown) — Constructing Networks From Social Media Text: How to Do It and When You Should, From Trolls to Journalists
- 8 Nov Jacob Andreas (Microsoft/MIT) — Learning by Narrating
- 15 Nov Sasha Rush (Harvard) — Controllable Text Generation with Deep Latent-Variable Models
- 22 Nov *No meeting: Thanksgiving*
- 29 Nov Kyle Gorman (CUNY) — Grammar engineering in text-to-speech synthesis
- 6 Dec Walter R. Mebane, Jr. (U Michigan) — What You Say You See is Who You Are: Observing Election Incidents in the United States via Twitter
- 13 Dec Philip Resnik (U Maryland) — Mental Health as an Application Area for Computational Linguistics: Prospects and Challenges
Spring 2018
- 25-Jan Omer Levy (UW) — Towards Understanding Deep Learning for Natural Language Processing
- 1-Feb Kevin Knight (USC) — What are Neural Sequence Models Doing?
- 8-Feb Bruno Gonçalves (NYU CDS / Aix-Marseille Université) — Spatio-temporal analysis of language use
- 15-Feb * No meeting *
- 22-Feb * No meeting *
- 1-Mar Maja Rudolph (Columbia) — Structured Embedding Models for Language Variation
- 8-Mar Justine Zhang (Cornell) — Unsupervised Models of Conversational Dynamics
- 15-Mar *No meeting: spring break*
- 22-Mar Elliott Ash (U of Warwick / ETH Zurich) — Proportional Representation Increases Party Politics: Evidence from New Zealand Parliament using a Supervised Topic Model
- 29-Mar Luke Zettlemoyer (UW) — End-to-end Learning for Broad Coverage Semantics
- 5-Apr Marie-Catherine de Marneffe (Ohio State) — Computational pragmatics: a case study of “speaker commitment”
- 12-Apr Graham Neubig (CMU) — What Can Neural Networks Teach us about Language?
- 19-Apr Ray Mooney (UT Austin) — Ensembles and Explanation for Visual Question Answering
- 26-Apr Sarah Bouchat (Northwestern) — Making a Long Story Short: Eliciting Prior Information from Previously Published Research
- 3-May Ben Lauderdale (LSE) — Unsupervised Methods for Extracting Political Positions from Text
Fall 2017
- 21-Sep Dean Knox (Microsoft Research/Princeton) and Christopher Lucas (Harvard) — Measuring Speaker Affect in Audio Data: Dynamics of Supreme Court Oral Arguments
- 28-Sep Rich Nielsen (MIT) — Text Analysis of Internet Islam
- 5-Oct Jordan Boyd-Graber (UMD) — Cooperative and Competitive Machine Learning through Question Answering
- 12-Oct **No meeting: Text as Data 2017 Conference at Princeton**
- 19-Oct Claire Cardie (Cornell) — Structured Prediction for Opinions and Arguments
- 26-Oct David Weiss (Google) — Parsimonious Representation Learning for NLP
- 2-Nov Emily Bender (UW) — Articulating How Our Data and Systems Do and Don’t Represent the World
- 9-Nov Hal Daumé III (UMD) — Learning Language Through Interaction
- 16-Nov Gerard De Melo (Rutgers) — Learning Semantics and Commonsense Knowledge from Heterogeneous Data
- 23-Nov **No meeting: Thanksgiving**
- 30-Nov Jenn Wortman Vaughan (Microsoft Research) — The Human Components of Machine Learning
- 7-Dec Damon Centola (UPenn) — The Emergence of Linguistic Norms: An Experimental Study of Cultural Evolution
- 14-Dec Fernando Diaz (Spotify) — Local Natural Language Processing in Information Retrieval
Summer 2017
- 26-Jul Yejin Choi (UW) — From Naive Physics to Connotation: Learning about the World from Language
Spring 2017
- 27-Apr Kosuke Imai (Princeton) and Brendan T. O’Connor (UMass Amherst)
- 2-Feb Chris Callison-Burch (Penn) — The promise of crowdsourcing for natural language processing, public health, and other data sciences
- 9-Feb Vinod Prabhakaran (Stanford) — NLP and Society: Understanding Social Context from Language Use
- 16-Feb Matthew Denny (Penn State) and Arthur Spirling (NYU) — Text Preprocessing For Unsupervised Learning: Why It Matters, When It Misleads, And What To Do About It
- 23-Feb **No Meeting**
- 2-Mar **No Meeting**
- 9-Mar Jason Eisner (JHU)
- 16-Mar **No Meeting: Spring Break**
- 23-Mar Amber Boydstun (UC Davis)
- 30-Mar Hong Yu (UMass Medical)
- 6-Apr **No Meeting**
- 13-Apr **Meeting Cancelled**
- 20-Apr Yoav Artzi (Cornell Tech)
Fall 2016
- 22-Sep David Bamman (Berkeley) — Beyond Bags of Words: Linguistic Structure in the Analysis of Text as Data
- 29-Sep Regina Barzilay (MIT) — How Can NLP Help Cure Cancer?
- 6-Oct Justin Grimmer (Stanford) — Exploratory and Confirmatory Causal Inference for High Dimensional Interventions
- 13-Oct **No Meeting: Text-as-Data Conference**
- 20-Oct Erin Baggott Carter (USC) — Propaganda and Protest: Evidence from Post-Cold War Africa (coauthored with Brett Carter)
- 27-Oct Matt Taddy (Chicago) — Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech
- 3-Nov Ken Benoit (LSE) — Measuring and Explaining Political Sophistication Through Textual Complexity
- 10-Nov Edouard Grave (Facebook) — Large scale learning for natural language processing
- 17-Nov Lillian Lee (Cornell) — Can language change minds?
- 24-Nov **No Meeting: Thanksgiving**
- 1-Dec Gary King (Harvard) — An Improved Method of Automated Nonparametric Content Analysis for Social Science
- 8-Dec Sam Bowman (NYU) — Learning neural networks for sentence understanding with the Stanford NLI corpus
Spring 2016
- 5-May Mark Dredze (JHU/Bloomberg) — Topic Models for Identifying Public Health Trends
- 4-Feb Marc Ratkovic (Princeton) — Estimating Common and Idiosyncratic Factors from Multiple Datasets
- 11-Feb David Mimno (Cornell) — Topic models without the randomness: new perspectives on deterministic algorithms
- 18-Feb Pablo Barberá (NYU) — Text vs Networks: Inferring Sociodemographic Traits of Social Media Users
- 3-Mar Slav Petrov (Google NY) — Towards Universal Syntactic Processing of Natural Language
- 10-Mar Laura Nelson (Northwestern, Kellogg) — Measuring Collective Cognitive Structures via Collectively Produced Text
- 17-Mar *Spring Break*
- 24-Mar Cristian Danescu-Niculescu-Mizil (Cornell) — Language and Social Dynamics
- 7-Apr *MPSA Conference*
- 14-Apr Sven-Oliver Proksch (McGill) — Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Parliamentary Speeches
- 21-Apr Noémie Elhadad (Columbia) — Summarizing the Patient Record
- 28-Apr Molly Roberts (UCSD) — Matching Methods for High-Dimensional Data with Applications to Text
Fall 2015
- 10-Dec Bruno Gonçalves (NYU)
- 10-Sep Brandon Stewart (Princeton) — Text Analysis with Document Context: the Structural Topic Model
- 17-Sep Yacine Jernite (NYU) — Semi-supervised methods of text processing, and an application to medical concept extraction.
- 24-Sep Andrew Peterson (NYU) — Legislative Text and Regulatory Authority
- 1-Oct John Henderson (Yale) — Crowdsourcing Experiments to Estimate an Ideological Dimension in Text
- 8-Oct Ken Benoit (LSE) — Mining Multiword Expressions to Improve Bag of Words Models in Political Science Text Analysis
- 15-Oct Noah Smith (U of Washington) — Learning Political Embeddings from Text
- 19-Oct — Intro to Text Analysis Using R, a one-day workshop led by Ken Benoit (LSE)
- 22-Oct David Blei (Columbia) — Probabilistic Topic Models and User Behavior
- 29-Oct Zubin Jelveh (NYU) and Suresh Naidu (Columbia) — Political Language in Economics
- 5-Nov Hanna Wallach (Microsoft Research) — The Bayesian Echo Chamber: Modeling Influence in Conversations
- 12-Nov Jacob Montgomery (WashU) — Funneling the Wisdom of Crowds: The SentimentIt Platform for Human Computation Text Analysis
- 19-Nov Michael Colaresi (Michigan State) — Learning Human Rights, Lefts, Ups and Downs: Using Lexical and Syntactic Features to Understand Evolving Human Rights Standards
- 3-Dec ******TALK CANCELLED******