Master’s in Data Science
Capstone Project
CDS master’s students have a unique opportunity to solve real-world problems through the capstone course in the final year of their program. The capstone course is designed to put knowledge into practice and to develop critical skills such as problem-solving and collaboration.
Students are matched with research labs within the NYU community and with industry partners to investigate pressing issues, applying data science to the following areas:
- Probability and statistical analyses
- Natural language processing
- Big Data analysis and modeling
- Machine learning and computational statistics
- Coding and software engineering
- Visualization modeling
- Neural networks
- Signal processing
- High dimensional statistics
Capstone projects give students the opportunity to work in their field of interest and gain exposure to applicable solutions. Project sponsors, NYU labs, and external partners, in turn, benefit from having a new perspective applied to their projects.
“Capstone is a unique opportunity for students to solve real world problems through projects carried out in collaboration with industry partners or research labs within the NYU community,” says capstone advisor and CDS Research Fellow Anastasios Noulas. “It is a vital experience for students ahead of their graduation and prior to entering the market, as it helps them improve their skills, especially in problem solving contexts that are atypical compared to standard courses offered in the curriculum. Cooperation within teams is another crucial skill built through the Capstone experience as projects are typically run across groups of 2 to 4 people.”
The Capstone Project offers organizations the opportunity to propose a project that our graduate students will work on as part of their curriculum for one semester. Information on the course, along with a questionnaire to propose a project, can be found on the Capstone Fall 2024 Project Submission Form. If you have any questions, please reach out to ds-capstone@nyu.edu.
Best Fall 2023 Capstone Posters
Best Fall 2023 Student Voted Posters
Best Fall 2023 Student Voted Runner-Up Posters
Fall 2023 Capstone Project List
- Partisan Bias and the US Federal Court System
- Segmentation of Metastatic Brain Tumors Using Deep Learning
- Discovering misinformation narratives from suspended tweets using embedding-based clustering algorithms
- Network Intrusion Detection Systems using Machine Learning
- Knowledge Extraction from Pathology Reports Using LLMs
- Building an Interactive Browser for Epigenomic & Functional Maps from the Viewpoint of Disease Association
- Prediction of Acute Pancreatitis Severity Using CT Imaging and Deep Learning
- User-centric AI models for assisting the blind
- A machine learning model to predict future kidney function in patients undergoing treatment for kidney masses
- Fine-Tuning of MedSAM for the Automated Segmentation of Musculoskeletal MRI for Bone Topology Evaluation and Radiomic Analysis
- Online News Content Neural Network Recommendation Engine
- Explanatory Modeling for Website Traffic Movements
- Egocentric video zero-shot object detection
- Leverage OncoKB’s Curated Literature Database to Build an NLP Biomarker Identifier
- Improving Out-of-Distribution Generalization in Neural Models for Astrophysics and Cosmology
- Preparing a Flood Risk Index for the State of Assam, India
- Causal GANs
- Bringing Structure to Emergent Taxonomies from Open-Ended CMS Tags
- Social Network Analysis of Hospital Communication Networks
- Multimodal Question Answering
- Does resolution matter for transfer learning with satellite imagery?
- Multi-Modal Foundation Models for Medicine
- Representational geometry of learning rules in neural networks
- Measuring Optimizer-Agnostic Hyperparameter Tuning Difficulty
- Extracting causal political narratives from text
- Designing Principled Training Methods for Deep Neural Networks
- Multimodal NLP for M&A Agreements
- Using Deep Learning to Solve Forward-Backward Stochastic Differential Equations
- OptiComm: Maximizing Medical Communication Success with Advanced Analytics
- Automated assessment of epilepsy subtypes using patient-generated language data
- Predicting cancer drug response of patients from their alteration and clinical data
- Identify & Summarize top key events for a given company from News Data using ML and NLP Models
- Developing predictive shooting accuracy metric(s) for First-Person-Shooter esports
- Supporting Student Success through Pipeline Curricular Analysis
- Transformers for Electronic Health Records
- Build Models for Multilingual Medical Coding
- Metadata Extraction from Spoken Interactions Between Mothers and Young Children
- Medical Data Leakage with Multi-site Collaborative Training
- Uncertainty Radius Selection in Distributionally Robust Portfolio Optimization
- Unveiling Insights into Employee Benefit Plans and Insurance Dynamics
- Advanced Name Screening and Entity Linking Using Large Language Models
- What Keeps the Public Safe While Avoiding Excessive Use of Incarceration? Supporting Data-Centered Decisionmaking in a DA’s Office
- Foundation Models for Brain Imaging
- Housing Price Forecasting – Alternative Approaches
- Evaluating the Capability of Large Language Models to Measure Psychiatric Functioning
- Predicting year-end success using deep neural network (DNN) architecture
Best Fall 2022 Capstone Posters
Best Fall 2022 Student Voted Posters
Best Fall 2022 Runner-Up Posters
Fall 2022 Capstone Project List
- Learning User Representations from Zillow Search Sessions using Transformer Architectures
- Neural Re-Ranking for Personalized Home Search
- Leveraging Computer Vision to Map Cell Tower Locations to Enhance School Connectivity
- Data Science for Clinical Decision-making Support in Radiation Therapy
- Using Voter File Data to Study Electoral Reform
- Creating an Epigenomic Map of the Heart
- Career Recommendation
- Galaxy Dataset Distillation
- Ego4d NLQ: Egocentric Visual Learning of Representations and Episodic Memory
- Methane Emission Quantification through Satellite Images
- Calibrating for Class Weights
- Assigning Locations to Detected Stops using LSTM
- Impact of YMCA Facilities on the Local Neighborhoods of Bronx
- Powering SMS Product Recommendations with Deep Learning
- Evaluation and Performance Comparison of Two Models in Classifying Cosmological Simulation Parameters
- Crypto Anomaly Detection
- Sequence Modeling for Query Understanding & Conversational Search
- Multi-Modal Graph Inductive Learning with CLIP Embeddings
- Deep Learning Framework for Segmentation of Medical Images
- Multimodal Contract Segmentation
- Dementia Detection from FLAIR MRI via Deep Learning
- Extraction of Causal Narratives from News Articles
- Detecting Erroneous Geospatial Data
- Improving Speech Recognition Performance using Synthetic Data
- Multi-document Summarization for News Events
- Multi-task learning in orthogonal low dimensional parameter manifolds
- Let’s Go Shopping: An Investigation Into a New Bimodal E-Commerce Dataset
- Training AI to recognize objects of interest to the blind community
- Classify Classroom Activities using Ambient Sound
- Solving challenging video games in human-like ways
- Database and Dashboard for RII
- Bitcoin Price Prediction Using Machine Learning Models
- Context Driven Approach to Detecting Cross-Platform Coordinated Influence Campaigns
- Invalid Traffic Detection Model Deployment
- Recalled Experiences of Death: Using Transformers to Understand Experiences and Themes
- Context-Based Content Extraction & Summarization from News Articles
- Neural Learning to Rank for Personalized Home Search
- Improve Speech Recognition Performance Using Unpaired Audio and Text
- Data Normalization & Generalization to Population Metrics
- Automated Judicial Case Briefing
- Cyber Threat Detection for News Articles
- MLS Fan Segmentation
- Near Real-Time Estimation of Beef and Dairy Feedlot Greenhouse Gas Emissions
- Do Better Batters Face Higher or Lower Quality Pitches?
Best Fall 2021 Capstone Posters
2021 Capstone Project List
- 3D Astrophysical Simulation with Transformer
- Accelerated Learning in the Context of Language Acquisition
- Analysis of Cardiac Signals on Patients with Atrial Fibrillation
- Applications of Neural Radiance Fields in Astronomy
- Automatic Detection of Alzheimer’s Disease with Multi-Modal Fusion of Clinical MRI Scans
- Automatic Transcription of Speech on SAYCam
- Automatic Volumetric Segmentation of Brain Tumor Using Deep Learning for Radiation Oncology
- Automatically Identify Applicants Who Require Physician’s Reports
- Building a Question-Answer Generation Pipeline for The New York Times
- Coupled Energy-Based Models and Normalizing Flows for Unsupervised Learning
- Data Classification Processing for Clinical Decision-making Support in Radiation Therapy
- Deep Active Learning for Protest Detection
- Estimating Intracranial Pressure Using OCT Scans of the Eyeball
- Graph Neural Networks for Electronic Health Record (EHR) Data
- Head and Neck CT Image Segmentation
- Head Movement Measurement During Structural MRI
- Image Segmentation for Vestibular Schwannoma
- Investigation into the Functionality of Key, Query, Value Sub-modules of a Transformer
- Know Your Worth: An Analysis of Job Salaries
- Machine learning-based computational phenotyping of electronic health records
- Modeling the Speed Accuracy Tradeoff in Decision-Making
- Multi-modal Breast Cancer Detection
- Multi-Modal Deep Learning with Medical Images and EHR Data
- Multimodal Representations for Document Understanding
- Nematode Counting
- News Clustering and Summarization
- Post-surgical resection mapping in epilepsy using CNNs
- Predicting Grandstanding in the Supreme Court through Speech
- Predicting Probability of Post-Colectomy Hospital Readmission
- Prediction of Total Knee Replacement Using Radiographs and Clinical Risk Factors
- Question Answering on Long Context
- Reinforcement Learning for Option Hedging
- Representation Learning Regarding RNA-RBP Binding
- Self-Supervised Learning of Medical Image Representations Using Radiology Reports
- The Study of American Public Policy with NLP
- Topical Aggregation and Timeline Extraction on the NYT Corpus
- Unsupervised Deep Denoiser for Electron-Microscope Data
- Using Deep Learning and FBSDEs to Solve Option Pricing and Trading Problems
- Vision Language Models for Real Estate Images and Descriptions
Featured 2020 Capstone Projects
2020 Capstone Project List
- 2D to 3D Video Generation for Surgery (Best Capstone Poster)
- Action Primitive Recognition with Sequence to Sequence Models towards Stroke Rehabilitation
- Applying Self-learning Methods on Histopathology Whole Slide Images
- Applying Transformers Models to Scanned Documents: An Application in Industry
- Beyond Bert-based Financial Sentimental Classification: Label Noise and Company Information
- Bias and Stability in Hiring Algorithms (Best Capstone Poster)
- Breast Cancer Detection using Self-supervised Learning Method
- Catastrophic Forgetting: An Extension of Current Approaches (Best Capstone Poster)
- ClinicalLongformer: Public Available Transformers Language Models for Long Clinical Sequences
- Complication Prediction of Bariatric Surgery
- Constraining Search Space for Hardware Configurations
- D4J: Data for Justice to Advance Transparency and Fairness
- Data-driven Diesel Insights
- Deep Learning to Study Pathophysiology in Dermatomyositis
- Detection Of Drug-Target Interactions Using BioNLP
- Determining RNA Alternative Splicing Patterns
- Developing a Data Ecosystem for Refugee Integration Insights
- Diarizing Legal Proceedings
- Estimating the Impact of the Home Health Value-Based Purchasing Model
- Extracting economic sentiment from mainstream media articles
- Food Trend Detection in Chinese Financial Market
- Forecasting Biodiesel Auction Prices
- Generative Adversarial Networks for Electron Microscope Image Denoising
- Graph Embedding for Question Answering over Knowledge Graphs
- Head and Neck CT Image Segmentation
- Impact of NYU Wasserman Resources on Students’ Career Outcomes
- Improving Accented Speech Recognition Through Multi-Accent Pre-Exposure
- Improving Synthetic Image Generation for Better Object Detection
- Learning-based Model for Super-resolution in Microscopy Imaging
- Modeling Human Reading by a Grapheme-to-Phoneme Neural Network
- Movement Classification of Macaque Neural Activity
- New OXXO Store in Brazil and Revenue Prediction
- Numerical Relativity Interpolations using Deep Learning
- One Medical Passport: Predictive Obstructive Sleep Apnea Analysis
- Online Student Pathways at New York University
- Predicting YouTube Trending Video Project
- Promotional Forecasting Model for Profit Optimization
- Question Answering on Tabular Data with NLP
- Raizen Fuel Demand Forecasting
- Reach for the stars: detecting astronomical transients
- Reverse Engineering the MOS 6502 Microprocessor
- Selecting Optimal Training Sets
- Synthesizing baseball data with event prediction pretraining
- Train ETA Estimation for Rumo S.A.
- Training a Generalizable End-to-End Speech-to-Intent Model
- Utilizing Machine Learning for Career Advancement and Professional Growth
Best Fall 2019 Capstone Projects
2019 Capstone Project List
- Adversarial Attacks Against Linear and Deep-learning Regressions in Astronomy
- Automated Breast Cancer Screening
- Automatic Legal Case Summaries
- Beer NLP
- Cross-task Transfer Between Language Understanding Tasks in NLP
- Dark Matter and Stellar Stream Detection using Deep Learned Clustering
- Exploiting Google Street View to Generate Global-scale Data Sets for Training Next Generation Cyber-Physical Systems
- Federated Incremental Learning
- Fraud Detection in Monetary Transactions Between Bank Accounts
- Guided Image Upsampling
- Improving State of the Art Cross-Lingual Word-Embeddings
- Inferring the Topic(s) of Wikipedia Articles
- Latent Semantic Topics Distribution Over Web Content Corpus
- Lease Renewal Probability Prediction
- Machine Learning for Adaptive Fuzzy String Matching
- Market Segmentation from Retailer Behavior
- Modeling the Experienced Dental Curriculum from Student Data
- Modelling NBA Games
- Movie Preference Prediction
- MRI Image Reconstruction
- NLP Metalearning
- Predict next sales office location
- Predicting Stock Market Movements using Public Sentiment Data & Sequential Deep Learning Models
- Predictive Maintenance Techniques
- Reinforcement Learning for Replication and Hedging of Option
- Self-supervised Machine Listening
- Sentence Classification of TripAdvisor ‘Points-of-Interest’ Reviews
- Simulating the Dark Matter Distribution of the Universe with Deep Learning
- SMaPP2: Joint Embedding of User-content and Network Structure to Enable a Common Coordinate that Captures Ideology, Geography and User Topic Spectrum
- Sparse Deconvolution Methods for Microscopy Imaging Data Analysis
- Stereotype and Unconscious Bias in Large Datasets
- Structuring, Exploring, and Exploiting NIH’s Clinical Trials Database
- The Analysis, Visualization, and Understanding of Big Urban Noise Data
- Unsupervised and Self-supervised Learning for Medical Notes
- Unsupervised Generative Video Dubbing
- Using Deep Generative Models to de-noise Noisy Astronomical Data
Featured Academic Capstone Projects
Featured Industry Capstone Projects
- Predicting Stock Market Movements using Public Sentiment Data & Sequential Deep Learning Models
- Sentence Classification of TripAdvisor ‘Points-of-Interest’ Reviews
- Determining where New York Life Insurance should open its next sales office
- NBA Shot Prediction with Spatio-Temporal Analysis
Other Past Capstone Projects
- Active Physical Inference via Reinforcement Learning
- Deep Multi-Modal Content-User Embeddings for Music Recommendation
- Fluorescent Microscopy Image Restoration
- Learning Visual Embeddings for Reinforcement Learning
- Offensive Speech Detection on Twitter
- Predicting Movement Primitives in Stroke Patients using IMU Sensors
- Recurrent Policy Gradients For Smooth Continuous Control
- The Quality-Quantity Tradeoff in Deep Learning
- Trend Modeling in Childhood Obesity Prediction
- Twitter Food/Activity Monitor