# Required Courses

**DS-GA-1001: Introduction to Data Science**

Introduces students to basic software algorithms and software tools, and teaches data handling, data representation, and methodology. Provides hands-on experience using Torch, a software system being developed at NYU and other research centers that has a large data science library.

**Course aims and objectives:**

After taking this class, students should:

- Approach business problems data-analytically. Think carefully and systematically about whether and how data can improve a particular application, both to understand a phenomenon better and, especially, to make better-informed and automated decisions.
- Understand fundamental principles of data science, such as using data to get information about an unknown quantity of interest, calculating and using data similarity, fitting models to data, supervised and unsupervised modeling, overfitting and its avoidance, evaluation and model analytics, visualization, predictive modeling, causal inference, the data mining process, problem decomposition, data science strategy, solution deployment, and more.
- Be able to apply the most important data science methods, using open-source tools.

**Prerequisites:**

- Some experience in programming: Java, C, C++, Python, Perl, or similar languages, equivalent to two introductory courses in programming, such as “Introduction to Programming” and “Data Structures and Algorithms.”

**DS-GA-1002: Probability and Statistics for Data Science (formerly Statistical and Mathematical Methods)**

This course briefly introduces the basic statistical and mathematical methods needed in the practice of data science.

**Prerequisites:** Calculus and linear algebra at the undergraduate level

**DS-GA-1003: Machine Learning and Computational Statistics**

The course covers a wide variety of topics in machine learning, pattern recognition, statistical modeling, and neural computation. It covers the mathematical methods and theoretical aspects, but primarily focuses on algorithmic and practical issues.

**Course aims and objectives:**

- Teach intermediate topics in machine learning
- Provide hands-on experience in designing and programming data science algorithms

**Prerequisites:**

- DS-GA-1001: Introduction to Data Science, or an undergraduate course in machine learning
- DS-GA-1002: Probability and Statistics for Data Science
- Some experience in programming: Java, C, C++, Python, R, Lua, Ruby, OCaml or similar languages, equivalent to two introductory courses in programming, such as “Introduction to Programming” and “Data Structures and Algorithms.”

Course website: https://davidrosenberg.github.io/ml2017/

**DS-GA-1004: Big Data**

This course covers methods and tools for automatic knowledge extraction from very large datasets. Methods include on-line learning, feature hashing, class embedding, distributed databases, map-reduce framework, and applications.

**Prerequisites:**

- DS-GA-1001: Introduction to Data Science or equivalent undergraduate course
- DS-GA-1002: Probability and Statistics for Data Science

**DS-GA-1006: Capstone Project and Presentation in Data Science**

The purpose of the capstone project is to make the theoretical knowledge acquired by the students operational in realistic settings. During the project, students work through the entire process of solving a real-world problem: from collecting and processing real-world data, to designing the best method to solve the problem, and implementing a solution. The problems and datasets come from real-world settings identical to what the student would encounter in industry, government, or academic research. Students will work individually or in small groups on a problem that typically will come from industry and involve an industry-sourced dataset, but could also be provided by academic research groups inside or outside NYU. A list of such problems will be available early in the semester, and students will select a problem aligned with their personal interests. Students with similar interests can form groups of 2 or 3. The selection of problems to work on and the formation of the groups will be approved by the course director. Each project team will be supervised by the course instructor and advised by a project advisor from the academic or industry group that originated the project.

**Course aims and objectives:**

- Students will demonstrate an ability to handle a problem in data science from the point of problem definition through delivery of a solution. In doing so, they will demonstrate proficiency in collecting and processing real-world data, in designing the best methods to solve the problem, and in implementing a solution.
- Students will demonstrate competence in presenting material by delivering two presentations: a proposal on how to approach the problem and their final solution.
- Students will learn how to work in small teams by working with at least one other student on their project.
- Students will write a report on their project for evaluation by the instructor in consultation with the project advisors. The report will be structured as a typical research paper, and hence will include three main sections: 1. motivation and problem definition, existing approaches to the problem; 2. proposed solution; 3. results, conclusion, and directions for future work.

**Prerequisites:**

- Successful completion of DS-GA-1001: Introduction to Data Science, DS-GA-1002: Probability and Statistics for Data Science, DS-GA-1003: Machine Learning and Computational Statistics, and DS-GA-1004: Big Data

**One Data Science Elective:**

Choose one from the list below.

- DS-GA 1005 Inference and Representation
- DS-GA 1008 Deep Learning
- DS-GA 1011 Natural Language Processing with Representation Learning
- DS-GA 1012 Natural Language Understanding and Computational Semantics
- DS-GA 1013 Optimization-based Data Analysis
- Optimization and Computational Linear Algebra

# Track Courses

Each track will have course requirements. For more information, please visit the MS Curriculum page.

# Pre-Approved General Elective Courses

Below is a list of electives that are pre-approved for Data Science graduate students. Many students will wish to deepen their expertise in a discipline or domain. To that end, we recommend course sequences around typical disciplines and domains. Students wishing to take elective courses not on the list need to obtain approval. For approval, email the information below to Kathryn Angeles at kangeles@nyu.edu.

- Course information
- Course syllabus
- Relevance of course to MSDS degree and goals
- If available, course website

Note that some courses have prerequisites, which are not listed. You will need to consult the department website for such details. Often a search on “NYU” + the course name will lead to details for the course.

# Electives by Discipline or Domain

The courses below are organized around specific disciplines or domains. Be aware that the courses listed are not necessarily offered every semester.

**Applied Statistics (PRIISM/Steinhardt)**

- APSTA-GE 2004: Advanced Modeling I: Multivariate Analysis
- APSTA-GE 2012: Causal Inference: Statistical Methods For Program Evaluation and Policy Research
- APSTA-GE 2013: Missing Data
- APSTA-GE 2014: Adv Topics in Quant Meth: Statistical Analysis of Networks
- APSTA-GE 2015: Adv Topics in Quant Meth: Applied Spatial Statistics
- APSTA-GE 2016: Factor Scoring and Practical Issues in Scaling
- APSTA-GE 2040: Multilevel Models: Growth Curves
- APSTA-GE 2041: Practicum In Multi-Level Models
- APSTA-GE 2042: Multilevel Models: Nested-Data Models
- APSTA-GE 2094: Factor Analysis and Structural Equation Modeling
- APSTA-GE 2110: Applied Statistics: Using Large Databases in Education Research
- APSTA-GE 2122: Applied Statistical Modeling & Inference
- APSTA-GE 2134: Experimental and Quasi-Experimental Design
- APSTA-GE 2351: Probability: Theory and Practice
- APSTA-GE 2997: Advanced Methods in Health and Policy Research (Generalized Linear Models)
- APSTA-GE 2013001: Topics in Advanced Quant Method
- RESCH-GE 2139: Survey Research Methods I
- PUHE-GE 2306: Epidemiology

**Artificial Intelligence**

- Center for Data Science
    - DS-GA-1008-001: Deep Learning
- Computer Science (Courant)
    - CSCI-GA.2271-001: Computer Vision
    - CSCI-GA.2560-001: Artificial Intelligence
    - CSCI-GA.2965-001: Heuristic Problem Solving
    - CSCI-GA.3033-009: Speech Recognition
    - CSCI-GA.3033-001: Statistical Natural Language Processing
    - CSCI-GA.2590-001: Natural Language Processing
    - CSCI-GA.2566-001: Foundations of Machine Learning
    - CSCI-GA.3033-004: Social Networks

**Biology and Bioinformatics**

- MATH-GA 2852.002/BIOL-GA 1131.001: Biophysical Modeling of Cells & Populations
- MATH-GA 2852.001/BIOL-GA 2852.001: Stochastic Problems in Cellular, Molecular and Neural Science
- MATH-GA.2851.001: Advanced Topics in Math Biology
- Tandon
    - BI 7513: Chemical Foundation For Bioinformatics
    - BI 7523: Biological Foundation For Bioinformatics
    - BI 7543: Bioinformatics II: Protein Structure
    - BI 7553: Bioinformatics III: Functional Prediction
    - BI 7623: Systems Biology: -Omes and -Omics
    - BI 7643: Computational Tools Perl & Bioperl
    - BI 7633: Microarray Data Analysis
    - BI 7653: Next Generation Sequence Analysis
    - BI 7843: Molecular Modeling and Simulation
    - BI 7533: Bioinformatics I: Sequence Analysis
    - BI 7613: Introduction to Systems Biology

- BIOL-GA 1007: Programming for Biologists
- BIOL-GA.1009: Biological Databases and Data mining (4 Credits)
- BIOL-GA.1130: Applied Genomics: Introduction to Bioinformatics and Network Modeling (4 Credits)
- CSCI-GA.2520-001/BIOL-GA.1127: Bioinformatics & Genomes (4 Points)

**Biostatistics**

- EHSC-GA 2304: Advanced Topics in Biostatistics
- EHSC-GA 2330: Advanced Topics in Survival Analysis
- EHSC-GA 2331: Advanced Topics in Data Mining with Applications to Genomics
- EHSC-GA 2303: Introduction to Biostatistics
- EHSC-GA 2047: Introduction to Survival Analysis
- EHSC-GA 2306: Methods of Applied Statistics and Data Mining with Applications to Biology and Medicine
- EHSC-GA 2332: Methods for the Analysis of Longitudinal Data
- EHSC-GA 2045: Methods for Categorical Data Analysis in Health Sciences Research
- EHSC-GA 2335: Sampling Methods and Applications in Health Surveys
- EHSC-GA 2313: Statistical Problems in Medicine and Biology
- EHSC-GA 2333: Introduction to Measurement Error in Biomedical Research
- EHSC-GA 2334: Statistical Methods in Genetics and Genetic Epidemiology
- EHSC-GA 2336: Introduction to Statistical Inference
- EHSC-GA 2337: Causal Inference in Observational Studies
- EHSC-GA 2338: Statistical Methods for Clinical and Translational Research

**Business**

- Stern
    - ACCT-GB.3304: Modeling Finc Statements
    - COR1-GB.2311: Foundations of Finance
    - ECON-GB.2355: Behavioral Economics: Decisions and Strategies
    - ECON-GB.3351.01: Econometrics I
    - FINC-GB.2334: Financial Services Industry
    - FINC-GB.3121: Topics In Hedge Fund Strategies
    - FINC-GB.3332: Portfolio Management
    - INFO-GB.2335: Programming in Python
    - INFO-GB.2346: Dealing with Data
    - INFO-GB.2350: Robo Advisors & Systematic Trading
    - INFO-GB.3306: Data Visualization
    - INFO-GB.3322: Design and Development of Web and Mobile Applications
    - INFO-GB.3351: Risk Management Systems
    - INFO-GB.3391: Research Sem: Data Science
    - INFO-GB.5336: Intro to Data Science Business
    - MGMT-GB.2159: Collaboration, Conflict, and Negotiation
    - MGMT-GB.3321: Developing Managerial Skills
    - MGMT-GB.3335: Foundations of Entrepreneurship
    - MGMT-GB.3337: Foundations of Technology Entrepreneurship
    - MGMT-GB.3351: Game Theory
    - MKTG-GB.2354: Data Driven Decision Making
    - OPMG-GB.2350: Decision Models
    - OPMG-GB.2351.30: Advanced Decision Models
    - STAT-GB.2301: Regression and Multivariate Data Analysis
    - STAT-GB.2302: Forecasting Time Series Data
    - STAT-GB.3127: Statistical Aspects of Market Risk
    - STAT-GB.3302.0: Statistical Inference and Regression Analysis
    - STAT-GB.2308: Applied Stochastic Processes For Financial Models
    - STAT-GB.2309: Mathematics of Investment
    - STAT-GB.3304: Advanced Theory of Statistics
    - STAT-GB.3305: Bayesian Inference and Statistical Decision Theory
    - STAT-GB.3306: Time Series Analysis
    - STAT-GB.3308: Sampling Techniques
    - STAT-GB.3309: Experimental Design
    - STAT-GB.3314: Statistical Computing and Sampling Methods With Applications to Finance
    - STAT-GB.3383: Frequency Domain Time Series Analysis
    - STAT-GB.4310: Statistics for Social Data
- School of Professional Studies
    - HRCM1-GC 1210: Quantitative Methods and Metrics for Decision Making

**Center for Data Science (CDS)**

- DS-GA 1007: Programming for Data Science
- DS-GA 1008: Deep Learning
- DS-GA 1009: Practical Training for Data Science
- DS-GA 1010: Independent Study in Data Science
- DS-GA 3001: Special Topics: Advanced Programming for Data Science with C++
- DS-GA 3001: Special Topics: Advanced Python for Data Science
- DS-GA 3001: Special Topics: Computational Approaches to Natural Language Processing
- DS-GA 3001: Special Topics: Introduction to Causal Inference for Data Scientists
- DS-GA 3001: Special Topics: Law and Ethics for Data Managers
- DS-GA 3001: Special Topics: NLP with Representation Learning
- DS-GA 3001: Special Topics: Optimization and Computational Linear Algebra for Data Science
- DS-GA 3001: Optimization-Based Data Analysis
- DS-GA 3001: Special Topics: Text as Data
- DS-GA 3001: Special Topics: Topological Data Analysis and Graph Signal

**Center for Urban Science and Progress (CUSP)**

- CUSP-GX.5002: Urban ICT and City Operations
- CUSP-GX.5001: Foundations of Urban Science
- CUSP-GX.5003: Principles of Urban Informatics

**Computer Science**

- Courant
    - CSCI-GA.1170-001: Fundamental Algorithms
    - CSCI-GA.2110-001: Programming Languages
    - CSCI-GA.2433-001: Database Systems
    - CSCI-GA.2440-001: Software Engineering
    - CSCI-GA.2620-001: Networks and Distributed Systems
    - CSCI-GA.3033: Special Topics Comp Sci – Advanced Machine Learning
    - CSCI-GA.3033: Special Topics Comp Sci – Geometric Modeling
    - CSCI-GA.3033: Special Topics Comp Sci – Big Data Application Development
    - CSCI-GA.3033-005: Production Quality Software
    - CSCI-GA.3033-005: Distributed System
    - CSCI-GA.3033-010: Transformation to Cloud Computing

**Economics**

- ECON-G31.1003: Microeconomic Theory
- ECON-G31.1005: Macroeconomic Theory I
- ECON-G31.1101: Applied Statistics and Econometrics I
- ECON-G31.1102: Applied Statistics and Econometrics II
- ECON-G3351.01: Econometrics I
- ECON-GA 1001: Math for Economists (MA)
- Stern
    - ECON-GB.2333: Monetary Policy, Banks, and Central Banks

**Large-Scale Computation**

- Computer Science (Courant)
    - CSCI-GA.1180: Mathematical Techniques for CS
    - CSCI-GA.2112-001 & MATH-GA.2043.001: Scientific Computing
    - CSCI-GA.2270: Computer Graphics
    - CSCI-GA.2271: Computer Vision
    - CSCI-GA.2434-001: Advanced Database Systems
    - CSCI-GA.2580-001: Web Search Engines
    - CSCI-GA.2945: Tpcs in Numerical Analy
    - CSCI-GA.2945-001: High Performance Scientific Computing
    - CSCI-GA.3033-011: Cloud Computing: Concepts And Practice
    - CSCI-GA.3033-012: Multicore Processors: Architecture & Programming
    - CSCI-GA.3033-008: Special Topics: Realtime and Big Data Analytics
    - CSCI-GA.3033-012: Cloud Computing
    - CSCI-GA.3210: Intro to Cryptography
    - CSCI-GA.3813: Advanced Lab:
- Tandon
    - CS 6913: Web Search Engines
    - CS-GY.6313: Information Visualization

**Mathematical Finance**

- MATH-GA 2045.001: Computational Methods For Finance
- MATH-GA 2041.001: Computing in Finance
- MATH-GA 2706.001: Partial Differential Equations For Finance
- MATH-GA 2707.001: Time Series Analysis & Statistical Arbitrage
- MATH-GA 2708.001: Algorithmic Trading & Quantitative Strategies
- MATH-GA 2709.001: Financial Engineering Models For Corporate Finance
- MATH-GA 2751.001: Risk & Portfolio Management w/Econometrics
- MATH-GA 2752.001: Active Portfolio Management
- MATH-GA 2753.001: Advanced Risk Management
- MATH-GA 2791.001: Derivative Securities
- MATH-GA 2792.001: Continuous Time Finance
- MATH-GA 2796.001: Mortgage-Backed Securities and Energy Derivatives
- MATH-GA 2797.001: Credit Markets & Models
- MATH-GA 2798.001: Interest Rate & Fx Models

**Mathematical Foundations**

- Computer Science (Courant)
    - CSCI-GA.2420-001/MATH-GA.2010.001: Numerical Methods I
    - CSCI-GA.2566-001: Foundations of Machine Learning

- MATH-GA.2011.001: Advanced Topics in Numerical Analysis
- MATH-GA 2020.001/CSCI-GA 2421.001: Numerical Methods II
- MATH-GA.2046.001: Advanced Econometric Modeling
- MATH-GA.2048.001: Scientific Computing in Finance
- MATH-GA.2110.001: Linear Algebra I
- MATH-GA.2111.001: Linear Algebra (One-Term)
- MATH-GA.2170.001: Intro to Cryptography
- MATH-GA.2430.001: Real Variables I
- MATH-GA 2563.001: Harmonic Analysis
- MATH-GA.2701.001: Methods of Applied Mathematics
- MATH-GA.2757.001: Regulation and Regulatory Risk
- MATH-GA.2830: Advanced Topics in Applied Math – Mathematics of Data Science
- MATH-GA.2901.001: Basic Probability
- MATH-GA 2902.001: Stochastic Calculus
- MATH-GA 2911.001: Probability: Limit Theorems I
- MATH-GA 2912.001: Probability: Limit Theorems II
- MATH-GA.2931.001: Advanced Topics in Probability
- MATH-GA.3771.001: Independent Study
- MATH-GA.3775.001: Advanced Practical Training

**Music**

- MPATE-GE 2623: Music Information Retrieval
- MPATE-GE2599001: Fundamentals of Digital Signal

**Neuroscience**

- NEURL-GA.3042: Mathematical Aspects of Neurophysiology
- NEURL-GA.3235: Information Processing and Visual Pathways
- BMSC-GA 2440: Emerging Diseases and Bioterrorism: Disease Surveillance Epidemiology
- BMSC-GA 2604: Bioinformatics (4 Credits)

**Physics**

- PHYS-GA 2000: Computational Physics

**Political Science**

- POL-GA.2106: Methods of Political and Social Analysis (4 Credits)
- POL-GA.1120: Introduction to Quantitative Political Analysis I (4 Credits)
- POL-GA.2105: Formal Modeling in Political Science (4 Credits)
- POL-GA.2108: Game Theory and Politics
- POL-GA.2127: Introduction to Quantitative Political Analysis II

**Psychology**

- APSY-GE 2140: Measurement: Classical Test Theory
- APSY-GE 2141: Measurement: Modern Test Theory
- APSY-GE 2142: Measurement And Evaluation: Psychometric Theory
- APSY-GE 2143: Construction of Psychological Tests
- APSY-GE 2074: Research Design and Methodology in the Behavioral Sciences II
- PSYCH-GA.2067-001: Applied Research Methods
- PSYCH-GA.2217-001: Research Methods Is S/P
- PSYCH-GA.2226-001: Psycholinguistics
- PSYCH-GA.2248-001: Analysis of Change
- PSYCH-GA.3393-001: Seminar in Neurolinguistics
- PSYCH-GA.3405-001: Causal Learning
- PSYCH-GA.3405-001: Neuroeconomics and Decision Making
- PSYCH-GA.3405: Bayesian Modeling of Behavior

**Sociology**

- SOC-GA.2330: Introduction to Methods of Sociological Research (4 Points)
- Design of Social Research (In the AQR Program)
- Techniques of Quantitative Analysis I (In the AQR Program)
- Techniques of Quantitative Analysis II (In the AQR Program)

**Statistical and Mathematical Methods**

- CSCI-GA.2112-001 & MATH-GA.2043.001: Scientific Computing
- CSCI-GA.2420-001/MATH-GA.2010.001: Numerical Methods I
- CSCI-GA.2945-001: High Performance Scientific Computing
- CSCI-GA.2945-001 & MATH-GA.2620.002: Monte Carlo Methods
- CSCI-GA.2945-002 & MATH-GA.2011.002: Numerical Optimization
- CSCI-GA.2965-001: Heuristic Problem Solving
- MATH-GA.2901.001: Basic Probability
- MATH-GA 2020.001/CSCI-GA 2421.001: Numerical Methods II
- MATH-GA.2011.003: Analytical Methods in Computer Science
- MATH-GA 2701.001: Methods of Applied Math
- MATH-GA 2704.001: Applied Stochastic Analysis
- MATH-GA 2840.002: Data Analysis Methods For High-Dimensional Time Series
- MATH-GA 2902.001: Stochastic Calculus
- MATH-GA.2962.001: Mathematical Statistics

**Tandon School of Engineering**

- CS-GY 5403: Data Structures and Algorithms
- CS-GY 6033: Design and Analysis of Algorithms I
- CS-GY 6043: Design and Analysis of Algorithms II
- CS-GY 6083: Principles of Database Systems
- CS-GY 6093: Advanced Database Systems
- CS-GY 6313: Information Visualization
- CS-GY 6323: Large-Scale Visual Analytics
- CS-GY 6613: Artificial Intelligence I
- CS-GY 6643: Computer Vision and Scene Analysis
- CS-GY 6673: Neural Network Computing
- CS-GY 6913: Web Search Technology

**Wagner (Public Service)**

- EXEC-GP 4119: Data Visualization and Presentation
- PADM-GP 2875: Estimating Impacts in Policy Research
- PADM-GP 4114: Surveys and Interviews: A Laboratory on Techniques of Sampling, Designing, Conducting, and Analyzing Surveys and Interviews