An NRT-sponsored program in Data Science
NRT FUTURE Program
Sponsored by a National Science Foundation Research Traineeship (NRT) grant, the FUTURE program takes an innovative and truly transformative approach to train PhD students in the emergent field of Data Science.
Rapid advances in computational speed and data availability, together with the development of novel data analysis methods, have given rise to a new field: Data Science. This new field requires a new type of researcher and practitioner: the rigorously trained, cross-disciplinary, and ethically responsible data scientist. Yet graduate training programs have not kept pace with the unique challenges of our "data-rich" era. Traditionally, students are trained in a particular domain of natural science, engineering, or social science, and learn a smattering of statistical techniques that they apply to their particular problem. Often little attention is paid to the ethical implications of the work. Moreover, students receive little training in spotting similarities between problems across disciplines or in applying their expertise across domains. Producing a researcher who is a "data science native" requires a native environment for their education and training. The National Science Foundation Research Traineeship (NRT) award to the Center for Data Science (CDS) at New York University (NYU) has been used to build such an environment, named FUTURE.
CDS’s PhD program creates a sustainable traineeship in methodological and responsible data science, addressing the needs of research, industry, and government. The program fills a significant gap by rigorously training data scientists of the future who (1) develop methodology and harness statistical tools to answer questions that transcend the boundaries of traditional academic disciplines; (2) communicate clearly in order to extract crisp questions from big, heterogeneous, uncertain data; (3) effectively translate fundamental research insights into data science practice in the sciences, medicine, industry, or government; and (4) are aware of the ethical implications of their work and practice responsibility-by-design.
These objectives will be achieved through a combination of an innovative core curriculum; a novel data assistantship mechanism that trains students in skills transfer through rotations, internships, and unique individual research experiences with a multitude of CDS partners, both internal (such as the NYU School of Medicine) and external (such as Facebook and the Flatiron Institute); and communication and entrepreneurship modules. The traineeship will enable transformative new discoveries in the domain sciences and advance research in translational and responsible data science.
People
Investigators
Principal Investigator: Julia Kempe
Co-Principal Investigators: Kyunghyun Cho, Carlos Fernandez-Granda, Brenden Lake, Cristina Savin, Daniel Sodickson, Arthur Spirling, and Julia Stoyanovich
Senior Personnel: Richard Bonneau, Sam Bowman, Joan Bruna, Rob Fergus, Juliana Freire, Yann LeCun, Tal Linzen, Brian McFee, Jonathan Niles-Weed, Rajesh Ranganath, Claudio Silva, and Andrew Gordon Wilson
Senior Personnel from the NYU School of Medicine: Silvia Curado, David Fenyő, Krzysztof Geras, Leora Horwitz, Florian Knoll, Riccardo Lattanzi, Yvonne Lui, Narges Razavian, and Aristotelis Tsirigos
Evaluator: Denis Gray
Program Coordinator: Tina Lam, Assistant Director of Graduate Programs
Partners
Dafna Bar-Sagi, Senior Vice President and Vice Dean for Science at NYU Langone School of Medicine on behalf of the School of Medicine
Leslie Greengard, Director, Center for Computational Mathematics
Shirley Ho, Group Leader, Cosmology X Data Science Group, Center for Computational Astrophysics, on behalf of the Flatiron Institute of the Simons Foundation
Daniel Lee, Executive Vice President at Samsung Research and Tisch University Professor in Electrical and Computer Engineering at Cornell Tech, on behalf of Samsung Research
Jason Schultz, Professor of Clinical Law, Director of NYU’s Technology Law and Policy Clinic
Rebecca Silver, Associate Director of the NYU Entrepreneurial Institute
Stefaan Verhulst, Co-Founder and Chief Research and Development Officer, on behalf of the Governance Laboratory (GovLab) at NYU
Highlights of NRT FUTURE Traineeship
Core Requirements
FUTURE trainees are required to take five core courses: Introduction to Data Science (DS-GA 1001); Probability and Statistics for Data Science (DS-GA 1002); Machine Learning (DS-GA 1003); Big Data (DS-GA 1004); and Inference and Representation (DS-GA 1005). In addition, Responsible Data Science (DS-GA 1017) is currently highly recommended. FUTURE trainees must complete these requirements by the end of their third semester or show evidence that they have taken equivalent coursework elsewhere.
Data Assistantships (DAs)
A cornerstone of the FUTURE traineeship is a novel Data Assistantship (DA) mechanism that trains students in skills transfer through rotations, internships, and unique individual research experiences with our many partners: NYU’s schools, including the School of Medicine; the vibrant local industry (e.g., Facebook, Samsung); the startup environment (NYU Entrepreneurship); local government; ethics think tanks (GovLab); and foundations (Flatiron Institute). To complete the DA requirement, students enroll in DS-GA 2001 Research Rotation and must complete 18 credits of this course by the end of their sixth semester.
The primary objective of a DA is to provide translational experience to FUTURE trainees. This involves (1) translation-to: gaining an understanding of the context of the target application domain, and of the scientific objectives of the work; (2) translation-from: evaluating the appropriateness of fundamental data science methods, and of the trade-offs between these methods, to achieve translational goals; (3) communication: communicating fundamental data science concepts to domain scientists, practitioners, and the general public; and (4) responsibility: being mindful of reproducibility, ethics, and legal compliance, and actively surfacing these dimensions in all stages of project design, implementation, and deployment.
Communication Workshop
The success of FUTURE trainees in translational data science, that is, in catalyzing innovation and delivering impact in domains other than data science itself, depends critically on their ability to communicate effectively with a range of audiences. To support this objective, CDS will offer a structured 1-credit workshop on critical communication skills, including written and verbal communication. Students are strongly encouraged to participate in this workshop.
Entrepreneurship Workshop
Because research commercialization provides another important avenue for translation, and because some FUTURE trainees may elect to work in industry upon graduation or to launch their own start-ups, we will work with the NYU Entrepreneurial Institute to develop an optional workshop based on its very popular course The Lean Launchpad: How to Build a Scalable Startup. This workshop will provide real-world, hands-on experience of what it is like to actually start a high-tech venture. FUTURE trainees will learn to use the Business Model Canvas (BMC) as a tool to succinctly represent the key business components of their startup vision. They will also learn Customer Development, the process used to design and evaluate that vision.