Michael Blanton is an astrophysicist using data science to study galaxy evolution and map the Universe. He is the director of the Sloan Digital Sky Survey, an Associate Professor in NYU’s physics department, and an Affiliated Professor at the Center for Data Science.
What initially drew you to the field of astrophysics?
I studied engineering physics in college, and became interested in astrophysics through a planetary astronomy course I took as a sophomore. What drew me to the field was how a modicum of physical knowledge, or even a small amount of data, could be used to infer the conditions in distant and otherwise unknowable places.
How did you start to incorporate data science into your research?
Even before I knew about the term “data science,” my research required considerable work in data management, provenance tracking, and statistical inference. Astronomy involves large, complex data sets, and to understand them you need to account for the particular environmental conditions during the observations.
Can you talk about your goals for the Sloan Digital Sky Survey, and how data science plays into this?
The three main goals are:
- Studying the history of the Milky Way
- Mapping nearby galaxies
- Learning about the Universe’s expansion in its infancy
With galaxies like the Milky Way, we’re using data science to study the formation of their stars and their elemental abundances. To study the expansion of the Universe, by contrast, we’re mapping quasars to trace density fields.
All of this data is public, and you can find it at skyserver.sdss.org.
How are improvements in data visualization impacting the way we visualize galaxies and groups of stars?
Most of the innovations aren’t in visualizing the actual galaxies, but in visualizing the data we collect about galaxies in relation to each other. We have an enormous amount of information, which can be translated into characteristics such as stellar density, formation rate, chemical make-up, photoionization states, and planetary movements. Visualizations of these characteristics allow us to understand the correlations and connections in all this data. The challenge is giving astronomers tools to make useful visualizations that reveal meaningful insights.
How has data science’s role in the field of astrophysics changed over the years?
We’ve moved away from bespoke products that were made for specific purposes, and now rely on standard computer science and data management tools. Just as with hardware, the existence of a common tool base makes things far easier to develop and to maintain.
Are there any ways data science could be used more effectively in the field of astronomy?
One of my soapboxes is that a lot of science research is based on data sets that have already been calibrated. I think the major opportunity is in creating these calibrations by improving the survey operations and analyzing raw data.
In your opinion, what is the single biggest effect data science has had on the field of astrophysics, or on how we approach the field of physics in general?
I think the biggest effect has been sociological. When I started working in the field of astronomy, there was a sense that computing and data management skills would have some sort of practical application in the future, but nobody was really sure how it would happen.
The existence of data science as a field gives a sense of value in developing a particular set of data skills. Technical expertise is never the same as scientific expertise, and there is an increased sense of the importance of understanding data’s technical aspects.
What drew you to the CDS program at NYU?
CDS gathers scientists from a wide range of fields and allows us to start conversations from a shared technical perspective. Also, I can finally learn about the huge range of science research being conducted at NYU!
On a chalkboard at CDS we have the question: “What does it mean to be a data scientist?” Could you answer this question?
A data scientist is somebody who implements solutions in data management, distribution, or analysis that are applicable to their own domain, but also extensible to other fields.
Interview by Jack Lowery