This is the first article in a series profiling NYU Center for Data Science professors, exploring the origins of their interest in data science and their thoughts on the Moore-Sloan Initiative.
Kyle Cranmer, NYU Associate Professor of Physics and Affiliated Faculty member at the Center for Data Science, is an experimental particle physicist working on the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. He developed the collaborative statistical modeling approach and methodology that were used extensively in the discovery of the Higgs boson at the LHC in July 2012.
Cranmer obtained his Ph.D. in Physics from the University of Wisconsin and his B.A. in Mathematics and Physics from Rice University. He has been a Goldhaber Fellow at Brookhaven National Lab, as well as a recipient of the Presidential Early Career Award for Scientists and Engineers (awarded by President George W. Bush) and the National Science Foundation’s CAREER Award.
How did your interest in physics come about?
For my last two years of high school, I attended the Arkansas School for Mathematics, Sciences, and the Arts, founded by Bill and Hillary Clinton in Hot Springs. You don’t think of Arkansas as a leader in education, but there I was in this super-unique, residential, public high school with a bunch of people who were awesome math and science nerds. The teachers were incredible – my physics teacher was a Russian physicist with a PhD.
The year I started was the school’s first year, so everything was completely disorganized. We really had to find our way, and ended up teaching each other a lot of things. The biggest benefit was the exposure. I was interested in physics but knew nothing about computer science and electronics, and there were people there who were really, really good at math. That helped me chart where I wanted to go.
Then I went to Rice University. I was interested in everything from pure mathematics to electrical engineering, as well as physics and neuroscience. Both of my parents were scientist types – my mother did neuroscience – so I came from a very nurturing environment for that kind of thing. Early on, I thought I would go in the physics direction because I definitely like asking why, which points to fundamental physics because, in some sense, it’s the most basic and foundational of the sciences.
What role did you play in the discovery of the Higgs boson?
With the Large Hadron Collider, we’re trying to test various theories about the basic building blocks of the universe and the conditions of the universe shortly after the Big Bang. The big news was that last summer we discovered this particle called the Higgs boson – the “God particle.” That discovery won the Nobel Prize. I had various roles in that, but the biggest one was that I developed the statistical framework and methodology that both of the big experiments ended up using. I’m generally recognized as one of the people who founded that statistical software project.
Can you describe your current work with the ATLAS Experiment?
The experiment I’m working on now, the ATLAS Experiment, is a particle physics experiment at the Large Hadron Collider at CERN in Geneva [the European Organization for Nuclear Research and the birthplace of the World Wide Web]. It’s the biggest experiment in the world – the biggest in the history of science – involving 3,000 physicists from 38 countries. It took 20 years, and billions of dollars, to build.
With something that big and complicated, obviously the field is specialized to an incredible degree. You have every type of specialist on the experimental side, as well as the theorists. A gap has developed between them, which is unfortunate, so most of my work has been trying to bridge that gap. I confront the theories with data, in the language of statistics.
With ATLAS, we take particles and smash them into each other, then take a digital three-dimensional photo of what’s going on, with a camera the size of an 11-story building. It’s a very big, complicated device that is taking 40 million photos per second, with hundreds of millions of electronic readouts per collision.
We end up throwing away almost all of the data – 99.9999% of the collisions – and still what we keep is by far the biggest data set in the history of science.
What’s the relationship between ATLAS and data science?
ATLAS is definitely a data science problem because we have the biggest data sets that have ever existed. It’s a huge international collaboration, we use a lot of very careful statistics, and we are testing very precise theories.
The area that I’ve been focusing on is looking for signs of the Higgs particle in our data. It’s not just one sign, though, because the particle can decay in lots of different ways, meaning that there are lots of different ways it can show up in the data. Each one of, say, 20 different signatures has a group of 50 or 100 people working on it, distributed all around the globe. All of these pieces need to be brought together into one cohesive and consistent whole from which you can try to infer some knowledge. To accomplish this, several of us developed a technology called collaborative statistical modeling, which is a theme that can be extracted from high-energy physics and used more broadly.
Another data science area I’m working on is trying to make data openly accessible. Together with Juliana Freire, I’m leading NYU’s effort on reproducibility and open science. There’s a lot of data that you can’t make public, either because it is considered proprietary or because of privacy issues. So we’re trying to come up with a model that allows limited access to data, through some restricted interface, so that someone from the outside could ask certain questions of the data but wouldn’t actually have access to it.
What’s your vision for the Moore-Sloan Initiative, and how might it change NYU?
I was super excited when NYU was chosen. We have a lot of people here doing very interesting work with tools we associate with data science, but that work primarily has a scientific focus. I count myself among those. I’m definitely a physicist but I’m using very advanced statistical machinery.
Regarding how I see the Moore-Sloan award changing NYU: What we’re trying to figure out is, how do you support, in the university setting, these very interdisciplinary people, as well as people who are doing incredibly valuable things but who might have been considered tool developers – a role that traditionally has not been highly valued in academia? Everyone who’s doing data science work now sees these people as absolutely critical. This is a very important problem, and the Moore-Sloan award gives NYU enough impetus to really start to change that thinking.
It’s a great time to be here. I’m super lucky in terms of timing.
By ML Ball