Faculty Profile: Vasant Dhar - NYU Center for Data Science

Usually in our interviews we like to ask our faculty what data science means to them, or what it means to be a data scientist. Well, we already know what Vasant Dhar thinks on the subject, as he is the one who defined the field of Data Science himself. You can find his definition here, and a few questions we asked him about his work below.

What did you study in school? How did you get to what you study now?

I studied Chemical Engineering at the Indian Institute of Technology, Delhi before doing my graduate work at the University of Pittsburgh, where I did my thesis work in Artificial Intelligence back in the 1980s, when the field was mostly concerned with knowledge representation, search, and “reasoning.”

How did you wind up getting into machine learning?

I got into machine learning in 1990 thanks to a project with Nielsen. I was exploring genetic algorithms for rule discovery, and one of my first discoveries was that “older women in the Northeast did a lot of shopping on Thursdays!” This turned out to be the coupon day back then, so there was a simple explanation for what I had found, but I was fascinated that my algorithm had managed to ask an interesting question during its exploration, namely, “what possible reason could there be for this surge in spending on Thursdays?” All my subsequent research had this flavor to it, which I described as “patterns emerge before reasons for them become apparent.”

When did you start to incorporate data science into your research?

I started getting into “big data” in 1991, after my Nielsen project. At that time, banks and telecom companies were the ones with all the data, so I took a few years off and went to work on Wall Street, where I developed machine-learning-based approaches to market prediction, customer behavior, and risk management.

Could you tell me about some of the projects you’re currently working on?

My longest-standing project has been building a robot that uses data to develop a trading strategy and then executes it. Financial markets are about as close to random as you can get, so designing good predictive systems is a real challenge. When you’re looking at profits and losses every day, it forces you to keep the science rigorous, simple, and honest. In finance, good science doesn’t assure success, but bad science does assure failure.

What do you think is the single biggest effect that data science has had on the field of predictive analytics?

I think it’s more that prediction has had a bigger effect on data science, not the other way around. Data science and prediction go hand in hand, as prediction is a key epistemic criterion that should be used in data science to assess whether something should count as knowledge.

How has the way in which you use data science changed over the years?

My questions have broadened as I integrate my findings from across the above areas, namely, finance, healthcare, sports, and education. Working in multiple domains over the last 20 years has given me a good appreciation of the noise levels in these respective areas and how they affect the properties of the predictive system that one is trying to build. This comparative view has enabled me to ask a very interesting question that is central to data science, namely, “when should we trust robots with decisions?”

See: https://online.liebertpub.com/doi/full/10.1089/big.2015.28999.vda