Kyunghyun Cho is a newly appointed assistant professor, jointly at NYU’s Center for Data Science and the Courant Institute of Mathematical Sciences. His research focuses on Natural Language Processing (NLP), and this fall he will teach Natural Language Understanding with Distributed Representations, a course on the pragmatic approaches to, and philosophical foundations of, using neural nets in NLP.
What drew you to the field of deep learning?
I studied computer science during my undergrad years in Korea. I wasn’t particularly keen on studying further, but one day in my last year of undergrad (2009), I picked up a brochure for a master’s program in machine learning and data mining.
When I started as a master’s student at Helsinki University of Technology (which is now a part of Aalto University) in Finland, I was assigned to a group mostly working on Bayesian probabilistic modeling, which in 2009 was still the sexier field, but I was asked to study deep learning, which at the time was a fairly recent development.
What aspect of deep learning particularly interested you?
I became interested in neural networks, which is the reason I continue studying deep learning to the present day.
How would you describe yourself in terms of your research?
I don’t consider myself a natural language processing researcher per se. I think of myself as a machine learning researcher who primarily focuses on natural languages.
Why did you decide to focus your research on natural languages?
There were two factors. First, Yoshua Bengio, my post-doc supervisor, immediately convinced me that natural language processing, especially machine translation, would be the next field/task revolutionized by deep learning, which had already revolutionized object and speech recognition.
Second, natural languages are fantastic for machine learning and data science. They have complex underlying structures without any compact, complete description. Linguists and philosophers have spent enormous amounts of time trying to understand human languages, but at best we have incomplete, often incorrect, knowledge of them. The one thing we do have is a gigantic amount of natural language text (think of how many documents there are on the Internet). Now we can let the data/text speak for itself by developing appropriate machine learning models that are able to extract the underlying, complex structures of natural languages.
How could I have resisted this tempting opportunity?
Could you tell me about how you use neural nets in NLP?
Natural language processing has only recently fully embraced neural net based approaches. Neural networks present a unique opportunity for NLP. Instead of relying heavily on domain/linguistic knowledge, with neural networks, we now have a fully data-driven way to understand natural languages.
One important characteristic of the neural net based approaches to NLP is that almost no prior knowledge about natural languages is assumed. Instead of building an elaborate model or algorithm based on domain knowledge, we build a large, nonlinear model with many parameters and train it on a large corpus using a generic learning algorithm. In other words, in these approaches, a human language is treated as if it were just another modality of data (think of images, video, or speech).
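To make the idea concrete, here is a minimal sketch, entirely an illustration rather than code from Cho’s work or this interview: a tiny neural language model that sees nothing but raw token ids, treating text as just another modality of data. The model class, layer sizes, and toy batch below are all assumptions for illustration.

```python
# A minimal sketch (an illustration, not Cho's code): a large-ish, nonlinear
# model with many parameters, trained on raw token ids with a generic
# learning algorithm. No linguistic knowledge is built in anywhere.
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # token ids -> vectors
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # score the next token

    def forward(self, tokens):                           # tokens: (batch, time)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)                          # (batch, time, vocab)

vocab_size = 1000                                        # assumed toy vocabulary
model = TinyLanguageModel(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One generic training step; random ids stand in for a real corpus batch.
batch = torch.randint(0, vocab_size, (8, 20))
optimizer.zero_grad()
logits = model(batch[:, :-1])                            # predict each next token
loss = loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

Nothing in the sketch encodes grammar or syntax; whatever structure the model captures has to come from the data itself.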
How do neural nets factor into the course you’re teaching?
When you start to incorporate neural networks in NLP, two questions naturally follow. First, why do we want to do NLP this way? Second, what can we learn from it? These two questions will be at the core of my course this semester.
Besides these core, somewhat philosophical questions, the course will have a strong practical side, dealing with actual natural language data to solve real-world problems. We will take data-driven approaches to natural languages to the extreme, where more or less no domain knowledge is required. The course will teach students how to tackle an actual task (think of translation) directly and statistically, rather than building a complicated, multistep system based on prior knowledge of how it should be solved.
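As a rough, hedged sketch of what tackling translation directly can look like, here is a minimal encoder-decoder network of the general kind Cho has worked on; the class name and dimensions are illustrative assumptions, not the course’s actual materials. A single network maps a source sentence to a summary vector and decodes the target sentence from it, in place of a hand-built multistep pipeline.

```python
# A hedged sketch, not the course's materials: one encoder-decoder network
# for translation, trained end to end instead of assembling a pipeline of
# hand-built components. Names and sizes here are assumed.
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the whole source sentence into a summary vector ...
        _, summary = self.encoder(self.src_embed(src))
        # ... then decode the target sentence conditioned on that summary.
        decoded, _ = self.decoder(self.tgt_embed(tgt), summary)
        return self.out(decoded)     # per-step scores over the target vocabulary

# Toy usage: random source/target batches stand in for parallel text.
model = TinyEncoderDecoder(src_vocab=500, tgt_vocab=600)
scores = model(torch.randint(0, 500, (4, 12)), torch.randint(0, 600, (4, 10)))
print(scores.shape)                  # torch.Size([4, 10, 600])
```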
What drew you to CDS at NYU?
I’ve been working on machine translation since late 2013, and I want to continue pushing forward in machine translation with neural networks. Now that I’m moving to CDS/CS at NYU, I am at the stage where I’m exploring a larger set of tasks and problems in natural language processing.
On a chalk-board at CDS we have the question: “What does it mean to be a data scientist?” What does this mean to you?
Data is a collection of observations of a certain phenomenon. More often than not, those phenomena have defied us by successfully hiding their secrets from the knowledge we have built of the universe. Whenever I surf the web, I see tons of natural language text, but I have no idea how those languages came to be, and neither do linguists. When a stock I owned became worthless over a few days, no one, including myself, could say exactly how it happened. Over generations, parents have watched their babies grow up and learn intuitively, but nobody knows the exact learning mechanisms of babies.
But one thing we’re sure about is that some mechanism exists behind each of these phenomena. The only thing standing between these observations and the underlying mechanisms is our inability to interpret the observations and figure out how they came about.
This gap is where data scientists step in. Data scientists devise means to uncover the mechanism or knowledge underlying a phenomenon from a “large” collection of observations of that phenomenon. I believe this way of extracting knowledge from data will be at the core of future science.
Interview by Jack Lowery, Content Development Assistant at CDS