CDS Fellow Interview: Daniel Fernández - NYU Center for Data Science

Daniel Fernández is a new Post-Doctoral Fellow at the Center for Data Science, and his research focuses on the ways in which data science can be used to help individuals with speech impediments.

Can you talk about the research projects you’re currently working on at CDS?

I am mainly working on a project that focuses on communication disorders, such as speech impediments. These disorders can affect a person’s ability to participate in occupational, social, and academic settings, and affect up to 10% of the total population.

Can you talk about why you wanted to pursue this project at CDS specifically?

CDS gathers researchers from a wide range of fields, which creates an enriching network and a productive environment. I was particularly interested in the interaction between statisticians, computer scientists, and professionals from other fields, because I believe this is the sort of interaction necessary for research in the field of data science.

What drew you to work on a project focusing on speech impediments?

I have always been interested in research topics concerning the social integration of underrepresented minorities. I completed my doctoral work at Victoria University of Wellington in New Zealand, and while I was there, I worked at the University’s Disability Service’s Department, which helped students with social communication disorders or language-based learning disabilities. My work there certainly reinforced my interest in solving these sorts of problems.

How are you going about collecting data regarding speech disorders?

We are using crowdsourced experiments to obtain data in the study of speech-rate tasks. An example of a speech-rate task would be a child saying a word that contains an “r” sound. This word is uploaded into the crowdsourced experiment, and a group of non-expert listeners rate whether this particular child has produced the “r” sound correctly or incorrectly. Crowdsourced studies help us collect data that rate the degree of a person’s speech disability, so that we can better understand how widespread certain speech disorders are, and how noticeable the disorder is.
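
As a rough illustration of how such ratings might be summarized, here is a minimal sketch in Python. It assumes each listener gives a binary correct/incorrect judgment per recorded word and that the share of "correct" judgments serves as a simple score for that recording. The function name, the tuple layout, and the sample data are hypothetical and are not taken from the project itself.

```python
from collections import defaultdict


def proportion_rated_correct(ratings):
    """Summarize binary listener ratings per recording.

    ratings: iterable of (recording_id, listener_id, is_correct) tuples
             (a hypothetical layout chosen for this illustration).
    Returns a dict mapping recording_id to the fraction of listeners
    who judged the target sound as correctly produced.
    """
    counts = defaultdict(lambda: [0, 0])  # recording_id -> [n_correct, n_total]
    for recording_id, _listener_id, is_correct in ratings:
        counts[recording_id][0] += int(bool(is_correct))
        counts[recording_id][1] += 1
    return {rec: n_correct / n_total
            for rec, (n_correct, n_total) in counts.items()}


# Made-up example data: two recordings, four listeners each.
example_ratings = [
    ("recording_1", "listener_a", True),
    ("recording_1", "listener_b", True),
    ("recording_1", "listener_c", False),
    ("recording_1", "listener_d", True),
    ("recording_2", "listener_a", False),
    ("recording_2", "listener_b", False),
    ("recording_2", "listener_c", True),
    ("recording_2", "listener_d", False),
]

scores = proportion_rated_correct(example_ratings)
```

A score near 1 would suggest most listeners heard the sound as correct, while a score near 0.5 would point to disagreement among listeners.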

What are some of the advantages of crowdsourcing data, as opposed to other collection methods?

In the context of measuring speech production, traditional methods require either highly trained personnel or a large number of human listeners in a laboratory setting, making the process slow and expensive. This poses a major rate-limiting factor in the study of speech production. Crowdsourcing represents a valid way to measure speech quality and intelligibility, where data can be collected quickly, cheaply, and efficiently.

Have you run into any difficulties with crowdsourcing?

Crowdsourcing provides a way to collect data quickly, but it is harder to control the quality of the data. Because of this, we are developing statistical methods to evaluate rater and speech-task quality in these experiments. We are also investigating the optimal number of raters and speech tasks required to obtain reliable and robust estimates.

I am also working on a somewhat related project to determine which factors affect how speech pathology experts hear sounds in people, mainly children, with speech disabilities. The purpose of this experiment is to explain the variations in how experts rate those speech sounds.