Modeling and integration of single-cell sequencing data
New single-cell sequencing technologies are producing datasets that will define a ‘parts list’ of human cell types, yet these data are characterized by extensive noise and sparse measurements. I will discuss two statistical learning methods for their analysis and interpretation. First, we propose a regularization procedure for fitting models of technical variation, which can be applied to generalized linear models, factor analysis, or nonlinear autoencoders. Second, we develop a procedure based on diagonalized canonical correlation analysis that identifies correspondences across experiments, enabling us to ‘align’ and compare datasets from different laboratories, technologies, and species.
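To make the alignment idea concrete, the following is a minimal sketch of one common reading of "diagonalized" canonical correlation analysis, not the speaker's implementation. It assumes two expression matrices that share the same genes as rows, standardizes each gene within each dataset, and treats the within-dataset covariance matrices as identity (diagonal). Under that assumption the canonical vectors reduce to the singular vectors of the cross-product matrix. The function names `standardize_rows` and `diagonal_cca` and the toy Poisson data are hypothetical, introduced only for illustration.

```python
import numpy as np


def standardize_rows(M):
    """Center and scale each gene (row) across cells; constant rows become zeros."""
    M = np.asarray(M, dtype=float)
    centered = M - M.mean(axis=1, keepdims=True)
    sd = centered.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for genes with no variation
    return centered / sd


def diagonal_cca(X, Y, n_components=2):
    """Sketch of CCA with within-dataset covariances assumed to be identity.

    X: genes x cells_a, Y: genes x cells_b, with rows being the same genes.
    Returns per-cell canonical vectors for each dataset and the singular values.
    """
    Xs, Ys = standardize_rows(X), standardize_rows(Y)
    # Unit vectors u, v that maximize u^T (Xs^T Ys) v are the leading
    # singular vectors of the cells_a x cells_b cross-product matrix.
    U, s, Vt = np.linalg.svd(Xs.T @ Ys, full_matrices=False)
    return U[:, :n_components], Vt[:n_components].T, s[:n_components]


# Toy usage on simulated counts (assumed data, not from any real experiment).
rng = np.random.default_rng(0)
counts_a = rng.poisson(1.0, size=(200, 40))  # 200 genes, 40 cells
counts_b = rng.poisson(1.0, size=(200, 60))  # same 200 genes, 60 cells
cells_a, cells_b, strength = diagonal_cca(counts_a, counts_b, n_components=2)
```

Each row of `cells_a` and `cells_b` gives a cell's coordinates along canonical vectors derived from the same decomposition. Any further alignment of the two embeddings would be an additional step that this sketch does not cover.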