Abstract: A fundamental task in the analysis of single-cell RNA-sequencing data is cell type annotation. Current cell type annotation methods rely either on prior biological knowledge, which is not always available in a comprehensive and consistent form, or on well-annotated reference datasets, which are not always complete, i.e., not all cell types present in the target data are represented in the reference data. We propose conducting cell type annotation in a probabilistic framework to fill the gap that incomplete references create.
To connect new results with the knowledge accumulated over the past 50 years of flow cytometry, we have established a ground truth based on the antibody-derived tag (ADT) portion of CITE-seq data. Furthermore, we have applied different probabilistic single-cell classification methods to the RNA portion of the data and compared the cell-type labels they infer. Because we used only one reference dataset, we present this work as a proof of concept that must be extended before a conclusive method comparison can be established. While none of the methods achieved a perfect match to the ground truth, probabilistic cell type annotation methods have proven to be promising, and future efforts should be dedicated to further investigating and improving this type of method.