Machine Learning
Aggregating nuclei segmentation methods to quantify cell counts in spatial transcriptomics data

Oscar Otero Laudouar

Benoit Samson

Philippe Bertolino

Olivier Gandrillon

Franck Picard

Ghislain Durif


August 31, 2023

Spatial transcriptomics technologies characterize spatial variations in gene expression within a tissue. Sequencing-based approaches, such as 10x Genomics Visium, perform micro-bulk RNAseq-like measurements in thousands of spatially located spots across a two-dimensional histological tissue section. Each spot is small enough to contain only a few cells, which does not allow single-cell expression quantification but still gives insight into the spatial organization of the tissue and into the variability between cells at a fine scale. Visium data are complex to normalize and analyze because, among other specificities, the number of cells in each spot is unknown and varies across the tissue. In this context, estimating the cell density across the section (i.e. the number of sampled cells in each spot) could help refine data preprocessing steps such as normalization, which could in turn improve the accuracy of downstream deconvolution analyses such as cell type composition inference. Cell density across the tissue is also biologically meaningful information in its own right. To tackle this question, we used state-of-the-art machine-learning-based cell segmentation methods, namely Stardist and Cellpose, to automatically count cells in each spot from the histological section microscopy images provided with Visium datasets. Their performance was assessed across a range of hyperparameter values, image resolutions, and tissue samples; in particular, we focused on pituitary adenoma samples from unrelated patients. In addition, we used an ensemble learning approach to aggregate the cell count predictions provided by the segmentation methods across different hyperparameter values.
In the absence of ground truth to validate our results, we found that, compared to non-aggregated results, the aggregated predictions are consistent with human interpretation of the images and with results from independently performed deconvolution analyses, especially when analyzing lower-resolution images. Our method is implemented using the Squidpy framework for spatial transcriptomics data analyses and is available as a Python package.
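
As an illustration of the two steps described above (counting segmented nuclei per spot, then aggregating counts across hyperparameter settings), the following is a minimal sketch and not the implementation of the package. It assumes that nucleus centroids have already been extracted from the segmentation output, that spots are circles of a common radius in the same pixel coordinates, and that a per-spot median is an acceptable aggregation rule. The abstract does not specify the actual ensemble rule, and the function names `count_nuclei_per_spot` and `aggregate_counts` are hypothetical.

```python
import numpy as np


def count_nuclei_per_spot(centroids, spot_centers, spot_radius):
    """Count the nucleus centroids that fall inside each circular spot.

    Assumption: centroids and spot centers are (x, y) pairs expressed in
    the same pixel coordinate system, and all spots share one radius.
    """
    centroids = np.asarray(centroids, dtype=float).reshape(-1, 2)
    spot_centers = np.asarray(spot_centers, dtype=float).reshape(-1, 2)
    # Pairwise squared distances, shape (n_spots, n_nuclei).
    diff = spot_centers[:, None, :] - centroids[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return (sq_dist <= spot_radius ** 2).sum(axis=1)


def aggregate_counts(counts_per_run):
    """Aggregate per-spot counts across runs with a per-spot median.

    Assumption: the median is used here only as a simple, robust
    aggregation rule for illustration.
    """
    stacked = np.stack([np.asarray(c) for c in counts_per_run], axis=0)
    return np.median(stacked, axis=0)


# Toy data: two spots and three hypothetical segmentation runs
# (for example, three hyperparameter settings of one method).
spot_centers = [[0.0, 0.0], [10.0, 0.0]]
spot_radius = 2.0
runs = [
    [[0.0, 1.0], [1.0, 0.0], [10.0, 1.0]],
    [[0.0, 0.0], [10.0, 0.0], [10.0, 1.0], [9.0, 0.0]],
    [[0.0, 1.0], [0.0, -1.0], [1.0, 1.0], [10.0, 0.0]],
]

counts_per_run = [
    count_nuclei_per_spot(run, spot_centers, spot_radius) for run in runs
]
aggregated = aggregate_counts(counts_per_run)
print(aggregated)
```

With real data, the centroids would be derived from the label masks produced by the segmentation methods, and the spot coordinates and radius from the Visium metadata; both steps are left out of this sketch.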