DeCovarT, an R package for robust deconvolution of cell mixtures in transcriptomic samples using a multivariate Gaussian generative framework
Bastien Chassagnol 1, 2, 3, Etienne Becht 4, *, Gregory Nuel 5, *, Yufei Luo 1
1 : Institut de Recherches SERVIER
Institut de Recherche Servier
2 : Laboratoire de Probabilités, Statistique et Modélisation
Sorbonne Université, Centre National de la Recherche Scientifique, Université Paris Cité, Sorbonne Université : UMR_8001, Centre National de la Recherche Scientifique : UMR_8001, Université Paris Cité : UMR_8001
3 : LIP6
Sorbonne Université, Centre National de la Recherche Scientifique, Centre National de la Recherche Scientifique : UMR7606
4 : Institut de Recherches Internationales Servier [Suresnes]
Institut de Recherche Servier
5 : Laboratoire de Probabilités, Statistique et Modélisation
Sorbonne Université, Centre National de la Recherche Scientifique, Université Paris Cité, Sorbonne Université : UMR_8001, Centre National de la Recherche Scientifique : UMR_8001, Université Paris Cité : UMR_8001
* : Corresponding author

Transcriptomic analyses have contributed greatly to a better understanding of the biological
processes involved in the evolution of complex and versatile diseases. However, bulk
transcriptomic analyses ignore the distinct contributions of the diverse cell populations
that make up each sample. Computational deconvolution methods have therefore been developed to
estimate the cellular composition of tissues. These algorithms, however, struggle to
distinguish between cell populations with very similar expression profiles, and we
hypothesised that integrating the covariance between genes could enhance the performance of
deconvolution algorithms for closely related cell populations. We therefore developed a new
deconvolution algorithm, DeCovarT, which takes into account the transcriptomic network
structure of each cell population. To do so, we represent the set of transcriptomic
interactions within each population as a multivariate Gaussian distribution, assuming a sparse
network structure inferred from the precision matrix returned by the graphical lasso (gLasso)
algorithm. Next, we reconstruct the bulk mixture profile with a generative model, in which we
show, under reasonable assumptions, that the distribution of the bulk expression profile,
conditional on the cell ratios and purified expression profiles, is also multivariate Gaussian. The
maximum likelihood estimate (MLE) of the cell ratios, i.e. the ratios that maximise the
likelihood of the observed transcriptomic profile, is obtained in our paper by first
reparametrising the log-likelihood function into an unconstrained form and then optimising
it by successive iterations of the Levenberg-Marquardt algorithm. This yields an estimator
that respects the simplex constraint and allows us to derive the corresponding asymptotic
confidence bands. In addition to introducing a new statistical modelling paradigm, our
presentation will briefly review the standard optimisation methods implemented in R,
together with their specific features and main restrictions.
Notably, we benchmarked them on a toy example that highlights strong behavioural
differences in the context of constrained optimisation.
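
The reparametrisation step described above can be illustrated with a minimal Python sketch on a toy problem with two cell populations and two genes. This is not the DeCovarT implementation and rests on several assumptions that the abstract does not spell out: the bulk profile y given the ratios p is taken to follow N(sum_j p_j mu_j, sum_j p_j^2 Sigma_j), a softmax mapping is used as one possible unconstrained reparametrisation, and plain gradient ascent with backtracking stands in for the Levenberg-Marquardt algorithm. All function names and numerical values (softmax_to_simplex, log_likelihood, maximise, means, covs, y) are hypothetical.

```python
import math

def softmax_to_simplex(theta):
    """Map J-1 unconstrained reals to J ratios (last logit fixed at 0)."""
    logits = list(theta) + [0.0]
    m = max(logits)                       # subtract the max for numerical stability
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    return [v / s for v in w]

def log_likelihood(theta, y, means, covs):
    """Gaussian log-density of a 2-gene bulk profile y under the ASSUMED model
    y | p ~ N(sum_j p_j mu_j, sum_j p_j^2 Sigma_j), with p = softmax(theta)."""
    p = softmax_to_simplex(theta)
    mu = [sum(pj * mj[g] for pj, mj in zip(p, means)) for g in range(2)]
    a = sum(pj ** 2 * c[0][0] for pj, c in zip(p, covs))
    b = sum(pj ** 2 * c[0][1] for pj, c in zip(p, covs))
    d = sum(pj ** 2 * c[1][1] for pj, c in zip(p, covs))
    det = a * d - b * b                   # determinant of the 2x2 mixture covariance
    r0, r1 = y[0] - mu[0], y[1] - mu[1]
    quad = (d * r0 * r0 - 2 * b * r0 * r1 + a * r1 * r1) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def maximise(f, theta, step=1.0, iters=200, h=1e-6):
    """Gradient ascent with backtracking and a finite-difference gradient.
    A simple stand-in for illustration, NOT Levenberg-Marquardt."""
    theta = list(theta)
    best = f(theta)
    for _ in range(iters):
        grad = []
        for k in range(len(theta)):
            up = theta[:k] + [theta[k] + h] + theta[k + 1:]
            dn = theta[:k] + [theta[k] - h] + theta[k + 1:]
            grad.append((f(up) - f(dn)) / (2 * h))
        t = step
        while t > 1e-12:
            cand = [v + t * g for v, g in zip(theta, grad)]
            val = f(cand)
            if val > best:                # accept only strict improvements
                theta, best = cand, val
                break
            t /= 2                        # otherwise shrink the step
        else:
            break                         # no improving step found: stop
    return theta, best

# Hypothetical purified profiles: mean vectors and covariance matrices.
means = [[10.0, 2.0], [4.0, 8.0]]
covs = [[[1.0, 0.6], [0.6, 1.0]], [[1.0, -0.4], [-0.4, 1.0]]]
# Hypothetical bulk profile, equal to 0.7 * mu_1 + 0.3 * mu_2.
y = [8.2, 3.8]

def objective(theta):
    return log_likelihood(theta, y, means, covs)

theta_start = [0.0]                       # corresponds to equal ratios (0.5, 0.5)
ll_start = objective(theta_start)
theta_hat, ll_hat = maximise(objective, theta_start)
p_hat = softmax_to_simplex(theta_hat)
print("estimated ratios:", [round(v, 3) for v in p_hat])
```

Because the optimiser only ever sees the unconstrained vector theta, the estimated ratios are positive and sum to one by construction, with no explicit constraint handling. Note that under this assumed model the covariance also depends on the ratios, so the maximiser need not coincide exactly with the ratios used to build y.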

