allelic_imbalance • ASPEN

Identifying genes with allelic imbalance

Introduction

ASPEN is a suite of statistical tests for evaluating allele-specific expression (ASE). It models the allelic ratio — defined as the fraction of reads from the reference allele relative to the total number of reads mapped to a gene — using the beta-binomial distribution. The allelic ratio is restricted to $[0,1]$ interval and its distribution can be described with beta-binomial mean, $\mu$ , and dispersion, $\theta$ .

As with other sequencing-based measurements, single-cell RNA-seq counts exhibit mean–variance bias: lowly expressed genes tend to display higher relative variability than highly expressed genes. ASPEN mitigates this bias by modelling allelic dispersion, $\theta$ , as a function of gene expression. Useing the hierarchical Bayes approach, it shrinks the original dispersion estimates toward the common trend, which represents the level of dispersion expected for genes with similar expression.

ASPEN requires two input matrices: the reference allele counts and the total counts (a sum of counts from both alleles).

The main steps in the workflow are as follows:

Setup

#loading required libraries
suppressPackageStartupMessages({
  library(ASPEN)
  library(gridExtra)
  library(openxlsx)
  library(knitr)
  library(SingleCellExperiment)
  library(scran)
  library(huxtable)
})

Loading allele-specifc count data

In this example, we use mouse brain organoid data from Bl6Cast F1 hybrids (Medina-Cano et al., 2025). We load the reference allele counts (Bl6 counts) and the total counts (sum of counts from both alleles). Cell identities were previously annotated based on marker gene expression, resulting in five major cell types: neurogenic progenitor cells (adial glial cells, RGCs), intermediate progenitors cells (IPCs), deep layer neurons (cortical neurons), gliogenic progenitor cells (gliogenic RGCs) and olygodendrocyte precursor cells (OPCs).

data("Bl6_Cast_a1")
data("Bl6_Cast_tot")
load_file <- system.file("extdata", "Bl6_Cast_cell_annot.xlsx", package = "ASPEN")
cell_annot <- read.xlsx(load_file, rowNames = T)
#adding barcode id
cell_annot$cell_id <- paste(cell_annot$clone, cell_annot$cell_barcode, sep = "_")
print_md(as_huxtable(head(cell_annot)))
#> -----------------------------------------------------------------------
#>  cell_barcode background cell_type    cell_idents  clone  cell_id      
#> ------------- ---------- ------------ ------------ ------ -------------
#>  AAACCCACAGCA cast_b6    Gliogenic    Gliogenic    clone3 clone3_AAACC 
#>  GATG                    progenitor   RGCs                CACAGCAGATG  
#>                          cells                                         
#>                                                                        
#>  AAACCCATCAGA cast_b6    Gliogenic    Gliogenic    clone3 clone3_AAACC 
#>  GCGA                    progenitor   RGCs                CATCAGAGCGA  
#>                          cells                                         
#>                                                                        
#>  AAACGAAGTAGT cast_b6    Neurogenic   RGCs         clone3 clone3_AAACG 
#>  TAGA                    progenitor                       AAGTAGTTAGA  
#>                          cells                                         
#>                                                                        
#>  AAACGAAGTGTT cast_b6    Intermediate IPCs         clone3 clone3_AAACG 
#>  ATCG                    neuronal                         AAGTGTTATCG  
#>                          progenitors                                   
#>                                                                        
#>  AAACGAATCTCG cast_b6    Deep layer   Cortical     clone3 clone3_AAACG 
#>  TCAC                    neurons      neurons             AATCTCGTCAC  
#>                                                                        
#>  AAAGGATGTATG cast_b6    Intermediate IPCs         clone3 clone3_AAAGG 
#>  CTTG                    neuronal                         ATGTATGCTTG  
#>                          progenitors                                   
#> -----------------------------------------------------------------------

The cell abundance for each neuronal subtype

print_md(as_huxtable(table(cell_annot$cell_idents)))
#> -----------------------------
#>                   V1         
#> ----------------- -----------
#>  Cortical neurons 1615       
#>                              
#>    Gliogenic RGCs 1157       
#>                              
#>              IPCs 1274       
#>                              
#>              OPCs 1523       
#>                              
#>              RGCs 935        
#> -----------------------------

Counts normalisation

We first normalize the raw single-cell counts using the [computeSumFactors()] function from scran package (Lun, et al. 2016). We then create a SingleCellExperiment object using the total and reference allele count matrices.

#keeping annotated cells in the count matrices
Cast_B6_a1 <- Cast_B6_a1[,gsub(".*_", "", colnames(Cast_B6_a1)) %in% cell_annot$cell_barcode]
Cast_B6_tot <- Cast_B6_tot[,gsub(".*_", "", colnames(Cast_B6_tot)) %in% cell_annot$cell_barcode]

#creating SingleCellExperiment object
ase_sce <- SingleCellExperiment(assays = list(a1 = as.matrix(Cast_B6_a1),
                                              tot = as.matrix(Cast_B6_tot)))

Lowly expressed genes (expressed in less than 10 cells) are removed.

#removing lowly expressed genes
ase_sce <- ase_sce[rowSums(assays(ase_sce)[['tot']] > 1) >= 10, ]
print_md(as_huxtable(dim(ase_sce)))
#> ----------------------
#>                    V1 
#>                       
#>                  1551 
#>                       
#>                  6312 
#> ----------------------

#adding sample id to the metadata
colData(ase_sce)$replicate <- gsub("_.*", "", rownames(colData(ase_sce)))

#calculate size factors
ase_sce <- computeSumFactors(ase_sce, 
                             clusters=colData(ase_sce)$replicate, assay.type = "tot")

The reference allele and total counts are normalized in parallel using the same size factor estimates.

#normalizing counts
ase_sce  <- logNormCounts(ase_sce, 
                          size.factors = colData(ase_sce)$sizeFactor,
                          log = NULL, transform = "none", assay.type = "tot", name = "tot_norm")

#normalizing reference counts by the same size factors
ase_sce  <- logNormCounts(ase_sce, 
                          size.factors = colData(ase_sce)$sizeFactor,
                          log = NULL, transform = "none", assay.type = "a1", name = "a1_norm")

#checking that normalised counts assays are added to the SingleCellExperiment object
ase_sce@assays
#> An object of class "SimpleAssays"
#> Slot "data":
#> List of length 4
#> names(4): a1 tot tot_norm a1_norm

Splitting count matrices by cell type

Allelic imbalance test is run separately for each cell type. First, we split the metadata object by cell types.

#splitting the metadata by cell type
cell_list <- split(cell_annot, f = cell_annot$cell_idents)

Since beta-binomial parameters are estimated from raw counts, we first split the raw total and reference allele count matrices by cell type.

#splitting SingleCellExperiemnt object by cell types
ase_sce_byct <- list()
for (i in 1:length(cell_list)){
    ase_sce_byct[[i]] <- ase_sce[,colnames(ase_sce) %in% cell_list[[i]]$cell_id]
}

#removing genes with low expression
ase_sce_byct_filt <- lapply(ase_sce_byct, function(q) 
                                          q[rowSums(assays(q)[['tot']] > 1) >= 10, ]) 

#extracting raw total counts
tot_mat <- lapply(ase_sce_byct_filt, function(q) as.matrix(assays(q)[['tot']]))
#extracting raw reference allele counts
a1_mat <- lapply(ase_sce_byct_filt, function(q) as.matrix(assays(q)[['a1']]))
#selecting genes that matched filtering criteria
a1_mat <- mapply(function(p,q) p[rownames(q), ], a1_mat, tot_mat, SIMPLIFY = F)

Before proceeding, we check that the reference and total count matrices have the same dimensions. This is for demonstration purposes only. If this condition is not met, ASPEN will issue a warning.

mapply(function(p, q) dim(p) == dim(q), a1_mat, tot_mat, SIMPLIFY = F)
#> [[1]]
#> [1] TRUE TRUE
#> 
#> [[2]]
#> [1] TRUE TRUE
#> 
#> [[3]]
#> [1] TRUE TRUE
#> 
#> [[4]]
#> [1] TRUE TRUE
#> 
#> [[5]]
#> [1] TRUE TRUE

Next, we check that the gene order is identical between the reference and total count matrices. Again, this is for demonstration purposes only. If this condition is not met, ASPEN will issue a warning.

mapply(function(p, q) table(rownames(p) == rownames(q)), a1_mat, tot_mat, SIMPLIFY = F)
#> [[1]]
#> 
#> TRUE 
#> 1386 
#> 
#> [[2]]
#> 
#> TRUE 
#> 1414 
#> 
#> [[3]]
#> 
#> TRUE 
#> 1503 
#> 
#> [[4]]
#> 
#> TRUE 
#> 1521 
#> 
#> [[5]]
#> 
#> TRUE 
#> 1520

Estimating beta-binomial parameters

We estimate beta-binomial parameters separately for each cell type.

bb_init_params <- mapply(function(p, q) estim_bbparams(p, q, min_cells = 5, cores = 6), a1_mat, tot_mat, SIMPLIFY = F)

The output is a table with the following columns:

print_md(as_huxtable(head(bb_init_params[[1]])))
#> Warning in to_md.huxtable(ht, ...): Couldn't print whole table in max_width = 80 characters.
#> Printing 8/9 columns.
#> ---------------------------------------------------------------------
#>         N     AR tot_gene tot_gene    alpha     beta  bb_mu bb_theta 
#>                     _mean _varianc                                   
#>                                  e                                   
#> --------- ------ -------- -------- -------- -------- ------ -------- 
#>       850  0.984     5.94     29.3      175     2.41  0.986   0.0056 
#>                                                                      
#>  1.06e+03  0.999     8.21     45.4 4.41e+03     3.42  0.999   0.0002 
#>                                                                      
#>        37  0.976     1.09     1.78     14.6    0.343  0.977    0.067 
#>                                                                      
#>        25 0.0113    0.953     1.54    0.807     64.2 0.0124   0.0154 
#>                                                                      
#>  1.43e+03  0.695     21.3      322      184     84.5  0.685   0.0037 
#>                                                                      
#>        93 0.0039      1.4     3.57     25.2 6.94e+03 0.0036   0.0001 
#> ---------------------------------------------------------------------

Defining lowly expressed genes

ASPEN applies shrinkage selectively - genes with very low dispersion are not moderated and their allelic imbalance is evaluated using the unadjusted values. Genes with stable dispersion are determined based on the residuals from the dispersion modeling step. ASPEN calculates the meadian absolute deviation-squared ( ${MAD}^2$ ), which is used as a cut-off.

min_cutoff <- lapply(bb_init_params, calc_mad)
min_cutoff
#> [[1]]
#> [1] 0.002254469
#> 
#> [[2]]
#> [1] 0.001084665
#> 
#> [[3]]
#> [1] 0.002810162
#> 
#> [[4]]
#> [1] 0.00113423
#> 
#> [[5]]
#> [1] 0.001276519

Estimate appropriate shrinkage parameters

We estimate shrinkage parameters, $N$ and $\delta$ , separately for each cell type. Optionally, genes with very low dispersion can be excluded from the estimation by setting a minimum cut-off value with thetaFilter parameter

set.seed(1001011)
shrink_pars <- mapply(function(p, q) estim_delta(p, thetaFilter = q),
                      bb_init_params, min_cutoff, SIMPLIFY = F)
shrink_pars
#> [[1]]
#>     N delta 
#>    17    18 
#> 
#> [[2]]
#>     N delta 
#>    13    15 
#> 
#> [[3]]
#>     N delta 
#>    17    17 
#> 
#> [[4]]
#>     N delta 
#>    14    17 
#> 
#> [[5]]
#>     N delta 
#>    20    19

Performing Bayesian shrinkage

We use [correct_theta()] from ASPEN to obtain posterior dispersion estimates. The minimum dispersion value is set with thetaFilter parameter. and genes with residuals below that value are excluded shrinkage.

bb_init_params <- lapply(bb_init_params, function(q) q[!is.na(q$bb_theta),])
shrunk_estims_vardelta <- mapply(function(p, q) correct_theta(p, N_set = q[1], delta_set = q[2], thetaFilter = 0.001),
                                 bb_init_params, shrink_pars, SIMPLIFY = F)
print_md(as_huxtable(head(shrunk_estims_vardelta[[1]])))
#> Warning in to_md.huxtable(ht, ...): Couldn't print whole table in max_width = 80 characters.
#> Printing 8/13 columns.
#> ---------------------------------------------------------------------
#>         N     AR tot_gene tot_gene    alpha     beta  bb_mu bb_theta 
#>                     _mean _varianc                                   
#>                                  e                                   
#> --------- ------ -------- -------- -------- -------- ------ -------- 
#>       850  0.984     5.94     29.3      175     2.41  0.986   0.0056 
#>                                                                      
#>  1.06e+03  0.999     8.21     45.4 4.41e+03     3.42  0.999   0.0002 
#>                                                                      
#>        37  0.976     1.09     1.78     14.6    0.343  0.977    0.067 
#>                                                                      
#>        25 0.0113    0.953     1.54    0.807     64.2 0.0124   0.0154 
#>                                                                      
#>  1.43e+03  0.695     21.3      322      184     84.5  0.685   0.0037 
#>                                                                      
#>        93 0.0039      1.4     3.57     25.2 6.94e+03 0.0036   0.0001 
#> ---------------------------------------------------------------------

Visualizing model fit.

Genes with extremely low dispersion levels form a distinct cluster in the dispersion–expression plot. We estimate that those genes have $\theta < 0.001$ . To exclude them from the shrinkage procedure, we set thetaFilter = 0.001 in the [correct_theta()] function.

celltypes <- list("Cortical neurons", "Gliogenic RGCs", "IPCs", "OPCs", "RGCs")
p_disp <- mapply(function(p,q) plot_disp_fit_theta(p, midpoint = 100) +
                                          labs(subtitle = paste0(q, " Cast_B6")) + geom_hline(yintercept = log(1e-03), linetype = "dashed", linewidth = 1),
                 shrunk_estims_vardelta, celltypes, SIMPLIFY = F)
do.call(grid.arrange, c(p_disp, ncol = 3))
#> Warning: Removed 48 rows containing non-finite outside the scale range
#> (`stat_pointdensity()`).
#> Warning: Removed 73 rows containing non-finite outside the scale range
#> (`stat_pointdensity()`).
#> Warning: Removed 81 rows containing non-finite outside the scale range
#> (`stat_pointdensity()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 60 rows containing non-finite outside the scale range
#> (`stat_pointdensity()`).
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
#> Warning: Removed 96 rows containing non-finite outside the scale range
#> (`stat_pointdensity()`).

Visualizing original and shrunk $\theta$ ’s.


p_disp <- mapply(function(p,q) plot_disp(p) +
                               labs(subtitle = paste0(q, " Cast_B6")) + 
                               geom_hline(yintercept = log(1e-03), linetype = "dashed", linewidth = 1),
                 shrunk_estims_vardelta, celltypes, SIMPLIFY = F)
do.call(grid.arrange, c(p_disp, ncol = 3))
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
#> Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).

Alternatively, we can set $\delta$ and $N$ parameters manually.

shrunk_estims <- lapply(bb_init_params, function(q) correct_theta(q, delta_set = 50, N_set = 30, thetaFilter = 0.001))

p_disp <- mapply(function(p,q) plot_disp(p) +
                               labs(subtitle = paste0(q, " Cast_B6")) + 
                               geom_hline(yintercept = log(1e-03), linetype = "dashed", linewidth = 1),
                 shrunk_estims, celltypes, SIMPLIFY = F)

do.call(grid.arrange, c(p_disp, ncol = 3))
#> Warning: Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).
#> Removed 1 row containing missing values or values outside the scale range
#> (`geom_line()`).

### Estimating global beta-binomial parameters

Evaluating global beta-binomial parameters provides an overall measure of bias toward the reference allele. This is done by estimating beta-binomial distribution parameters across all genes, excluding those located on sex chromosomes and known imprinted genes. For the GRCm38 (mm10) mouse genome, the package includes predefined lists of X- and Y-linked genes as well as a set of validated imprinted genes. These can be excluded by passing them to the genes.excl parameter in the [glob_disp()] function.

load_file <- system.file("extdata", "mm10_genesXY.txt", package = "ASPEN")
genesXY <- read.table(load_file)
load_file <- system.file("extdata", "mm10_imprinted_genes.xlsx", package = "ASPEN")
genesIMPR <- read.xlsx(load_file, colNames = T)
genes2remove <- c(genesXY$V1, genesIMPR$imprinted.genes)

global_estims <- mapply(function(p, q)  glob_disp(p, q, genes.excl = genes2remove, min_counts = 5),
                        a1_mat, tot_mat, SIMPLIFY = F)
global_estims
#> [[1]]
#>     mu.1  theta.1  alpha.1   beta.1 
#> 0.544500 0.066400 8.203912 6.862176 
#> 
#> [[2]]
#>      mu.1   theta.1   alpha.1    beta.1 
#>  0.530700  0.050300 10.553884  9.332994 
#> 
#> [[3]]
#>     mu.1  theta.1  alpha.1   beta.1 
#> 0.541900 0.055100 9.827842 8.307707 
#> 
#> [[4]]
#>      mu.1   theta.1   alpha.1    beta.1 
#>  0.528700  0.049700 10.639340  9.483841 
#> 
#> [[5]]
#>      mu.1   theta.1   alpha.1    beta.1 
#>  0.529800  0.050500 10.496454  9.314595

Visualizing global allelic ratio distribution across all genes.

Deviation from the balanced allelic expression ( $AR = 0.5$ ) indicates a presence of a skew towards the reference allele. If reference allele bias is detected, the null hypothesis for the allelic imbalance testing should be adjusted accordingly. For analysis of the Bl6Cast hybrids, instead of using $H_0: \mu = 0.5$ , the null hypotheses will be adjusted to $H_0: \mu = 0.54$ for Cortical neurons and IPCs datasets and to $H_0: \mu = 0.53$ for Gliogenic RGCs, OPCs and RGCs.

p_glob <- mapply(function(p,q,r,s) plot_glob_params(p, q, r, min_counts = 5) +
                               labs(subtitle = paste0(s, " Cast_B6")),
                 a1_mat, tot_mat, global_estims, celltypes, SIMPLIFY = F)
#> Warning: The melt generic in data.table has been passed a data.frame and will
#> attempt to redirect to the relevant reshape2 method; please note that reshape2
#> is superseded and is no longer actively developed, and this redirection is now
#> deprecated. To continue using melt methods from reshape2 while both libraries
#> are attached, e.g. melt.list, you can prepend the namespace, i.e.
#> reshape2::melt(plot_data). In the next version, this warning will become an
#> error.

#> Warning: The melt generic in data.table has been passed a data.frame and will
#> attempt to redirect to the relevant reshape2 method; please note that reshape2
#> is superseded and is no longer actively developed, and this redirection is now
#> deprecated. To continue using melt methods from reshape2 while both libraries
#> are attached, e.g. melt.list, you can prepend the namespace, i.e.
#> reshape2::melt(plot_data). In the next version, this warning will become an
#> error.

#> Warning: The melt generic in data.table has been passed a data.frame and will
#> attempt to redirect to the relevant reshape2 method; please note that reshape2
#> is superseded and is no longer actively developed, and this redirection is now
#> deprecated. To continue using melt methods from reshape2 while both libraries
#> are attached, e.g. melt.list, you can prepend the namespace, i.e.
#> reshape2::melt(plot_data). In the next version, this warning will become an
#> error.

#> Warning: The melt generic in data.table has been passed a data.frame and will
#> attempt to redirect to the relevant reshape2 method; please note that reshape2
#> is superseded and is no longer actively developed, and this redirection is now
#> deprecated. To continue using melt methods from reshape2 while both libraries
#> are attached, e.g. melt.list, you can prepend the namespace, i.e.
#> reshape2::melt(plot_data). In the next version, this warning will become an
#> error.

#> Warning: The melt generic in data.table has been passed a data.frame and will
#> attempt to redirect to the relevant reshape2 method; please note that reshape2
#> is superseded and is no longer actively developed, and this redirection is now
#> deprecated. To continue using melt methods from reshape2 while both libraries
#> are attached, e.g. melt.list, you can prepend the namespace, i.e.
#> reshape2::melt(plot_data). In the next version, this warning will become an
#> error.
do.call(grid.arrange, c(p_glob, ncol = 3))

Allelic imbalance test

The allelic imbalance test is performed on the normalized counts.

#extracting normalised reference allele and total counts
tot_norm <- lapply(ase_sce_byct_filt, function(q) as.matrix(assays(q)[['tot_norm']]))

#extracting raw reference allele counts
a1_norm <- lapply(ase_sce_byct_filt, function(q) as.matrix(assays(q)[['a1_norm']]))
#ensure that the genes order is the same
a1_norm <- mapply(function(p,q) p[rownames(q), ], a1_norm, tot_norm, SIMPLIFY = F)

We run [bb_mean()] to identify genes with allelic ratio deviating from the null hypothesis. By default, ASPEN requires each gene to be expressed in at least 5 cells with a minimum of 5 counts per cell. These thresholds can be adjusted with min_cells and min_counts parameters.

bb_mean_res <- mapply(function(p, q, r, s) bb_mean(p, q, r, 
                                                   min_cells = 5, min_counts = 5, 
                                                   glob_params = s), 
                      a1_norm, tot_norm, shrunk_estims, global_estims, SIMPLIFY = F)

The output of [beta_binom_test()] function is a table that combines the output of the [estim_bbparams()] and [correct_theta()] functions and adds the following columns:\

If a gene does not meet the quality filter (here, a minimum of 5 cells with at least 5 mapped reads), the allelic imbalance test is not performed for that gene. These genes will have NA in the above fields. We remove such genes before calculating the false discovery rate (FDR).

bb_mean_res <- lapply(bb_mean_res, function(q) q <- q[!is.na(q$pval_mean),])

#calculating fdr
bb_mean_res <- lapply(bb_mean_res, function(q) {q$fdr_mean <- p.adjust(q$pval_mean, 
                                                                       method = "fdr");                                                          q <- q[order(q$fdr_mean),];
                                                return(q)})

Top 10 genes with significant allelic imbalance after FDR-adjustment

lapply(bb_mean_res, function(q) q[1:10, c("AR", "fdr_mean")])
#> [[1]]
#>                 AR fdr_mean
#> Stmn1  0.984126550        0
#> Uchl1  0.998927886        0
#> Hmgb1  0.976436130        0
#> Gm9794 0.011265810        0
#> Ftl1   0.694522296        0
#> Mif    0.003895411        0
#> Rps15  0.838435691        0
#> Rps24  0.203198897        0
#> Sec61g 0.024611841        0
#> Eif4a1 0.998207009        0
#> 
#> [[2]]
#>                  AR fdr_mean
#> Stmn1   0.995669976        0
#> Uchl1   0.999464668        0
#> Gm9794  0.008240608        0
#> Ftl1    0.680002167        0
#> Rps15   0.864274116        0
#> Rps24   0.202660961        0
#> Sec61g  0.032801669        0
#> Hnrnpa1 0.018688305        0
#> Rpl35a  0.999114066        0
#> Ubb     0.998768725        0
#> 
#> [[3]]
#>                  AR fdr_mean
#> Stmn1   0.988800741        0
#> Uchl1   0.999650513        0
#> Gm9794  0.011140701        0
#> Ftl1    0.672111891        0
#> Mif     0.002981515        0
#> Rps15   0.865347111        0
#> Rps24   0.206824332        0
#> Sec61g  0.024171380        0
#> Eif4a1  0.998726346        0
#> Hnrnpa1 0.026185317        0
#> 
#> [[4]]
#>                 AR fdr_mean
#> Stmn1  0.990463015        0
#> Uchl1  0.999502275        0
#> Hmgb1  0.947855448        0
#> Gm9794 0.013659644        0
#> Ftl1   0.666048789        0
#> Mif    0.002498002        0
#> Rps15  0.882687034        0
#> Rps24  0.207635447        0
#> Sec61g 0.029949202        0
#> Eif4a1 0.999096341        0
#> 
#> [[5]]
#>                 AR fdr_mean
#> Stmn1  0.991578964        0
#> Uchl1  0.998702983        0
#> Gm9794 0.004754441        0
#> Ftl1   0.675546570        0
#> Mif    0.004378284        0
#> Rps15  0.867803259        0
#> Rps24  0.195924785        0
#> Sec61g 0.028287041        0
#> Eif4a1 0.998525074        0
#> Rpl35a 0.997623061        0

We can visualize the allelic distribution for some of the top-ranked genes. In the plots below, each point represents a single cell, and points are colored by the log(mean expression).

#specifiying genes for plotting
genes_select <- list("Stmn1", "Ftl1", "Mif")

#generating data frame for plotting
plot_data <- lapply(genes_select, function(q) makedf(a1_mat[[1]], tot_mat[[1]], gene = q))

p_ar_dist <- mapply(function(p,q,r) plot_distr(p, gene = q),
                 plot_data, genes_select, SIMPLIFY = F)
do.call(grid.arrange, c(p_ar_dist, ncol = 3))