ClusterMap - Compare multiple single-cell RNA-seq profiles

1 Introduction

ClusterMap is designed to analyze and compare two or more single-cell expression datasets. ClusterMap assumes that the analysis of each single dataset and of the combined dataset has already been done. If not, the package also provides the quick-analysis functions "make_single_obj" and "make_comb_obj" to generate Seurat objects.

The pre-analysis for the following datasets is here: pre_analysis/pre_analysis.html

2 Guide

2.1 Quick start

The epithelial cell datasets were generated in the study of Pal B, et al. (https://www.nature.com/articles/s41467-017-01560-x). Cells were collected from mammary glands of adult mice at different phases of the estrus cycle. Glands from two mice were pooled, and scRNA-seq was performed on 2729 epithelial cells in estrus and 2439 cells in diestrus using the 10X Chromium platform.

Pre-analysis by Seurat

The analysis was performed with the Seurat package. 13 sub-groups of diestrus-stage cells and 10 sub-groups of estrus-stage cells were defined, along with the marker genes of each group.

p7d.tsne.png p7e.tsne.png

2.1.1 Simple run

First, use a list of the marker genes of each sub-group in each sample to map sub-groups across samples. The marker gene file can be the direct output of "FindAllMarkers" from the Seurat package, or a table with the columns "cluster" and "gene".
The "edge_cutoff" can be adjusted to decide whether a singleton should be merged with a paired group or not. See "2.2 More options".
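As a minimal sketch of the second option, a marker table with only the two required columns can be written by hand. The gene-to-cluster assignments and the file name here are hypothetical placeholders, and it is an assumption that a CSV written with the write.csv defaults (as done for the marker tables elsewhere in this guide) is what "cluster_map" expects.

```r
# Hypothetical marker table: one row per (sub-group, marker gene) pair.
markers <- data.frame(
  cluster = c(0, 0, 1, 1),
  gene    = c("Krt14", "Acta2", "Elf5", "Prlr"),
  stringsAsFactors = FALSE
)

# Write it with the write.csv defaults (row names included).
marker_file <- file.path(tempdir(), "toy.markers.csv")
write.csv(markers, file = marker_file)

# Read it back to confirm that the required columns are present.
check <- read.csv(marker_file, stringsAsFactors = FALSE)
print(colnames(check))
```

Any extra columns, such as the statistics produced by "FindAllMarkers", sit alongside "cluster" and "gene" in the same table.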

library(ClusterMap)

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'p7')
res

Calling "cluster_map" generates the mapped sub-group information shown in the following figures and table. They are saved automatically.

Cluster match

Hierarchical clustering:

The sub-groups are matched by hierarchical clustering. The distance between sub-groups across samples is determined by the presence or absence of marker genes in each group, taken over the union of all marker genes. The sub-groups of all samples are clustered by the averaged binary distance of marker gene presence.
The similarity of the matched groups is defined as 1 minus the height of the merging node of the matched groups in the dendrogram.
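The clustering step described above can be sketched in base R. This is an illustration under assumptions, not the package's code: the toy marker table is hypothetical, and "averaged binary distance" is read here as dist(method = "binary") combined with average linkage in hclust.

```r
# Hypothetical marker table for two samples, A and B, with two sub-groups each.
markers <- data.frame(
  cluster = c("A_0", "A_0", "A_0", "A_1", "A_1",
              "B_0", "B_0", "B_0", "B_1", "B_1"),
  gene    = c("g1", "g2", "g3", "g4", "g5",
              "g1", "g2", "g6", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Presence/absence matrix: sub-groups in rows, union of marker genes in columns.
presence <- (unclass(table(markers$cluster, markers$gene)) > 0) * 1

# Binary distance between sub-groups, then average-linkage clustering.
d  <- dist(presence, method = "binary")
hc <- hclust(d, method = "average")

# Similarity of each merge is 1 minus the height of the merging node.
similarity <- 1 - hc$height
print(data.frame(height = hc$height, similarity = similarity))
```

In this toy example A_1 and B_1 share all their markers and merge at height 0, while A_0 and B_0 share two of four markers and merge at height 0.5.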

Tree cut:

We search through all the nodes in the dendrogram from the bottom up. If all the offspring nodes of a node are from the same sample, the node is considered a pure node. A given node N with two direct sub-nodes n1 and n2 is kept only under one of three conditions:

  • n1 and n2 are both singleton or pure.
  • n1 is singleton or pure, AND n2 is not removed, AND the edge between N and n2 is shorter than the cutoff.
  • Neither n1 nor n2 is cut, AND neither of the edges between N and n1 or n2 is longer than the cutoff, AND neither S1 nor S2 includes the other. S1 is the unique list of samples that the offspring nodes of n1 come from; S2 is defined likewise for n2.
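Two of the ingredients of these rules, the purity of a node and the inclusion test on S1 and S2, can be sketched on a base R hclust object. This is not the package's "purity_cut" implementation; the helper names node_samples and includes, the toy distances, and the convention that the sample name is the label prefix before "_" are all assumptions made for illustration.

```r
# Toy dendrogram: two sub-groups of sample A and one of sample B.
labs <- c("A_0", "A_1", "B_0")
m <- matrix(c(0,   0.1, 0.6,
              0.1, 0,   0.7,
              0.6, 0.7, 0), nrow = 3, dimnames = list(labs, labs))
hc <- hclust(as.dist(m), method = "average")

# Sample of each leaf, taken from the label prefix.
sample_of_leaf <- sub("_.*", "", hc$labels)

# For every internal node (row of hc$merge), collect the unique samples of
# its offspring leaves. Negative entries are leaves, positive entries refer
# to earlier rows of hc$merge.
node_samples <- function(hc, sample_of_leaf) {
  n_merge <- nrow(hc$merge)
  samples <- vector("list", n_merge)
  for (i in seq_len(n_merge)) {
    s <- character(0)
    for (child in hc$merge[i, ]) {
      if (child < 0) {
        s <- c(s, sample_of_leaf[-child])
      } else {
        s <- c(s, samples[[child]])
      }
    }
    samples[[i]] <- sort(unique(s))
  }
  samples
}

samples <- node_samples(hc, sample_of_leaf)

# A node is pure when all its offspring leaves come from one sample.
pure <- vapply(samples, function(s) length(s) == 1, logical(1))

# TRUE when one sample list is included in the other (the S1/S2 test).
includes <- function(s1, s2) all(s1 %in% s2) || all(s2 %in% s1)

print(pure)
```

Here the first merge (A_0 with A_1) is pure, and the second merge, which adds B_0, is not.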

p7.hcluster.png p7.heatmap.hcluster.png

The cluster match results are saved in the file p7.cluster.map.csv. Its contents are:

n p7d p7e similarity regroup
1 p7d_4 p7e_1 0.65 1
2 p7d_2 p7e_3;p7e_4 0.5 2
3 p7d_5 p7e_0;p7e_2 0.47 3
4 p7d_6 p7e_5 0.44 4
5 p7d_12 p7e_9 0.42 5
6 p7d_10 p7e_8 0.4 6
7 p7d_0;p7d_8 NA 0.34 7
8 p7d_7 p7e_6 0.33 8
9 p7d_11 p7e_7 0.29 9
10 p7d_1 NA NA 10
11 p7d_3 NA NA 11
12 p7d_9 NA NA 12
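Entries separated by ";" in this table indicate that several sub-groups of one sample were matched to the same new group. A minimal sketch of how such entries could be split in base R follows; the three rows are copied from the table above, while the exact column layout of the saved CSV and the helper name split_ids are assumptions.

```r
# Three rows copied from the table above, in an assumed CSV layout.
map_txt <- "p7d,p7e,similarity,regroup
p7d_4,p7e_1,0.65,1
p7d_2,p7e_3;p7e_4,0.5,2
p7d_1,NA,NA,10"
mapRes <- read.csv(text = map_txt, stringsAsFactors = FALSE)

# Split a ';'-separated entry into sub-group ids; NA means no match.
split_ids <- function(x) {
  if (is.na(x)) character(0) else strsplit(x, ";", fixed = TRUE)[[1]]
}

# Number of p7e sub-groups in each new group.
n_p7e <- vapply(mapRes$p7e, function(x) length(split_ids(x)),
                integer(1), USE.NAMES = FALSE)
print(data.frame(regroup = mapRes$regroup, n_p7e = n_p7e))
```

A count above 1 marks a one-to-multiple match, and a count of 0 marks a new group with no counterpart in that sample.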

2.1.2 Full run

Furthermore, if the Seurat objects of the single datasets and the combined dataset are provided in addition to the marker gene lists, the full analysis can be done.

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS', p7 = 'pre_analysis/p7.RDS')
objList <- lapply(fList, readRDS)

single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'p7', single_obj_list = single_obj_list, comb_obj = objList$p7)
res

Calling "cluster_map" this way also generates the Circos plot and the re-colored t-SNE plots shown below, as well as the cell percentage change and sample separability of each new group.

Circos plot and population size

To visualize the results, we summarize them in a Circos plot.

  • The cords of the Circos plot indicate the linkage between the sub-groups.
  • The percentage of cell numbers in each sample is represented by the width of the black sectors. The change in sub-population size can be seen by comparing the sector sizes of matched (cord-linked) groups.
  • The color of the cords indicates different new groups, which matches the recolored t-SNE plot.
  • The transparency of the cord color indicates the similarity of the matched groups.

p7.circos.png

Recolor

For easy comparison, each sample and the combined sample are recolored by their new groups after matching, using the t-SNE coordinates from the pre-analysis. Cells belonging to the same new group in different samples are given the same color. Thus, some samples may lack one or more colors, which means that the sample has no such group. The new groups are saved as a list in an RDS file, which can be used later.

p7d.recolor.tsne.png p7e.recolor.tsne.png

p7.recolor.tsne.png p7.color.by.sample.tsne.png

Separability

After matching and regrouping, we estimate the condition effect on each pair of matched groups, in addition to the percentage change. To quantify the difference between paired groups, separability is calculated for each new group. If there are more than two samples, separability is calculated pairwise.

For each new group, the separability is defined as the median difference of the intra- and inter-sample distances of each cell in the combined t-SNE coordinates. The intra-sample distance is defined as the median distance to the K nearest cells of the same sample (default K = 5). The inter-sample distance is defined in the same way, but using cells of the other sample.

Using the median instead of the mean reduces the variation due to outliers. The higher the separability, the more separable the samples are within the group.
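The definition can be sketched for a single new group in base R. This is an illustration, not the package's "separability_pairwise" function: the name separability_sketch and the toy coordinates are hypothetical, and the difference is taken as inter minus intra on the assumption that higher values should mean more separable.

```r
# Separability of one group: median over cells of (inter - intra), where
# each term is the median distance to the k nearest cells of the other
# sample (inter) or of the same sample (intra).
separability_sketch <- function(coord, sample_label, k = 5) {
  d <- as.matrix(dist(coord))
  diag(d) <- NA  # a cell is not its own neighbour
  knn_median <- function(x) {
    x <- sort(x)  # sort() drops the NA on the diagonal
    median(x[seq_len(min(k, length(x)))])
  }
  diffs <- vapply(seq_len(nrow(d)), function(i) {
    same  <- sample_label == sample_label[i]
    intra <- knn_median(d[i, same])
    inter <- knn_median(d[i, !same])
    inter - intra
  }, numeric(1))
  median(diffs)
}

# Toy coordinates: six cells per sample, either far apart or overlapping.
label <- rep(c("s1", "s2"), each = 6)
far   <- cbind(x = rep(1:6, 2), y = rep(c(0, 10), each = 6))
mixed <- cbind(x = rep(1:6, 2), y = rep(0, 12))

sep_far   <- separability_sketch(far, label, k = 5)
sep_mixed <- separability_sketch(mixed, label, k = 5)
print(c(far = sep_far, mixed = sep_mixed))
```

The well-separated layout gives a clearly larger value than the overlapping one, which is the behaviour the definition is meant to capture.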

Map results

From the following results, we see that new groups 2 and 3 are one-to-multiple matches. There are no matched groups for new groups 7, 10, 11 and 12 at the current cutoff. Both the Circos plot and the recolored t-SNE plots show the cell percentage change of groups 2, 3 and 9. Among the matched groups, the separability further highlights the most affected sub-types, such as groups 6, 8 and 3.

The full results are saved in the file p7.results.csv:

n p7d p7e similarity regroup p7d_cell_perc p7e_cell_perc p7d.vs.p7e_separability
1 p7d_4 p7e_1 0.65 1 0.11 0.18 0.85
2 p7d_2 p7e_3;p7e_4 0.5 2 0.14 0.14;0.1 0.34
3 p7d_5 p7e_0;p7e_2 0.47 3 0.08 0.2;0.15 2.93
4 p7d_6 p7e_5 0.44 4 0.06 0.07 0.99
5 p7d_12 p7e_9 0.42 5 0.01 0.02 0.03
6 p7d_10 p7e_8 0.4 6 0.02 0.04 10.56
7 p7d_0;p7d_8 NA 0.34 7 0.17;0.06 NA Inf
8 p7d_7 p7e_6 0.33 8 0.06 0.05 7.13
9 p7d_11 p7e_7 0.29 9 0.01 0.05 0.53
10 p7d_1 NA NA 10 0.15 NA Inf
11 p7d_3 NA NA 11 0.11 NA Inf
12 p7d_9 NA NA 12 0.02 NA Inf
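As a small sketch of how this table could be screened, the groups with a finite separability can be ranked in base R. The five rows below are copied from the table above, and the shortened column name separability is chosen here for brevity.

```r
# Five rows copied from the results table (regroup and separability only).
res <- data.frame(
  regroup      = c(3, 5, 6, 7, 8),
  separability = c(2.93, 0.03, 10.56, Inf, 7.13)
)

# Groups without a counterpart in the other sample have Inf separability.
matched <- res[is.finite(res$separability), ]

# Rank the matched groups from most to least separable.
ranked <- matched[order(-matched$separability), ]
print(ranked)
```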

2.1.3 Known markers

Core basal genes: Krt14, Acta2, Myl9, Sparc, Mylk, Cxcl14 – group 3, 4, 7, 8
Luminal genes: Elf5, Prlr, Areg, Ly6d, Stc2, Krt19 – group 1, 11, 12, 9
ML cell-specific genes: Prlr, Cited1, Esrrb, and Cxcl15 – group 2, 5, 6, 10

library(Seurat)

FeaturePlot(object = objList$p7d, features.plot = c("Krt14", "Acta2", "Elf5", "Prlr", "Krt19","Pgr"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.p7d.png')

FeaturePlot(object = objList$p7e, features.plot = c("Krt14", "Acta2", "Elf5", "Prlr", "Krt19","Pgr"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.p7e.png')

P7D

feature.gene.p7d.png

P7E

feature.gene.p7e.png

2.1.4 Marker genes of new groups

library(Seurat)
new.group.list <- readRDS('p7.new.group.list.rds')
names(new.group.list)
## [1] "p7d"  "p7e"  "comb"

p7 <- objList$p7
p7@ident <- new.group.list$comb
TSNEPlot(object = p7, do.label = T, label.size = 6)

p7.markers <- FindAllMarkers(object = p7, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
write.csv(p7.markers, file = 'p7.new.group.markers.csv')

## p7.markers.10.vs.12 <- FindMarkers(object = p7, ident.1 = '10',ident.2 = '2', only.pos = F, min.pct = 0.25, min.diff.pct = 0.1, thresh.use = 0.25)
## write.csv(p7.markers.10.vs.12,file = 'p7.markers.10.vs.12.csv')

2.2 More options

To customize the analysis, it can be performed step by step.

2.2.1 Try different singleton cutoff for cluster match.

The cutoff is the edge-length cutoff that decides whether a singleton should be merged or not. We suggest a cutoff of less than 0.2.

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')
## cutoff 0.1
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.1, output = 'p7')
## cutoff 0.15
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.15, output = 'p7_cutoff0.15')
mapRes ## results didn't change.
## cutoff 0.2
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.2, output = 'p7_cutoff0.2')
mapRes ## p7d_9 was grouped together with p7d_10 and p7e_8.

cutoff = 0.2

n p7d p7e similarity regroup
1 p7d_4 p7e_1 0.65 1
2 p7d_2 p7e_3;p7e_4 0.50 2
3 p7d_5 p7e_0;p7e_2 0.47 3
4 p7d_6 p7e_5 0.44 4
5 p7d_12 p7e_9 0.42 5
6 p7d_0;p7d_8 NA 0.34 6
7 p7d_7 p7e_6 0.33 7
8 p7d_11 p7e_7 0.29 8
9 p7d_10;p7d_9 p7e_8 0.24 9
10 p7d_1 NA NA 10
11 p7d_3 NA NA 11

2.2.2 Use cell number information to generate circos plot.

The cell numbers for each sub-group can be pulled out of the Seurat objects or provided by the user. cell_num_list is a list with one vector per sample. Each vector contains the cell numbers of the sub-groups of that sample.
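A user-provided cell_num_list might look like the following sketch. The counts are hypothetical, and naming each count by its cluster identity is an assumption based on what summary() returns for a factor of identities.

```r
# Hypothetical cell counts per sub-group; names are the cluster identities.
cell_num_list <- list(
  p7d = c("0" = 120, "1" = 80, "2" = 50),
  p7e = c("0" = 200, "1" = 100)
)

# Fraction of cells per sub-group within each sample.
cell_perc_list <- lapply(cell_num_list, function(n) round(n / sum(n), 2))
print(cell_perc_list)
```

The fractions correspond to the per-sample cell percentages reported in the results table.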

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS')
objList <- lapply(fList, readRDS)
single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)

cell_num_list <- lapply(single_obj_list, function(obj) summary(obj@ident))

mapRes <- read.csv('p7.results.csv') ## This file was auto-saved when cluster_map_by_marker function was run.

circos_map(mapRes, cell_num_list, output = 'p7')
res <- add_perc(mapRes, cell_num_list)
res

Number of samples in circos plot

If there are more than 3 samples, the circos plot will become complicated. We can choose a subset of samples to plot. To achieve this, just remove extra samples from the mapRes and cell_num_list variables. Then use circos_map to plot a subset.
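A minimal sketch of such a subset follows, assuming that mapRes has one column per sample plus "similarity" and "regroup", as in the tables of this guide. The third sample p7x and all counts are hypothetical. Note that the similarity column still reflects the match computed over all samples.

```r
# Hypothetical three-sample map result and cell counts.
mapRes <- data.frame(
  p7d = c("p7d_4", "p7d_2", NA),
  p7e = c("p7e_1", "p7e_3;p7e_4", NA),
  p7x = c("p7x_0", NA, "p7x_1"),
  similarity = c(0.65, 0.5, NA),
  regroup = 1:3,
  stringsAsFactors = FALSE
)
cell_num_list <- list(
  p7d = c("2" = 30, "4" = 20),
  p7e = c("1" = 25, "3" = 15, "4" = 10),
  p7x = c("0" = 40, "1" = 5)
)

# Keep two samples and drop the columns of the others.
keep <- c("p7d", "p7e")
mapRes_sub <- mapRes[, c(keep, "similarity", "regroup")]

# Drop new groups that have no sub-group left in the kept samples.
has_group <- rowSums(!is.na(mapRes_sub[, keep, drop = FALSE])) > 0
mapRes_sub <- mapRes_sub[has_group, ]

cell_num_sub <- cell_num_list[keep]
print(mapRes_sub)
## circos_map(mapRes_sub, cell_num_sub, output = 'p7.subset')
```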

2.2.3 Change colors in circos plot

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS')
objList <- lapply(fList, readRDS)
single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)
cell_num_list <- lapply(single_obj_list, function(obj) summary(obj@ident))
mapRes <- read.csv('p7.results.csv')## This file was auto-saved when cluster_map_by_marker function was run.

library(RColorBrewer)
circos_map(mapRes, cell_num_list, output = 'p7.tmp', color_cord = brewer.pal(n = 12, 'Paired'))

p7.tmp.circos.png

2.2.4 Recolor one single sample in t-SNE plot

p7d <- readRDS('pre_analysis/p7d.RDS')
mapRes <- read.csv('p7.results.csv') ## This file was auto-saved when cluster_map_by_marker function was run.
mapRes

da <- structure(as.vector(mapRes[, 'p7d']), names = mapRes$regroup)
head(da)
p7d.new.group <- recolor_s(da, obj = p7d, output = 'p7d') ## new group assignment of each cell is output as well.
## customized color
library(RColorBrewer)
p7d.new.group <- recolor_s(da, obj = p7d, output = 'p7d.tmp', color = brewer.pal(n = 12, 'Paired'))


p7e <- readRDS('pre_analysis/p7e.RDS')
da <- structure(as.vector(mapRes[, 'p7e']), names = mapRes$regroup)
head(da)
p7e.new.group <- recolor_s(da, obj = p7e, output = 'p7e') ## new group assignment of each cell is output as well.
## customized color
p7e.new.group <- recolor_s(da, obj = p7e, output = 'p7e.tmp', color = brewer.pal(n = 12, 'Paired'))

p7d.tmp.recolor.tsne.png p7e.tmp.recolor.tsne.png

2.2.5 Recolor combined sample in t-SNE plot

p7 <- readRDS('pre_analysis/p7.RDS')

new_group_comb <- recolor_comb(comb_obj = p7, new_group_list = list(p7d = p7d.new.group, p7e = p7e.new.group), output = 'p7', comb_delim = '-') ## new group list generated by recolor_s as in the previous code block.     

## customized color
library(RColorBrewer)
new_group_comb <- recolor_comb(comb_obj = p7, new_group_list = list(p7d = p7d.new.group, p7e = p7e.new.group), output = 'p7.tmp', comb_delim = '-', color = brewer.pal(n = 12, 'Paired'))

p7.tmp.recolor.tsne.png

2.2.6 Adjust K for separability

p7 <- readRDS('pre_analysis/p7.RDS')
tsne_coord <- as.data.frame(p7@dr$tsne@cell.embeddings)

new_group_list <- readRDS('p7.new.group.list.RDS')
group <- new_group_list$comb
head(group)

sample_label <- as.factor(sub('-.*', '', names(group)))
## k=5
sepa_k5 <- separability_pairwise(tsne_coord, group, sample_label, k = 5)
sepa_k5
## k=10
sepa_k10 <- separability_pairwise(tsne_coord, group, sample_label, k = 10)
sepa_k10

k = 5

regroup p7d.vs.p7e
1 0.85
2 0.34
3 2.93
4 0.99
5 0.03
6 10.56
7 Inf
8 7.13
9 0.53
10 Inf
11 Inf
12 Inf

k = 10

regroup p7d.vs.p7e
1 0.98
2 0.42
3 3.75
4 1.00
5 -0.38
6 11.43
7 Inf
8 7.71
9 0.99
10 Inf
11 Inf
12 Inf

3 More case study

3.1 Immune stimulated datasets

The dataset comes from this study: https://www.nature.com/articles/nbt.4042

A pool of peripheral blood mononuclear cells (PBMCs) from eight lupus patients was studied. PBMCs from each patient were either left untreated as a control or activated with recombinant interferon-beta (IFN-beta) for 6 hours. 14,619 control cells and 14,446 stimulated cells were obtained by sequencing on the 10X Chromium instrument.

Pre-analysis by Seurat

ctrl.tsne.png stim.tsne.png

ClusterMap analysis

library(ClusterMap)

marker_file_list <- c(ctrl = 'pre_analysis/ctrl.markers.csv', stim = 'pre_analysis/stim.markers.csv')

fList <- c(ctrl = "pre_analysis/ctrl.RDS",
        stim = "pre_analysis/stim.RDS",
        comb = "pre_analysis/immune.RDS")
immuneList <- lapply(fList, readRDS)

single_obj_list <- c(ctrl = immuneList$ctrl, stim = immuneList$stim)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'immune', single_obj_list = single_obj_list, comb_obj = immuneList$comb)
res

Cluster match

immune.hcluster.png immune.heatmap.hcluster.png

Circos plot

immune.circos.png

Recolor

ctrl.recolor.tsne.png stim.recolor.tsne.png

immune.recolor.tsne.png immune.color.by.sample.tsne.png

Map results

immune.results.csv

n ctrl stim similarity regroup ctrl_cell_perc stim_cell_perc ctrl.vs.stim_separability
1 ctrl_6 stim_6 0.6 1 0.04 0.04 0.24
2 ctrl_2 stim_2 0.53 2 0.12 0.11 0.55
3 ctrl_1 stim_1 0.5 3 0.18 0.21 0.78
4 ctrl_5 stim_4 0.49 4 0.08 0.07 5.01
5 ctrl_4 stim_3 0.48 5 0.09 0.11 0.55
6 ctrl_0;ctrl_8 stim_0;stim_11 0.45 6 0.31;0.02 0.28;0.01 11.03
7 ctrl_7 stim_8 0.39 7 0.03 0.03 7.05
8 ctrl_10 stim_10 0.34 8 0.01 0.01 0.55
9 ctrl_3 stim_5;stim_7 0.3 9 0.09 0.07;0.04 0.37
10 ctrl_9 NA NA 10 0.02 NA Inf
11 NA stim_12 NA 11 NA 0 Inf
12 NA stim_9 NA 12 NA 0.02 Inf

Cell type and Known markers

cM, CD14+CD16- monocytes: CD14, LYZ — group 6
ncM, CD14+CD16+ monocytes: FCGR3A, MS4A7 — group 4
DC, dendritic cells: FCER1A, CST3 — group 7
Mkc, megakaryocytes: PPBP — group 10, 12
Th, CD4+ T cells: IL7R — group 2, 3
B, B cells: MS4A1 — group 5
Tc, CD8+ T cells: CD8A — group 9
NK, natural killer cells: GNLY, NKG7 — group 9

library(Seurat)

FeaturePlot(object = immuneList$ctrl, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.ctrl.png')

FeaturePlot(object = immuneList$stim, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.stim.png')

## FeaturePlot(object = all, features.plot = c("CD4","TBX21",'GATA3','SELL','CREM'), cols.use = c("grey", "blue"), reduction.use = "tsne")
## savePlot('features.immune.png')

Control

feature.gene.ctrl.png

Stimulated

feature.gene.stim.png

3.2 PBMCs datasets

The dataset is downloaded from 10X Genomics single cell gene expression datasets:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k
https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k

They are peripheral blood mononuclear cells (PBMCs) from the same healthy donor.

Pre-analysis

pbmc4k.tsne.png pbmc8k.tsne.png

ClusterMap analysis

library(ClusterMap)

marker_file_list <- c(pbmc4k = 'pre_analysis/pbmc4k.markers.csv', pbmc8k = 'pre_analysis/pbmc8k.markers.csv')

fList <- c(pbmc4k = 'pre_analysis/pbmc4k.RDS',
        pbmc8k = 'pre_analysis/pbmc8k.RDS',
        pbmc = "pre_analysis/pbmc.RDS")
pbmcList <- lapply(fList, readRDS)

single_obj_list <- c(pbmc4k = pbmcList$pbmc4k, pbmc8k = pbmcList$pbmc8k)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'pbmc', single_obj_list = single_obj_list, comb_obj = pbmcList$pbmc)
res

## marker genes
library(Seurat)

FeaturePlot(object = pbmcList$pbmc4k, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.pbmc4k.png')

FeaturePlot(object = pbmcList$pbmc8k, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.pbmc8k.png')

## FeaturePlot(object = pbmcList$pbmc, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
## savePlot('feature.gene.pbmc.png')

Cluster match

pbmc.hcluster.png pbmc.heatmap.hcluster.png

Circos plot

The cord color in the circos plot is much darker, indicating the high similarity of the matched groups.

pbmc.circos.png

Recolor

pbmc4k.recolor.tsne.png pbmc8k.recolor.tsne.png

pbmc.recolor.tsne.png pbmc.color.by.sample.tsne.png

Map results

The match is clearly one-to-one. The cell percentages of most sub-types are unchanged. The separability shows that there is no obvious difference between the two samples in any of the matched pairs.

pbmc.results.csv

n pbmc4k pbmc8k similarity regroup pbmc4k_cell_perc pbmc8k_cell_perc pbmc4k.vs.pbmc8k_separability
1 pbmc4k_4 pbmc8k_6 0.83 1 0.1 0.09 -0.09
2 pbmc4k_6 pbmc8k_4 0.82 2 0.08 0.11 0.03
3 pbmc4k_2 pbmc8k_5 0.82 3 0.13 0.1 -0.01
4 pbmc4k_0 pbmc8k_0 0.79 4 0.19 0.16 -0.02
5 pbmc4k_3 pbmc8k_2 0.76 5 0.12 0.12 -0.08
6 pbmc4k_5 pbmc8k_3 0.76 6 0.09 0.12 -0.07
7 pbmc4k_10 pbmc8k_9 0.74 7 0.03 0.03 -0.08
8 pbmc4k_9 pbmc8k_10 0.71 8 0.03 0.03 -0.01
9 pbmc4k_8 pbmc8k_7 0.71 9 0.04 0.06 0.05
10 pbmc4k_1 pbmc8k_1 0.68 10 0.13 0.12 -0.08
11 pbmc4k_7 pbmc8k_8 0.66 11 0.04 0.06 0.05
12 pbmc4k_11 pbmc8k_12 0.61 12 0.01 0.01 -0.08
13 pbmc4k_12 pbmc8k_11 0.48 13 0.01 0.01 -0.02

Known markers

PBMC4K

feature.gene.pbmc4k.png

PBMC8K

feature.gene.pbmc8k.png

4 Simulated data

We generate a random tree with 4 samples and 10 sub-groups in each sample to test our tree cut algorithm.
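The tip labels of the simulated tree encode the sample and the sub-group as "<sample>_<sub-group>". The short sketch below builds only the labels, with the same expression as in the simulation code, to show how they are laid out. The variable names are chosen here for illustration.

```r
# Four samples (A-D) with ten sub-groups each: A_1, B_1, C_1, D_1, A_2, ...
tip_labels <- paste0(c('A', 'B', 'C', 'D'), '_', rep(1:10, each = 4))
head(tip_labels, 8)

# Recover the sample of each tip from the label prefix.
tip_sample <- sub('_.*', '', tip_labels)
table(tip_sample)
```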

library(ape)
set.seed(1) 
tree <- rcoal(40, tip.label = paste0(c('A', 'B', 'C', 'D'), '_', rep(1:10, each = 4)), br = runif(80, min = 0, max = 0.02))
## plot(tree, edge.width = 2, label.offset = 0.01)
## nodelabels()
## tiplabels()
## axisPhylo(side = 1)
saveRDS(tree, file = 'simulated.tree.RDS')

hc = as.hclust.phylo(tree)

png('simulated.tree.png')
plot(hc)
dev.off()

res <- purity_cut(hc, cutoff = 0.1)
write.csv(res, file = 'simulated.c0.1.results.csv')

res <- purity_cut(hc, cutoff = 0.2)
write.csv(res, file = 'simulated.c0.2.results.csv')

simulated.tree.png

cutoff = 0.1

n A B C D similarity regroup
1 NA B_3 C_7 NA 0.99 1
2 A_9;A_5 NA NA NA 0.96 2
3 A_6 NA NA D_4 0.93 3
4 A_1 B_8 NA D_10 0.88 4
5 NA NA C_6;C_5 NA 0.84 5
6 NA B_2 NA D_1 0.81 6
7 A_4 B_1 C_9;C_8 NA 0.8 7
8 A_10 B_5 NA NA 0.77 8
9 NA B_4 C_4 NA 0.72 9
10 NA B_9;B_10 NA NA 0.68 10
11 NA NA C_10 D_2 0.55 11
12 A_8 B_6 C_3;C_1 NA 0.43 12
13 NA NA NA D_9 NA 13
14 NA NA NA D_6 NA 14
15 NA NA NA D_7 NA 15
16 A_7 NA NA NA NA 16
17 NA NA NA D_5 NA 17
18 NA NA NA D_3 NA 18
19 NA NA NA D_8 NA 19
20 A_2 NA NA NA NA 20
21 A_3 NA NA NA NA 21
22 NA NA C_2 NA NA 22
23 NA B_7 NA NA NA 23

cutoff = 0.2

n A B C D similarity regroup
1 NA B_3 C_7 NA 0.99 1
2 A_9;A_5 NA NA NA 0.96 2
3 A_6 NA NA D_4 0.93 3
4 NA NA C_6;C_5 NA 0.84 4
5 A_4 B_1 C_9;C_8 NA 0.8 5
6 NA B_4 C_4 NA 0.72 6
7 A_1 B_9;B_10;B_8 NA D_5;D_10;D_3 0.61 7
8 A_10 B_5 NA D_9 0.6 8
9 NA B_2 C_10 D_6;D_1;D_7;D_2 0.51 9
10 A_8 B_6 C_3;C_1 NA 0.43 10
11 A_7 NA NA NA NA 11
12 NA NA NA D_8 NA 12
13 A_2 NA NA NA NA 13
14 A_3 NA NA NA NA 14
15 NA NA C_2 NA NA 15
16 NA B_7 NA NA NA 16

R version 3.4.3 (2017-11-30)

Seurat_2.2.1

Author: Xin Gao

Created: 2018-05-22 Tue 18:03
