ClusterMap - Compare multiple Single cell RNA-seq profiling

1. Introduction
2. Guide
- 2.1. Quick start
- 2.2. More options
3. More case study
- 3.1. Immune stimulated datasets
- 3.2. PBMCs datasets
4. Simulated data

1 Introduction

ClusterMap is designed to analyze and compare two or more single cell expression datasets. ClusterMap suppose that the analysis for each single dataset and combined dataset are done. If not, the package also provides quick analysis function "make_single_obj" and "make_comb_obj" to generate Seurat object.

The pre-analysis for the following datasets is here: pre_analysis/pre_analysis.html

2 Guide

2.1 Quick start

The epithelial cell datasets were generated in the study of Pal B, et al.(https://www.nature.com/articles/s41467-017-01560-x). Cells were collected from mammary glands of adult mice at different phases of the estrus cycle. By pooling the glands from two mice, scRNA-seq of 2729 total epithelial cells in estrus and 2439 cells in diestrus were performed using the 10X Chromium platform.

Pre-analysis by Seruat

Analysis were performed with Seruat package. 13 sub-groups of diestrus stage cells and 10 sub-groups of estrus stage cells were defined, as well as the maker genes of each group.

p7d.tsne.png p7e.tsne.png

2.1.1 Simple run

First, use a list of marker genes of each sub-group in each sample to map sub-groups across samples. The marker gene file can be the direct output of "FindAllMarkers" of Seruat package, or can be a table with columns of "cluster" and "gene".
The "edge_cutoff" can be adjust to decide if a singleton should be merged with a paired groups or not. See "2.2 More options"

code

library(ClusterMap)

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'p7')
res

By calling "cluster_map", the mapped sub-groups information can be generate as the following figures and table. They are auto-saved.

Cluster match

Hierachical clustering:

The sub-groups were matched by hierarchical clustering. The distance between the sub-groups across the samples were determined by the existence of the marker genes in each group. The union of all marker genes were considered together. The sub-groups of each sample were clustered by the averaged binary distance of the existence of the marker genes.
The similarity of the matched groups is defined as 1 minus the height of the merging node of the matched groups in the dendragram.

Tree cut:

We search through all the nodes in the dendragram from the bottom. If all the offspring nodes of a node are from the same sample, the node is considered as a pure node. For a given node N with two direct sub-nodes n1 and n2, N will be kept only under three conditions:

n1 and n2 are both singleton or pure.
n1 is singleton or pure, AND n2 is not removed, AND the edge between N and n2 is smaller than the cutoff.
none of n1 or n2 is cut, AND none of the edge between N and n1 or n2 is bigger than the cutoff, AND S1 and S2 are not included by each other. S1 is the unique sample list that all the offspring nodes of n1 comes from, similar for S2.

p7.hcluster.png p7.heatmap.hcluster.png

Cluster match results were saved as file p7.cluster.map.csv . The contents are

n	p7d	p7e	similarity	regroup
1	p7d_4	p7e_1	0.65	1
2	p7d_2	p7e_3;p7e_4	0.5	2
3	p7d_5	p7e_0;p7e_2	0.47	3
4	p7d_6	p7e_5	0.44	4
5	p7d_12	p7e_9	0.42	5
6	p7d_10	p7e_8	0.4	6
7	p7d_0;p7d_8	NA	0.34	7
8	p7d_7	p7e_6	0.33	8
9	p7d_11	p7e_7	0.29	9
10	p7d_1	NA	NA	10
11	p7d_3	NA	NA	11
12	p7d_9	NA	NA	12

2.1.2 Full run

Further more, besides the marker gene list, if the Seurat object of sinlge dataset and combined dataset are provided, the full analysis can be done.

code

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS', p7 = 'pre_analysis/p7.RDS')
objList <- lapply(fList, readRDS)

single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'p7', single_obj_list = single_obj_list, comb_obj = objList$p7)
res

By calling "cluster_map" this way, Circos plot, re-colored t-SNE plot will be generated as following. As well as the cell percentage change and sample separability of each new group.

Circos plot and population size

To visualized the results, we use Circos plot to summarize.

The cords of the Circos plot indicate the linkage between the sub-groups.
The percentage of cell numbers in each sample is represented by the width of the black sectors. The sub-population size change is intuitive by comparing the sector size of matched (cord linked) groups.
The color of the cords indicates different new groups, which matches the recolored t-SNE plot.
The transparency of the cord color indicates the similarity of the matched groups.

p7.circos.png

Recolor

For easy comparison, each sample and combined sample are recolored by their new groups after matching under t-SNE coordinates in the pre-analysis. Cells belonging to the same new group in different samples are recolored in the same color. Thus, some samples may miss one or more colors, which means no such group in the sample. New groups will be saved as a list into RDS file, which can be used later.

p7d.recolor.tsne.png p7e.recolor.tsne.png

p7.recolor.tsne.png p7.color.by.sample.tsne.png

Separability

After matching and regrouping, we estimate the condition effect on each paired groups besides the percentage change. To quantify the difference between paired groups, separability was calculated for each new group. Separability will be generated pairwise if more than two samples.

For each new group, the separability is defined as the median difference of inna- and inter- sample distance of each cell in the combined t-SNE coordinates. The inna-sample distance is defined as the median distance of K-nearest cells of the same sample. Defualt K=5. The inter-sample distance is the same but cells of different sample.

Using median instead of mean will reduce the variation due to the outliners. The higher the more separable.

Map results

From the following results, we see that new group 2 and 4 are one to multiple match. There is no matched groups for the new group 8, 10, 11, 12 at current cutoff. Both circos plot and recolored t-SNE plot showed the cell percentage change of the group 2, 4 and 9. The separability of each new group further emerged the most affected sub-types, such as group 5, 7 and 4, besides the no matched groups.

The full results were saved as in file p7.results.csv

n	p7d	p7e	similarity	regroup	p7d_cell_perc	p7e_cell_perc	p7d.vs.p7e_separability
1	p7d_4	p7e_1	0.65	1	0.11	0.18	0.85
2	p7d_2	p7e_3;p7e_4	0.5	2	0.14	0.14;0.1	0.34
3	p7d_5	p7e_0;p7e_2	0.47	3	0.08	0.2;0.15	2.93
4	p7d_6	p7e_5	0.44	4	0.06	0.07	0.99
5	p7d_12	p7e_9	0.42	5	0.01	0.02	0.03
6	p7d_10	p7e_8	0.4	6	0.02	0.04	10.56
7	p7d_0;p7d_8	NA	0.34	7	0.17;0.06	NA	Inf
8	p7d_7	p7e_6	0.33	8	0.06	0.05	7.13
9	p7d_11	p7e_7	0.29	9	0.01	0.05	0.53
10	p7d_1	NA	NA	10	0.15	NA	Inf
11	p7d_3	NA	NA	11	0.11	NA	Inf
12	p7d_9	NA	NA	12	0.02	NA	Inf

2.1.3 Known markers

Core basal genes: Krt14, Acta2, Myl9, Sparc, Mylk, Cxcl14 – group 3, 4, 7, 8
Luminal genes: Elf5, Prlr, Areg, Ly6d, Stc2, Krt19 – group 1, 11, 12, 9
ML cell-specific genes: Prlr, Cited1, Esrrb, and Cxcl15 – group 2, 5, 6, 10

code

library(Seurat)

FeaturePlot(object = objList$p7d, features.plot = c("Krt14", "Acta2", "Elf5", "Prlr", "Krt19","Pgr"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.p7d.png')

FeaturePlot(object = objList$p7e, features.plot = c("Krt14", "Acta2", "Elf5", "Prlr", "Krt19","Pgr"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.p7e.png')

P7D

feature.gene.p7d.png

P7E

feature.gene.p7e.png

2.1.4 Marker genes of new groups

code

library(Seurat)
new.group.list <- readRDS('p7.new.group.list.rds')
names(new.group.list)
## [1] "p7d"  "p7e"  "comb"

p7 <- objList$p7
p7@ident <- new.group.list$comb
TSNEPlot(object = p7, do.label = T, label.size = 6)

p7.markers <- FindAllMarkers(object = p7, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
write.csv(p7.markers, file = 'p7.new.group.markers.csv')

## p7.markers.10.vs.12 <- FindMarkers(object = p7, ident.1 = '10',ident.2 = '2', only.pos = F, min.pct = 0.25, min.diff.pct = 0.1, thresh.use = 0.25)
## write.csv(p7.markers.10.vs.12,file = 'p7.markers.10.vs.12.csv')

2.2 More options

To custermized the analysis, step by step analysis could be performed.

2.2.1 Try different singleton cutoff for cluster match.

The cutoff is the edge length cutoff for deciding if a singleton should be merged or not. We suggest the cutoff is better to be less than 0.2.

code

marker_file_list <- c(p7d = 'pre_analysis/p7d.markers.csv', p7e = 'pre_analysis/p7e.markers.csv')
## cutoff 0.1
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.1, output = 'p7')
## cutoff 0.15
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.15, output = 'p7_cutoff0.15')
mapRes ## results didn't change.
## cutoff 0.2
mapRes <- cluster_map_by_marker(marker_file_list, cutoff = 0.2, output = 'p7_cutoff0.2')
mapRes ## p7d_9 was grouped together with p7d_10 and p7e_8.

cutoff = 0.2

n	p7d	p7e	similarity	regroup
1	p7d_4	p7e_1	0.65	1
2	p7d_2	p7e_3;p7e_4	0.50	2
3	p7d_5	p7e_0;p7e_2	0.47	3
4	p7d_6	p7e_5	0.44	4
5	p7d_12	p7e_9	0.42	5
6	p7d_0;p7d_8	NA	0.34	6
7	p7d_7	p7e_6	0.33	7
8	p7d_11	p7e_7	0.29	8
9	p7d_10;p7d_9	p7e_8	0.24	9
10	p7d_1	NA	NA	10
11	p7d_3	NA	NA	11

2.2.2 Use cell number information to generate circos plot.

The cell numbers for each sub-group can be pulled out from Seruat object or provided by the user. cell_num_list is a list of vectors for each sample. Each vector contains the cell numbers for each sub-group of the sample.

code

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS')
objList <- lapply(fList, readRDS)
single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)

cell_num_list <- lapply(single_obj_list, function(obj) summary(obj@ident))

mapRes <- read.csv('p7.results.csv') ## This file was auto-saved when cluster_map_by_marker function was run.

circos_map(mapRes, cell_num_list, output = 'p7')
res <- add_perc(mapRes, cell_num_list)
res

Number of samples in circos plot

If there are more than 3 samples, the circos plot will become complicated. We can choose a subset of samples to plot. To achieve this, just remove extra samples from the mapRes and cell_num_list variables. Then use circos_map to plot a subset.

2.2.3 Change colors in circos plot

code

fList <- c(p7d = 'pre_analysis/p7d.RDS', p7e = 'pre_analysis/p7e.RDS')
objList <- lapply(fList, readRDS)
single_obj_list <- c(p7d = objList$p7d, p7e = objList$p7e)
cell_num_list <- lapply(single_obj_list, function(obj) summary(obj@ident))
mapRes <- read.csv('p7.results.csv')## This file was auto-saved when cluster_map_by_marker function was run.

library(RColorBrewer)
circos_map(mapRes, cell_num_list, output = 'p7.tmp', color_cord = brewer.pal(n = 12, 'Paired'))

p7.tmp.circos.png

2.2.4 Recolor one single sample in t-SNE plot

code

p7d <- readRDS('pre_analysis/p7d.RDS')
mapRes <- read.csv('p7.results.csv') ## This file was auto-saved when cluster_map_by_marker function was run.
mapRes

da <- structure(as.vector(mapRes[, 'p7d']), names = mapRes$regroup)
head(da)
p7d.new.group <- recolor_s(da, obj = p7d, output = 'p7d') ## new group asignment of each cell is outputed as well.              
## custermized color
library(RColorBrewer)
p7d.new.group <- recolor_s(da, obj = p7d, output = 'p7d.tmp', color = brewer.pal(n = 12, 'Paired'))


p7e <- readRDS('pre_analysis/p7e.RDS')
da <- structure(as.vector(mapRes[, 'p7e']), names = mapRes$regroup)
head(da)
p7e.new.group <- recolor_s(da, obj = p7e, output = 'p7e') ## new group asignment of each cell is outputed as well.              
## custermized color
p7e.new.group <- recolor_s(da, obj = p7e, output = 'p7e.tmp', color = brewer.pal(n = 12, 'Paired'))

p7d.tmp.recolor.tsne.png p7e.tmp.recolor.tsne.png

2.2.5 Recolor combined sample in t-SNE plot

code

p7 <- readRDS('pre_analysis/p7.RDS')

new_group_comb <- recolor_comb(comb_obj = p7, new_group_list = list(p7d = p7d.new.group, p7e = p7e.new.group), output = 'p7', comb_delim = '-') ## new group list generated by recolor_s as in the previous code block.     

## custermized color
library(RColorBrewer)
new_group_comb <- recolor_comb(comb_obj = p7, new_group_list = list(p7d = p7d.new.group, p7e = p7e.new.group), output = 'p7.tmp', comb_delim = '-', color = brewer.pal(n = 12, 'Paired'))

p7.tmp.recolor.tsne.png

2.2.6 Adjust K for seperability

code

p7 <- readRDS('pre_analysis/p7.RDS')
tsne_coord <- as.data.frame(p7@dr$tsne@cell.embeddings)

new_group_list <- readRDS('p7.new.group.list.RDS')
group <- new_group_list$comb
head(group)

sample_label <- as.factor(sub('-.*', '', names(group)))
## k=5
sepa_k5 <- separability_pairwise(tsne_coord, group, sample_label, k = 5)
sepa_k5
## k=10
sepa_k10 <- separability_pairwise(tsne_coord, group, sample_label, k = 10)
sepa_k10

k = 5

regroup	p7d.vs.p7e
1	0.85
2	0.34
3	2.93
4	0.99
5	0.03
6	10.56
7	Inf
8	7.13
9	0.53
10	Inf
11	Inf
12	Inf

k = 10

regroup	p7d.vs.p7e
1	0.98
2	0.42
3	3.75
4	1.00
5	-0.38
6	11.43
7	Inf
8	7.71
9	0.99
10	Inf
11	Inf
12	Inf

3 More case study

3.1 Immune stimulated datasets

The dataset comes from this study: https://www.nature.com/articles/nbt.4042

A pool of peripheral blood mononuclear cells (PBMCs) from eight lupus patients was studied. PBMCs from each patient were untreated as control or activated with recombinant interferon-beta IFN-beta for 6 hours. 14,619 cells for control and 14,446 cells for stimulated were obtained from 10X Chromium instrument sequencing.

Pre-analysis by Seruat

ctrl.tsne.png stim.tsne.png

ClusterMap analysis

code

library(ClusterMap)

marker_file_list <- c(ctrl = 'pre_analysis/ctrl.markers.csv', stim = 'pre_analysis/stim.markers.csv')

fList <- c(ctrl = "pre_analysis/ctrl.RDS",
        stim = "pre_analysis/stim.RDS",
        comb = "pre_analysis/immune.RDS")
immuneList <- lapply(fList, readRDS)

single_obj_list <- c(ctrl = immuneList$ctrl, stim = immuneList$stim)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'immune', single_obj_list = single_obj_list, comb_obj = immuneList$comb)
res

Cluster match

immune.hcluster.png immune.heatmap.hcluster.png

Circos plot

immune.circos.png

Recolor

ctrl.recolor.tsne.png stim.recolor.tsne.png

immune.recolor.tsne.png immune.color.by.sample.tsne.png

Map results

immune.results.csv

n	ctrl	stim	similarity	regroup	ctrl_cell_perc	stim_cell_perc	ctrl.vs.stim_separability
1	ctrl_6	stim_6	0.6	1	0.04	0.04	0.24
2	ctrl_2	stim_2	0.53	2	0.12	0.11	0.55
3	ctrl_1	stim_1	0.5	3	0.18	0.21	0.78
4	ctrl_5	stim_4	0.49	4	0.08	0.07	5.01
5	ctrl_4	stim_3	0.48	5	0.09	0.11	0.55
6	ctrl_0;ctrl_8	stim_0;stim_11	0.45	6	0.31;0.02	0.28;0.01	11.03
7	ctrl_7	stim_8	0.39	7	0.03	0.03	7.05
8	ctrl_10	stim_10	0.34	8	0.01	0.01	0.55
9	ctrl_3	stim_5;stim_7	0.3	9	0.09	0.07;0.04	0.37
10	ctrl_9	NA	NA	10	0.02	NA	Inf
11	NA	stim_12	NA	11	NA	0	Inf
12	NA	stim_9	NA	12	NA	0.02	Inf

Cell type and Known markers

cM, CD14+CD16- monocytes: CD14, LYZ — group 6
ncM, CD14+CD16+ monocytes: FCGR3A, MS4A7 — group 4
DC, dendritic cells: FCER1A, CST3 — group 7
Mkc, megakaryocytes: PPBP — group 10, 12
Th, CD4+ T cells: IL7R — group 2, 3
B, B cells: MS4A1 — group 5
Tc, CD8+ T cells: CD8A — group 9
NK, natural killer cells: GNLY, NKG7 — group 9

code

library(Seurat)

FeaturePlot(object = immuneList$ctrl, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.ctrl.png')

FeaturePlot(object = immuneList$stim, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.stim.png')

## FeaturePlot(object = all, features.plot = c("CD4","TBX21",'GATA3','SELL','CREM'), cols.use = c("grey", "blue"), reduction.use = "tsne")
## savePlot('features.immune.png')

Control

feature.gene.ctrl.png

Stimulated

feature.gene.stim.png

3.2 PBMCs datasets

The dataset is downloaded from 10X Genomics single cell gene expression datasets:
https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k
https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k

They are peripheral blood mononuclear cells (PBMCs) from the same healthy donor.

Pre-analysis

pbmc4k.tsne.png pbmc8k.tsne.png

ClusterMap analysis

code

library(ClusterMap)

marker_file_list <- c(pbmc4k = 'pre_analysis/pbmc4k.markers.csv', pbmc8k = 'pre_analysis/pbmc8k.markers.csv')

fList <- c(pbmc4k = 'pre_analysis/pbmc4k.RDS',
        pbmc8k = 'pre_analysis/pbmc8k.RDS',
        pbmc = "pre_analysis/pbmc.RDS")
pbmcList <- lapply(fList, readRDS)

single_obj_list <- c(pbmc4k = pbmcList$pbmc4k, pbmc8k = pbmcList$pbmc8k)

res <- cluster_map(marker_file_list, edge_cutoff = 0.1, output = 'pbmc', single_obj_list = single_obj_list, comb_obj = pbmcList$pbmc)
res

## marker genes
library(Seurat)

FeaturePlot(object = pbmcList$pbmc4k, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.pbmc4k.png')

FeaturePlot(object = pbmcList$pbmc8k, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
savePlot('feature.gene.pbmc8k.png')

## FeaturePlot(object = pbmcList$pbmc, features.plot = c("CD14","LYZ","FCGR3A","MS4A7","FCER1A","CST3","PPBP","IL7R","MS4A1","CD8A","GNLY","NKG7"), cols.use = c("grey", "blue"), reduction.use = "tsne")
## savePlot('feature.gene.pbmc.png')

Cluster match

pbmc.hcluster.png pbmc.heatmap.hcluster.png

Circos plot

The cord color in the circos plot is much darker, indicating the high similarity of the matched groups.

pbmc.circos.png

Recolor

pbmc4k.recolor.tsne.png pbmc8k.recolor.tsne.png

pbmc.recolor.tsne.png pbmc.color.by.sample.tsne.png

Map results

Clear one-to-one match. Majority of the cell percentage of sub-types are not changed. The separability shows that there is no obvious difference between the two samples in any of the matched pairs.

pbmc.results.csv

n	pbmc4k	pbmc8k	similarity	regroup	pbmc4k_cell_perc	pbmc8k_cell_perc	pbmc4k.vs.pbmc8k_separability
1	pbmc4k_4	pbmc8k_6	0.83	1	0.1	0.09	-0.09
2	pbmc4k_6	pbmc8k_4	0.82	2	0.08	0.11	0.03
3	pbmc4k_2	pbmc8k_5	0.82	3	0.13	0.1	-0.01
4	pbmc4k_0	pbmc8k_0	0.79	4	0.19	0.16	-0.02
5	pbmc4k_3	pbmc8k_2	0.76	5	0.12	0.12	-0.08
6	pbmc4k_5	pbmc8k_3	0.76	6	0.09	0.12	-0.07
7	pbmc4k_10	pbmc8k_9	0.74	7	0.03	0.03	-0.08
8	pbmc4k_9	pbmc8k_10	0.71	8	0.03	0.03	-0.01
9	pbmc4k_8	pbmc8k_7	0.71	9	0.04	0.06	0.05
10	pbmc4k_1	pbmc8k_1	0.68	10	0.13	0.12	-0.08
11	pbmc4k_7	pbmc8k_8	0.66	11	0.04	0.06	0.05
12	pbmc4k_11	pbmc8k_12	0.61	12	0.01	0.01	-0.08
13	pbmc4k_12	pbmc8k_11	0.48	13	0.01	0.01	-0.02

Known markers

PBMC4K

feature.gene.pbmc4k.png

PBMC8K

feature.gene.pbmc8k.png

4 Simulated data

We generate a random tree with 4 samples and 10 sub-groups in each sample to test our tree cut algorithm.

code

library(ape)
set.seed(1) 
tree <- rcoal(40, tip.label = paste0(c('A', 'B', 'C', 'D'), '_', rep(1:10, each = 4)), br = runif(80, min = 0, max = 0.02))
## plot(tree, edge.width = 2, label.offset = 0.01)
## nodelabels()
## tiplabels()
## axisPhylo(side = 1)
saveRDS(tree, file = 'simulated.tree.RDS')

hc = as.hclust.phylo(tree)

png('simulated.tree.png')
plot(hc)
dev.off()

res <- purity_cut(hc, cutoff = 0.1)
write.csv(res, file = 'simulated.c0.1.results.csv')

res <- purity_cut(hc, cutoff = 0.2)
write.csv(res, file = 'simulated.c0.2.results.csv')

simulated.tree.png

cutoff = 0.1

n	A	B	C	D	similarity	regroup
1	NA	B_3	C_7	NA	0.99	1
2	A_9;A_5	NA	NA	NA	0.96	2
3	A_6	NA	NA	D_4	0.93	3
4	A_1	B_8	NA	D_10	0.88	4
5	NA	NA	C_6;C_5	NA	0.84	5
6	NA	B_2	NA	D_1	0.81	6
7	A_4	B_1	C_9;C_8	NA	0.8	7
8	A_10	B_5	NA	NA	0.77	8
9	NA	B_4	C_4	NA	0.72	9
10	NA	B_9;B_10	NA	NA	0.68	10
11	NA	NA	C_10	D_2	0.55	11
12	A_8	B_6	C_3;C_1	NA	0.43	12
13	NA	NA	NA	D_9	NA	13
14	NA	NA	NA	D_6	NA	14
15	NA	NA	NA	D_7	NA	15
16	A_7	NA	NA	NA	NA	16
17	NA	NA	NA	D_5	NA	17
18	NA	NA	NA	D_3	NA	18
19	NA	NA	NA	D_8	NA	19
20	A_2	NA	NA	NA	NA	20
21	A_3	NA	NA	NA	NA	21
22	NA	NA	C_2	NA	NA	22
23	NA	B_7	NA	NA	NA	23

cutoff = 0.2

n	A	B	C	D	similarity	regroup
1	NA	B_3	C_7	NA	0.99	1
2	A_9;A_5	NA	NA	NA	0.96	2
3	A_6	NA	NA	D_4	0.93	3
4	NA	NA	C_6;C_5	NA	0.84	4
5	A_4	B_1	C_9;C_8	NA	0.8	5
6	NA	B_4	C_4	NA	0.72	6
7	A_1	B_9;B_10;B_8	NA	D_5;D_10;D_3	0.61	7
8	A_10	B_5	NA	D_9	0.6	8
9	NA	B_2	C_10	D_6;D_1;D_7;D_2	0.51	9
10	A_8	B_6	C_3;C_1	NA	0.43	10
11	A_7	NA	NA	NA	NA	11
12	NA	NA	NA	D_8	NA	12
13	A_2	NA	NA	NA	NA	13
14	A_3	NA	NA	NA	NA	14
15	NA	NA	C_2	NA	NA	15
16	NA	B_7	NA	NA	NA	16

R version 3.4.3 (2017-11-30)

Seurat_2.2.1

ClusterMap - Compare multiple Single cell RNA-seq profiling

Table of Contents

1 Introduction

2 Guide

2.1 Quick start

2.1.1 Simple run

2.1.2 Full run

2.1.3 Known markers

2.1.4 Marker genes of new groups

2.2 More options

2.2.1 Try different singleton cutoff for cluster match.

2.2.2 Use cell number information to generate circos plot.

2.2.3 Change colors in circos plot

2.2.4 Recolor one single sample in t-SNE plot

2.2.5 Recolor combined sample in t-SNE plot

2.2.6 Adjust K for seperability

3 More case study

3.1 Immune stimulated datasets

3.2 PBMCs datasets

4 Simulated data