Doublet Detection Exercise

In this exercise, we will identify and remove doublets (droplets that captured two cells instead of one) from single-cell RNA-seq data using computational methods.

Dataset

We will work with the 10x Genomics "10k PBMCs from a Healthy Donor" dataset (v3 chemistry). Doublet frequency rises with the number of cells loaded, so a run of this size has an expected doublet rate of roughly 8%.
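
The ~8% figure is consistent with a commonly quoted rule of thumb for 10x Chromium runs of roughly 0.8% doublets per 1,000 cells recovered. The sketch below assumes that linear approximation. The function name and the rate constant are illustrative and not part of any package.

```python
def expected_doublet_rate(n_cells_recovered, rate_per_1000=0.008):
    """Approximate doublet fraction under a linear rule of thumb (an assumption)."""
    return rate_per_1000 * n_cells_recovered / 1000


n_cells = 10_000
rate = expected_doublet_rate(n_cells)
expected_doublets = round(rate * n_cells)

print(f"Expected doublet rate: {rate:.1%}")
print(f"Expected number of doublets: {expected_doublets}")
```

The resulting rate can be passed to a doublet detector as its prior, and it gives a rough sanity check for the number of cells a tool flags.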

Part 1: Initial Data Processing

Step 1: Load and explore the dataset

Step 2: Standard preprocessing

Step 3: Cluster and embed without doublet removal (baseline)
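
For scanpy users, Steps 1 to 3 could look roughly like the sketch below. It is one possible baseline, not a prescribed pipeline: the helper name, the filter thresholds, and the number of genes and components are placeholder assumptions to adjust for your data.

```python
def preprocess_baseline(adata, n_top_genes=2000, n_pcs=30, resolution=1.0):
    """Return a normalized, clustered copy of adata (all defaults are assumptions)."""
    import scanpy as sc  # imported here so the helper can be defined before scanpy is loaded

    adata = adata.copy()

    # Basic QC filters; thresholds are placeholders
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.filter_genes(adata, min_cells=3)

    # Keep the raw counts, since doublet detectors usually expect them
    adata.layers["counts"] = adata.X.copy()

    # Standard normalization, feature selection, and dimensionality reduction
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes)
    sc.pp.pca(adata, n_comps=n_pcs)

    # Neighborhood graph, embedding, and clustering (no doublet removal yet)
    sc.pp.neighbors(adata, n_pcs=n_pcs)
    sc.tl.umap(adata)
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden")
    return adata
```

A typical call would load the matrix with sc.read_10x_h5 (the file name depends on your download), call adata.var_names_make_unique(), and then run baseline = preprocess_baseline(adata). Keep the untouched adata object around, because it still holds raw counts in adata.X.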

Part 2: Doublet Detection

For Seurat users:

Install scDblFinder and use it according to the package documentation.

For scanpy users:

Use Scrublet via the built-in scanpy wrapper (sc.pp.scrublet in scanpy 1.10 and later; sc.external.pp.scrublet in earlier releases).
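
A minimal sketch of that call is shown below. It assumes adata.X holds raw counts (Scrublet does its own normalization internally) and uses the ~8% expected rate from the dataset description. The helper name run_scrublet is hypothetical.

```python
def run_scrublet(adata, expected_doublet_rate=0.08, random_state=0):
    """Add 'doublet_score' and 'predicted_doublet' to adata.obs in place."""
    import scanpy as sc  # imported here so the helper can be defined before scanpy is loaded

    # Newer scanpy exposes the wrapper as sc.pp.scrublet;
    # older releases keep it under scanpy.external
    scrublet = getattr(sc.pp, "scrublet", None)
    if scrublet is None:
        import scanpy.external as sce
        scrublet = sce.pp.scrublet

    scrublet(
        adata,
        expected_doublet_rate=expected_doublet_rate,
        random_state=random_state,
    )
    return adata
```

After the call, adata.obs["doublet_score"] holds a per-cell score and adata.obs["predicted_doublet"] a boolean call based on an automatically chosen threshold. It is worth inspecting the score histogram rather than trusting the automatic threshold blindly.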

Step 1: Analyze doublet predictions
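
One simple way to analyze the predictions is to compute the overall doublet fraction and the fraction per cluster. The sketch below uses plain Python lists with made-up toy values. With an AnnData object you would pass the cluster column and the predicted-doublet column from adata.obs instead.

```python
from collections import Counter


def doublet_fraction_by_cluster(clusters, predicted_doublet):
    """Return {cluster: fraction of its cells flagged as doublets}."""
    totals = Counter(clusters)
    doublets = Counter(c for c, d in zip(clusters, predicted_doublet) if d)
    return {c: doublets[c] / totals[c] for c in sorted(totals)}


# Toy example (values are invented for illustration)
clusters = ["0", "0", "0", "0", "1", "1", "2", "2", "2", "2"]
flags = [False, False, False, True, True, True, False, False, False, False]

fractions = doublet_fraction_by_cluster(clusters, flags)
overall = sum(flags) / len(flags)

print(f"Overall doublet fraction: {overall:.0%}")
for cluster, frac in fractions.items():
    print(f"Cluster {cluster}: {frac:.0%} predicted doublets")
```

A cluster in which most cells are flagged, like cluster 1 in the toy data, is a candidate doublet-driven cluster and deserves a closer look in Part 3.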

Part 3: Biological Validation

Step 1: Check for heterotypic doublets

Look for cells that co-express markers which should be mutually exclusive, such as a T cell marker together with a B cell or monocyte marker.
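
The check can be sketched as a simple co-expression count. The marker choice here (CD3E for T cells, MS4A1 for B cells, CD14 for monocytes) is an assumption based on commonly used PBMC markers, and the counts are toy values. In practice you would pull per-cell counts for these genes from your expression matrix.

```python
from itertools import combinations


def coexpressing_cells(counts, marker_pairs, min_count=1):
    """Return {(gene_a, gene_b): indices of cells with both genes >= min_count}."""
    hits = {}
    for a, b in marker_pairs:
        hits[(a, b)] = [
            i
            for i, (x, y) in enumerate(zip(counts[a], counts[b]))
            if x >= min_count and y >= min_count
        ]
    return hits


# Toy counts for five cells (invented for illustration)
counts = {
    "CD3E": [5, 0, 0, 3, 0],
    "MS4A1": [0, 4, 0, 2, 0],
    "CD14": [0, 0, 6, 0, 1],
}

pairs = list(combinations(counts, 2))
hits = coexpressing_cells(counts, pairs)

for pair, cells in hits.items():
    print(pair, "co-expressed in cells", cells)
```

A threshold of one count is sensitive to ambient RNA, so raising min_count or requiring a high doublet score as well may give cleaner candidates.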

Create feature plots of these markers on the UMAP to see where co-expressing cells fall.

Step 2: Examine clustering patterns

Part 4: Remove Doublets and Re-analyze

Step 1: Filter doublets
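
For scanpy users, filtering can be a boolean subset on the prediction column. The sketch assumes the calls are stored in adata.obs["predicted_doublet"], which is the column the Scrublet wrapper writes; other tools use different column names. The helper name is hypothetical.

```python
def remove_predicted_doublets(adata, key="predicted_doublet"):
    """Return a copy of adata that keeps only cells not flagged as doublets."""
    # Boolean mask of singlets; astype(bool) guards against object-typed columns
    keep = ~adata.obs[key].astype(bool).to_numpy()
    return adata[keep].copy()
```

Apply the filter to the object that still holds raw counts, then repeat normalization, feature selection, and clustering on the result, so that variable genes and principal components are recomputed without the doublets.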

Step 2: Re-process cleaned data

Step 3: Compare before and after

Create side-by-side comparisons:

  1. Number of cells retained
  2. UMAP plots (before/after)
  3. Number of clusters
  4. Marker gene expression clarity
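
The numeric items in this list can be collected in a small summary, sketched below with toy cluster labels. The labels are invented. With real data you would pass the cluster column from each object.

```python
def summarize(label, clusters):
    """Summarize one run by its cell count and number of distinct clusters."""
    return {
        "run": label,
        "n_cells": len(clusters),
        "n_clusters": len(set(clusters)),
    }


# Toy cluster labels before and after doublet removal (invented)
before = ["0"] * 5 + ["1"] * 3 + ["2"] * 2
after = ["0"] * 5 + ["1"] * 3

rows = [summarize("before", before), summarize("after", after)]
retained = rows[1]["n_cells"] / rows[0]["n_cells"]

for row in rows:
    print(f"{row['run']:>6}: {row['n_cells']} cells, {row['n_clusters']} clusters")
print(f"Cells retained: {retained:.0%}")
```

Cluster labels from two separate clustering runs are not directly comparable, so a change in the number of clusters should be confirmed on the UMAP and with marker genes.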

Part 5: Critical Evaluation

Discuss with a neighbor:

  1. What percentage of cells were identified as doublets?
  2. Were doublets randomly distributed or concentrated in specific areas?
  3. Did doublet removal reveal any new cell populations?
  4. Did any clusters disappear after doublet removal?
  5. How did doublet scores correlate with QC metrics (nUMIs, nGenes)?
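
To ground the discussion of question 5, a Pearson correlation between doublet scores and total UMI counts is a reasonable first check, since a droplet containing two cells tends to carry more RNA. The sketch below uses invented toy values. With real data you would use the doublet score and total count columns of your object.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy values (invented): doublet score and total UMIs per cell
doublet_scores = [0.05, 0.10, 0.20, 0.40, 0.80]
total_umis = [2000, 3500, 5000, 9500, 16000]

r = pearson(doublet_scores, total_umis)
print(f"Pearson r between doublet score and nUMIs: {r:.2f}")
```

A strong positive correlation is expected but not sufficient on its own, because large or highly active cell types also have high UMI counts without being doublets.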