Dataset Integration Exercise

We will integrate two PBMC datasets from different 10x Chromium chemistry versions to observe and correct batch effects.

Part 1: Data Acquisition

  1. Navigate to the 10x Genomics dataset page
  2. Download these two PBMC datasets (note: you must enter some information, including an email address, before you can download data):
  3. Download the “Feature / cell matrix (filtered)” for each dataset

Part 2: Initial Processing Without Integration

Step 1: Load and combine datasets
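
For scanpy users, each filtered matrix can be read with `sc.read_10x_h5`, and the two AnnData objects can be stacked with `anndata.concat(..., join="inner", label="batch")`, which keeps only genes present in both files and records the source of every cell. The plain-NumPy sketch below shows only that combine step on toy arrays. `combine_batches` and the labels `batch_a`/`batch_b` are hypothetical names, not part of either library.

```python
import numpy as np

def combine_batches(counts_a, genes_a, counts_b, genes_b, labels=("batch_a", "batch_b")):
    """Stack two cells-x-genes count matrices on their shared genes.

    Hypothetical helper. Returns the combined matrix, the shared gene names
    (in the order of genes_a), and a per-cell batch label array.
    """
    genes_b_set = set(genes_b)
    shared = [g for g in genes_a if g in genes_b_set]
    # column position of each gene in the two input matrices
    pos_a = {g: i for i, g in enumerate(genes_a)}
    pos_b = {g: i for i, g in enumerate(genes_b)}
    a = np.asarray(counts_a)[:, [pos_a[g] for g in shared]]
    b = np.asarray(counts_b)[:, [pos_b[g] for g in shared]]
    combined = np.vstack([a, b])
    # one label per cell, recording which file it came from
    batch = np.array([labels[0]] * a.shape[0] + [labels[1]] * b.shape[0])
    return combined, shared, batch
```

Real 10x matrices are sparse and contain tens of thousands of genes, and gene symbols can repeat (scanpy workflows often call `var_names_make_unique()` after reading), so treat this only as an illustration of the bookkeeping.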

Step 2: Standard preprocessing
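
A common scanpy sequence for this step is `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, `sc.pp.scale` and `sc.tl.pca`. As a rough sketch of what those calls compute, the NumPy function below normalizes each cell to 10,000 counts, log-transforms, keeps the most variable genes, z-scores them and projects onto principal components. Assumption: genes are ranked by plain variance, which is simpler than the dispersion-based ranking scanpy uses by default; `preprocess` is a hypothetical name.

```python
import numpy as np

def preprocess(counts, target_sum=1e4, n_top_genes=2000, n_pcs=50):
    """Normalize, log-transform, select variable genes, scale and run PCA.

    Hypothetical helper. Returns a cells-x-n_pcs matrix of PC scores.
    """
    x = np.asarray(counts, dtype=float)
    # library-size normalization: every cell is rescaled to target_sum counts
    lib = x.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    x = np.log1p(x / lib * target_sum)
    # simplified HVG selection: keep the genes with the highest variance
    n_top = min(n_top_genes, x.shape[1])
    hvg = np.argsort(x.var(axis=0))[::-1][:n_top]
    x = x[:, hvg]
    # z-score each gene so highly expressed genes do not dominate the PCA
    std = x.std(axis=0)
    std[std == 0] = 1.0
    x = (x - x.mean(axis=0)) / std
    # PCA via SVD of the centered matrix; PC scores are U * S
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k]
```

Running these steps on the combined matrix, with no correction yet, is what lets the batch effect show up in the clustering of Step 3.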

Step 3: Clustering and visualization
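
In scanpy this step usually means `sc.pp.neighbors`, `sc.tl.leiden` and `sc.tl.umap`, followed by UMAP plots colored by cluster and by batch. A quick numeric complement to those plots is the batch composition of each cluster: before integration, clusters made almost entirely of one batch point to a batch effect rather than a distinct cell type. The helpers below are hypothetical, and the 0.9 threshold is an arbitrary choice for illustration.

```python
from collections import Counter, defaultdict

def batch_composition(clusters, batches):
    """Map each cluster to the fraction of its cells that come from each batch."""
    counts = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        counts[cluster][batch] += 1
    return {
        cluster: {b: n / sum(c.values()) for b, n in c.items()}
        for cluster, c in counts.items()
    }

def batch_specific_clusters(composition, threshold=0.9):
    """Clusters in which a single batch supplies at least `threshold` of the cells."""
    return sorted(
        cluster
        for cluster, fractions in composition.items()
        if max(fractions.values()) >= threshold
    )
```

With scanpy, the inputs would typically be `adata.obs["leiden"]` and a batch column such as `adata.obs["batch"]`, assuming that name was used when the datasets were combined.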

Part 3: Batch Correction

For Seurat users:

For scanpy users:
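
For intuition about what batch correction does to an embedding, the sketch below applies the crudest possible linear correction: it shifts each batch in PCA space so that its mean matches the global mean. This is NOT Harmony (available in scanpy as `sc.external.pp.harmony_integrate`) or Seurat's anchor-based integration; those methods apply cluster- or cell-specific corrections. `center_per_batch` is a hypothetical name.

```python
import numpy as np

def center_per_batch(embedding, batches):
    """Shift each batch so its mean equals the global mean (naive correction)."""
    emb = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    corrected = emb.copy()
    global_mean = emb.mean(axis=0)
    for b in np.unique(batches):
        mask = batches == b
        # one constant offset per batch, applied to every cell in that batch
        corrected[mask] += global_mean - emb[mask].mean(axis=0)
    return corrected
```

Because it removes a single global offset per batch, this approach assumes both batches contain the same mix of cell types. When that assumption fails it also removes real biological differences, which is why methods that correct locally are preferred.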

Part 4: Post-Integration Analysis

Step 1: Re-process integrated data

Step 2: Quantitative evaluation

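Calculate the integration Local Inverse Simpson's Index (iLISI), which estimates the effective number of batches in each cell's neighborhood. With two batches, a score near 1 means a cell's neighbors come almost entirely from one batch, and a score near 2 means the batches are well mixed: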
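
A simplified version of the score can be sketched in NumPy: for each cell, take its k nearest neighbors, compute the proportion p_b of neighbors from each batch, and report 1 / sum(p_b^2). Assumption: neighbors are weighted uniformly, whereas the published LISI weights them with a Gaussian kernel calibrated to a target perplexity. `simple_ilisi` is a hypothetical name.

```python
import numpy as np

def simple_ilisi(embedding, batches, k=30):
    """Per-cell inverse Simpson's index of batch labels among k nearest neighbors.

    Simplified: uniform neighbor weights instead of perplexity-based weights.
    """
    x = np.asarray(embedding, dtype=float)
    n = x.shape[0]
    k = min(k, n - 1)
    # all-pairs squared Euclidean distances (O(n^2) memory)
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    labels, codes = np.unique(np.asarray(batches), return_inverse=True)
    scores = np.empty(n)
    for i in range(n):
        # proportion of this cell's neighbors that belong to each batch
        p = np.bincount(codes[nbrs[i]], minlength=labels.size) / k
        scores[i] = 1.0 / np.sum(p ** 2)
    return scores
```

Summarize with the median across cells and compare the value on the uncorrected PCA with the value on the corrected embedding. The all-pairs distance matrix is only practical for small subsets; established implementations (for example the `lisi` R package or the scib Python package) use a proper neighbor search and perplexity weighting, and some rescale scores to a 0 to 1 range, so values from different tools are not directly comparable.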

Part 5: Comparison & Discussion

Discuss with a neighbor:

  1. How did cells cluster before vs. after integration?
  2. Are cell types now mixed across batches?
  3. What is your iLISI score? Does it support your visual assessment?
  4. Can you identify the major PBMC cell types after integration?

Bonus Challenges