A Guide to Trajectory Inference and RNA Velocity Analysis

Trajectory inference and RNA velocity analysis are powerful techniques used to understand cellular dynamics based on single-cell gene expression data. These methods allow researchers to reconstruct the dynamic processes cells undergo, such as differentiation, maturation, and responses to stimuli. RNA velocity, in particular, leverages the information from unspliced and spliced transcript levels to predict a cell’s future state, enhancing the accuracy of inferred trajectories. This guide provides a comprehensive overview of how to perform trajectory inference and RNA velocity analysis, focusing on integrating data from Cell Ranger and Loupe Browser with community-developed tools.

Software and Data Requirements

To perform the analysis described in this guide, you will need the following software packages:

Essential:

Python (3.7 or higher)
Scanpy
ScVelo
Velocyto
Loompy
AnnData
Jupyter Notebook

Highly Recommended:

A Linux server with sufficient computing resources (RAM/CPUs) to run the velocyto pipeline efficiently. This pipeline is computationally intensive.
Jupyter Notebook for interactive data exploration and analysis.

Optional:

Single-cell gene expression data generated by Cell Ranger. If you don’t have your own data, you can use the neutrophil dataset as an example. You can generate the required Cell Ranger output files by following the Capturing Neutrophils in 10x Single Cell Gene Expression Data tutorial.

For reference, key intermediate files and expected outputs are provided in the original article.

Step-by-Step Guide to Trajectory Inference and RNA Velocity

This guide walks you through the process of performing trajectory inference and RNA velocity analysis using data processed with Cell Ranger and Loupe Browser.

Step 1: Exporting Projections and Clusters from Loupe Browser

The initial step involves exporting relevant data from Loupe Browser. If your dataset contains multiple cell types, you should first annotate the cell types of interest. In this example, we focus on neutrophils. Therefore, you need to export the UMAP embeddings and cluster assignments specifically for neutrophils.

Detailed instructions on raw data processing and cell type annotation can be found in the Tutorial: Capturing Neutrophils in 10x Single Cell Gene Expression Data. Exporting projections and clusters is described here.

Step 2: Preparing the Conda Environment

A dedicated conda environment is highly recommended to manage the required software dependencies. Conda simplifies the installation and management of Python packages, ensuring compatibility and reproducibility.

Install Conda: If you don’t have conda installed, follow the installation instructions for your operating system from the official conda documentation.
Create a Conda Environment: Download the provided YAML file, which lists the necessary packages. Execute the following command in your terminal, in the same directory as the downloaded YAML file:
```
conda env create --file TutorialEnvironment.yml
```
This command creates a new conda environment named “tutorial” (or whatever name you specify in the YAML file) with all the required packages installed.

Step 3: Running Velocyto for Spliced/Unspliced Transcript Quantification

Velocyto is a powerful tool for quantifying spliced and unspliced transcripts from single-cell RNA sequencing data. This information is crucial for estimating RNA velocity.

Activate the Conda Environment:
```
conda activate tutorial
```
Run Velocyto: Velocyto requires two inputs: the Cell Ranger output directory and a genome annotation file (GTF). Ensure the GTF file matches the reference used in Cell Ranger. The reference path can usually be found in the Web Summary Sample section of the Cell Ranger output. Execute the following command, replacing the placeholders with the correct paths:
```
velocyto run10x /path/to/cellranger-runs/sample_name /path/to/refdata-gex-GRCh38-2020-A/genes/genes.gtf
```
Velocyto is resource-intensive and can take a significant amount of time to complete. A high-performance computing (HPC) cluster or a workstation with ample RAM and CPU cores is recommended.

Upon successful completion, Velocyto generates a .loom file (e.g., velocyto/sample_name.loom) within the Cell Ranger output directory. This file contains the spliced and unspliced transcript counts.

If you prefer to skip this step, a pre-processed loom file is available for download.

Step 4: Performing RNA Velocity Analysis with scVelo

ScVelo is a scalable toolkit designed for RNA velocity analysis. It leverages splicing kinetics to infer cell state transitions and reconstruct developmental trajectories.

Download Input Files (Optional): If you skipped Step 3, download the required input files:

mkdir input-files
curl -o input-files/filtered_feature_bc_matrix.tar.gz https://cf.10xgenomics.com/supp/cell-exp/neutrophils/filtered_feature_bc_matrix.tar.gz
curl -o input-files/WB_Lysis_3p_Introns_8kCells.loom https://cf.10xgenomics.com/supp/cell-exp/neutrophils/WB_Lysis_3p_Introns_8kCells.loom
curl -o input-files/3p-Neutrophils-clusters.csv https://cf.10xgenomics.com/supp/cell-exp/neutrophils/3p-Neutrophils-clusters.csv
curl -o input-files/3p-Neutrophils-UMAP-Projection.csv https://cf.10xgenomics.com/supp/cell-exp/neutrophils/3p-Neutrophils-UMAP-Projection.csv
tar -xvzf input-files/filtered_feature_bc_matrix.tar.gz -C input-files/
ls -lahR input-files

These commands create a directory named input-files, download the necessary files, and extract the gene expression matrix.

Open Jupyter Notebook and Import Libraries: Open the provided Jupyter notebook file in Jupyter Notebook. Import the required Python libraries:
```
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
import scanpy as sc
import igraph
import scvelo as scv
import loompy as lmp
import anndata
import warnings
warnings.filterwarnings('ignore')
```
If you encounter a “ModuleNotFoundError,” ensure that all required packages are installed in your conda environment.

Import Matrix: Load the filtered gene expression matrix from Cell Ranger into scVelo:

# Define Path to the matrix output from cellranger
Path10x='./input-files/filtered_feature_bc_matrix/'

# Read matrix
Neutro3p = sc.read_10x_mtx(Path10x,var_names='gene_symbols',cache=True)

# Print information on this new object
Neutro3p

This creates an AnnData object named Neutro3p containing the gene expression data.

Import Clusters and Projection: Import the cluster assignments and UMAP embeddings exported from Loupe Browser:

# Read clusters
Clusters_Loupe = pd.read_csv("./input-files/3p-Neutrophils-clusters.csv", delimiter=',',index_col=0)

# Create a list with only neutrophil Barcodes, because the dataset included other cell types.
Neutrophils_BCs=Clusters_Loupe.index

# Read UMAP embeddings
UMAP_Loupe = pd.read_csv("./input-files/3p-Neutrophils-UMAP-Projection.csv", delimiter=',',index_col=0)

# Select neutrophil barcodes in the projection
UMAP_Loupe=UMAP_Loupe.loc[Neutrophils_BCs,]

# Tansform to Numpy
UMAP_Loupe = UMAP_Loupe.to_numpy()

# Filter cells for only neutrophils
Neutro3p=Neutro3p[Neutrophils_BCs]

# Import cluster information into Neutro3p object
Neutro3p.obs['Loupe']=Clusters_Loupe

# Import UMAP information into Neutro3p object
Neutro3p.obsm["X_umap"]=UMAP_Loupe

# Print information about the object
Neutro3p

This step integrates the Loupe Browser annotations into the AnnData object.

Import Spliced/Unspliced Counts: Import the .loom file generated by Velocyto, merging the spliced and unspliced transcript counts into the Neutro3p object:

# Read velocyto output
VelNeutro3p = scv.read('./input-files/WB_Lysis_3p_Introns_8kCells.loom', cache=True)

# Merge velocyto with cellranger matrix
Neutro3p = scv.utils.merge(Neutro3p, VelNeutro3p)

Run Velocity Analysis: Perform the core velocity analysis using scVelo. This involves preprocessing, moment estimation, velocity calculation, and latent time recovery:

# Standard scvelo processing to run Dynamical Mode
scv.pp.filter_and_normalize(Neutro3p, min_shared_counts=30, n_top_genes=2000)
scv.pp.moments(Neutro3p, n_pcs=30, n_neighbors=30)

# Estimate RNA velocity and latent time
scv.tl.recover_dynamics(Neutro3p)
scv.tl.velocity(Neutro3p, mode='dynamical')
scv.tl.velocity_graph(Neutro3p)
scv.tl.recover_latent_time(Neutro3p)

Visualization: Visualize the results using various scVelo plotting functions:

Velocity Stream Plot Colored by Cluster:

# Generate plot with UMAP and cluster
scv.pl.velocity_embedding_stream(Neutro3p,basis="umap",color="Loupe",title='Neutrophils',fontsize=20,legend_fontsize=20,min_mass=2,palette=Colorss,save='scVelo-umap-cluster.png')

Velocity Stream Plot Colored by Latent Time:

# Generate plot with UMAP and latent time
scv.pl.velocity_embedding_stream(Neutro3p,basis="umap",color="latent_time",title='Neutrophils',fontsize=20,legend_fontsize=20,min_mass=2,color_map="plasma",save='scVelo-umap-latent_time.png')

Violin Plot of Latent Time per Cluster:

sc.pl.violin(Neutro3p, keys='latent_time',groupby="Loupe",order=["Cluster 4","Cluster 1","Cluster 5","Cluster 6","Cluster 7","Cluster 3","Cluster 2"],save='scVelo-violin-latent_time.png')

Heatmap of Neutrophil Maturation Genes Ordered by Latent Time:

Genes=["RETN","LTF","CAMP","ACTB","GCA","LCN2", "S100A8","MYL6","S100A9","FCGR3B","S100A11","FTH1","IFIT1", "IFITM3","IFIT3","ISG15","IFIT2","RPS9","NEAT1","MALAT1","NFKBIA","CXCL8"]
scv.pl.heatmap(Neutro3p, var_names=Genes, sortby='latent_time', col_color='Loupe', n_convolve=100,figsize=(16,8),yticklabels=True,sort=True,colorbar=True,show=True,layer="count",save='scVelo-heatmap-latent_time.png')

The heatmap visualizes gene expression changes along the inferred trajectory, revealing patterns associated with cell maturation.

Alternative: Importing a Seurat Object

If you have pre-processed your data using Seurat, you can import your Seurat object into scVelo for trajectory analysis.

Save Seurat Object in R: In R, after completing standard Seurat processing, save the Seurat object as a .loom file:

## This step is done in R after Seurat processing
Neutro3p.loom <- as.loom(Neutro3p, filename = "/path/to/Neutro3pSeurat.loom", verbose = FALSE)
Neutro3p.loom$close_all()

Load Loom File in scVelo: In the scVelo notebook, replace the sc.read_10x_mtx function call in Step 4.3 with scv.read_loom:
```
Neutro3p = scv.read_loom("./Neutro3pSeurat.loom")
```
The rest of the scVelo analysis steps remain the same.

Conclusion

This guide provides a detailed walkthrough of trajectory inference and RNA velocity analysis using scVelo, integrating data from Cell Ranger and Loupe Browser. The same workflow can be applied to other cell types to study their dynamic behavior.

While this guide focuses on velocyto and scVelo, other community-developed tools are available for trajectory inference and velocity analysis. Further exploration of these tools can provide complementary insights.