pyVIPER (VIPER Analysis in Python for single-cell RNASeq)
This package enables network-based protein activity estimation in Python. It also provides interfaces for scanpy (single-cell RNASeq analysis in Python). Functions are partly ported from the R packages viper and NaRnEA.
The user-friendly documentation is available here.
Dependencies
- scanpy for the single-cell pipeline.
- pandas and anndata for data computing and storage.
- numpy, scipy >=1.10.0 and statsmodels for scientific computation and statistical inference.
- joblib for parallel computing.
- loompy and pyarrow for Loom file format support and efficient data serialization and I/O.
- tqdm for progress bar visualization.
- igraph for data visualization.
- leidenalg for Leiden clustering.
- torch for GPU processing.
If you are using scanpy <1.9.3, it is also advisable to downgrade pandas to >=1.3.0,<2.0 because of a scanpy incompatibility (issue).
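The version constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `pandas_pin_advised`, `installed_version`) are hypothetical and not part of pyviper, and the parser only reads the leading numeric components of a version string.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the first three numeric components of a version string as ints."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "2.0" so tuple comparisons behave as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pandas_pin_advised(scanpy_version, pandas_version):
    """True when scanpy is older than 1.9.3 and pandas is outside >=1.3.0,<2.0."""
    old_scanpy = parse_version(scanpy_version) < (1, 9, 3)
    pandas_ok = (1, 3, 0) <= parse_version(pandas_version) < (2, 0, 0)
    return old_scanpy and not pandas_ok


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    sc_ver = installed_version("scanpy")
    pd_ver = installed_version("pandas")
    if sc_ver and pd_ver and pandas_pin_advised(sc_ver, pd_ver):
        print(f"scanpy {sc_ver} with pandas {pd_ver}: consider pandas>=1.3.0,<2.0")
```

The check is advisory only: it prints a hint and leaves the environment unchanged.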
Installation
pypi
pip install viper-in-python
local
git clone https://github.com/alevax/pyviper/
cd pyviper
pip install -e .
Usage
import pandas as pd
import anndata
import pyviper
# Load sample data
ges = anndata.read_text("test/unit_tests/test_1/test_1_inputs/LNCaPWT_gExpr_GES.tsv").T
# Load network
network = pyviper.load.msigdb_regulon("h")
# Translate sample data from ensembl to gene names
pyviper.pp.translate(ges, desired_format = "human_symbol")
## Filter targets in the interactome
network.filter_targets(ges.var_names)
# Compute regulon activities
## area
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="area")
print(activity.to_df())
## narnea
activity = pyviper.viper(gex_data=ges, interactome=network, enrichment="narnea", eset_filter=False)
print(activity.to_df())
Tutorials
Structure and rationale
The main functions available from pyviper are:
- pyviper.viper: “pyviper” function for Virtual Inference of Protein Activity by Enriched Regulon Analysis (VIPER). The function allows using 2 enrichment algorithms, aREA and (matrix)-NaRnEA (see below).
- pyviper.aREA: computes aREA (analytic rank-based enrichment analysis) and meta-aREA.
- pyviper.NaRnEA: computes matrix-NaRnEA, a vectorized implementation of NaRnEA.
- pyviper.pp.translate: translates between species (i.e. mouse vs human) and between Ensembl, Entrez and gene symbols.
- pyviper.tl.path_enr: computes pathway enrichment.
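To give an intuition for what "rank-based enrichment" means in the list above, here is a toy sketch in plain Python. It is not pyviper's aREA or NaRnEA implementation: it assumes a single signature, a regulon given as signed target weights, and a simple score (weighted sum of rank-derived z-scores divided by the norm of the weights). The names `rank_to_z` and `toy_enrichment` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_to_z(signature):
    """Map a {gene: value} signature to normal quantiles of its ranks.

    Ties are broken by input order, which is enough for a toy example.
    """
    ordered = sorted(signature, key=signature.get)
    n = len(ordered)
    normal = NormalDist()
    # 0-based position i -> quantile (i + 1) / (n + 1), strictly inside (0, 1).
    return {gene: normal.inv_cdf((i + 1) / (n + 1)) for i, gene in enumerate(ordered)}


def toy_enrichment(signature, regulon):
    """Weighted sum of target z-scores, scaled by the Euclidean norm of the weights.

    `regulon` maps target gene -> signed weight (positive: activated target,
    negative: repressed target). Targets missing from the signature are skipped.
    """
    z = rank_to_z(signature)
    weights = {gene: w for gene, w in regulon.items() if gene in z}
    norm = sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return 0.0
    return sum(w * z[gene] for gene, w in weights.items()) / norm


# Made-up values for illustration only.
signature = {"A": 2.0, "B": 1.5, "C": -0.3, "D": -1.2, "E": 0.1}
regulon = {"A": 1.0, "B": 1.0, "D": -1.0}
score = toy_enrichment(signature, regulon)
print(f"toy enrichment score: {score:.3f}")
```

In this example the activated targets rank high and the repressed target ranks low, so the score is positive. Flipping the sign of every weight flips the sign of the score.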
Other notable functions include:
- pyviper.tl.OncoMatch: computes OncoMatch, an algorithm to assess the activity conservation of MR proteins between two sets of samples (e.g. validate GEMMs as effective models of human samples).
- pyviper.pp.stouffer: computes signatures on a cluster-by-cluster basis using a cluster integration method for pathway enrichment.
- pyviper.pp.viper_similarity: computes the similarity between VIPER signatures.
- pyviper.pp.repr_metacells: computes representative metacells (e.g. for ARACNe) using our method to maximize unique sample usage and minimize resampling (users can specify depth, percent data usage, etc.).
- pyviper.pp.repr_subsample: selects a representative subsample of data using our method to ensure a widely distributed sampling.
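As an illustration of cluster-by-cluster integration, the sketch below combines per-cell z-scores with Stouffer's method, sum(z) / sqrt(n). It assumes, from the function name only, that this is the kind of integration pyviper.pp.stouffer performs; the functions and data here are hypothetical and do not call pyviper.

```python
from collections import defaultdict
from math import sqrt


def stouffer(z_scores):
    """Combine z-scores with Stouffer's method: sum(z) / sqrt(n)."""
    if not z_scores:
        raise ValueError("need at least one z-score")
    return sum(z_scores) / sqrt(len(z_scores))


def stouffer_by_cluster(cell_scores, cell_clusters):
    """Integrate per-cell z-scores into one score per cluster.

    cell_scores:   {cell_id: z-score for one protein or gene}
    cell_clusters: {cell_id: cluster label}
    """
    grouped = defaultdict(list)
    for cell, z in cell_scores.items():
        grouped[cell_clusters[cell]].append(z)
    return {cluster: stouffer(values) for cluster, values in grouped.items()}


# Made-up scores for one protein across four cells in two clusters.
scores = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": -1.0}
clusters = {"c1": "T", "c2": "T", "c3": "T", "c4": "B"}
for cluster, z in stouffer_by_cluster(scores, clusters).items():
    print(cluster, round(z, 3))
```

Dividing by sqrt(n) instead of n keeps the combined score on a z-score scale when the inputs are independent standard normal values, so clusters of different sizes remain comparable.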
Additionally, the following submodules are available:
- pyviper.load: submodule containing several utility functions useful for different analyses, including load_msigdb_regulon, load_TFs, etc.
- pyviper.pl: submodule containing pyviper wrappers for scanpy plotting.
- pyviper.tl: submodule containing pyviper wrappers for scanpy data transformation.
- pyviper.config: submodule allowing users to specify current species and filepaths for regulators.
Last, a new Interactome class allows users to load and interrogate ARACNe- and SCENIC-inferred gene regulatory networks.
Contact
Please report any issues you experience through this repository's “Issues” page.
For any other information or queries, please write to Alessandro Vasciaveo (av2729@cumc.columbia.edu).
License
[!IMPORTANT]
pyviper provides a Python-based implementation of the VIPER software. The VIPER software is distributed by Columbia University under a non-commercial, academic-only evaluation license, which restricts its use to non-profit or not-for-profit organizations and prohibits any commercial use, redistribution, or sublicensing without a separate commercial agreement with Columbia University’s Science and Technology Ventures office. All terms and conditions are specified in the accompanying LICENSE file.
Citation
If you used pyVIPER in your publication, please cite our work here:
Wang, A.L.E., Lin, Z., Zanella, L., Vlahos, L., Girotto, M.A., Zafar, A., … & Vasciaveo, A. (2024). pyVIPER: A fast and scalable Python package for rank-based enrichment analysis of single-cell RNASeq data. bioRxiv, 2024-08. doi: https://doi.org/10.1101/2024.08.25.609585.
Manuscript in review
Contents
Tutorials
- Tutorial 1 - Analyzing scRNA-seq data at the Protein Activity Level
- Install PyVIPER
- Import modules
- Step 1. Load a gene expression signature for single-cells
- Step 2. Load and inspect a lineage-specific gene regulatory network
- Step 3. Convert the gene expression signature into a protein activity matrix using VIPER
- Step 4. Analyze single-cells at the Protein Activity level
- Key takeaways
- Tutorial 2 - Inferring Protein Activity from scRNA-seq data from multiple cell populations with the meta-VIPER approach
- Install PyVIPER
- Import modules
- Step 1. Load a gene expression matrix and associated metadata
- Step 2. Preprocess and generate a gene expression signature at the single-cell level
- Step 3. Load multiple ARACNe-inferred gene regulatory networks
- Step 4. Analyze single-cells at the Protein Activity level
- Pathway enrichment analysis
- Key takeaways
- Tutorial 3 - Generating metacells for reverse-engineering of ARACNe gene regulatory networks