Methods

Single-Cell Vault

Curated and Pre-processed Database for Paired scRNA/TCRseq

Abstract

Despite the crucial role of T-cell clones in anti-tumor activity, their characterization and association with clinical outcome following immune checkpoint inhibitors is lacking.

Here we analyzed paired single-cell RNA-sequencing/T-cell receptor sequencing of 767,606 T cells across 460 samples spanning 6 cancer types.

We found a robust signature of response based on expanded CD8⁺ clones that differentiates responders from non-responders.

Analysis of persistent clones showed transcriptional changes that are differentially induced by therapy in the different response groups, suggesting an improved reinvigoration capacity in responding patients.

Moreover, a gene trajectory analysis revealed changes in the pseudo-temporal state of de-novo clones that are associated with clinical outcome.

Lastly, we found that clones shared between tumor and blood are more abundant in non-responders and execute distinct transcriptional programs.

Overall, our results highlight differences in clonal transcriptional states that are linked to patient response, offering valuable insights into the mechanisms driving effective anti-tumor immunity.

Description of the datasets available on the website

This repository contains paired scRNA/TCRseq datasets of cancer patients treated with ICI-based therapy following our own quality control (QC) as described in our manuscript.

These datasets contain only single cells having both scRNAseq and scTCRseq that passed our QC.

Each dataset contains two files:

1. The raw count matrix (scRNAseq file) as an h5ad object, including the metadata and clinical information at the single-cell level.

2. The raw scTCRseq file as a csv.gz object.

The raw count file was originally processed by us with Scanpy version 1.9.4 and can be loaded with the function 'scanpy.read()' for further analysis.

The raw scTCRseq file was originally processed by us with Scirpy version 0.13.0 and can be loaded directly with the function 'scirpy.io.read_10x_vdj()' for further analysis.
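As a minimal sketch of the loading step, the two files of one dataset could be read as follows. The file paths passed to the function and the helper names `load_paired_dataset` and `count_shared_barcodes` are hypothetical; only 'scanpy.read()' and 'scirpy.io.read_10x_vdj()' come from the description above. Whether the cell barcodes in both files match one-to-one is an assumption that the second helper lets you check.

```python
def load_paired_dataset(h5ad_path, tcr_path):
    """Load the count matrix and the scTCRseq table of one dataset (sketch).

    Both paths are placeholders for the files downloaded from the website.
    """
    # Imported lazily so the helpers in this file can be used even when
    # scanpy/scirpy are not installed.
    import scanpy as sc
    import scirpy as ir

    adata = sc.read(h5ad_path)                 # raw counts + cell-level metadata
    adata_tcr = ir.io.read_10x_vdj(tcr_path)   # 10x-style contig annotation table
    return adata, adata_tcr


def count_shared_barcodes(gex_barcodes, tcr_barcodes):
    """Return how many cell barcodes appear in both modalities."""
    return len(set(gex_barcodes) & set(tcr_barcodes))
```

After loading, `count_shared_barcodes(adata.obs_names, adata_tcr.obs_names)` gives a quick sanity check that the cells in both objects line up before any further analysis.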

Please note that each count matrix contains 12,407 genes that passed our QC and existed across all datasets.
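To illustrate what "existed across all datasets" means, the sketch below intersects per-dataset gene lists. It is an illustration only: the function name `shared_genes` and the toy gene lists are hypothetical, and the QC filtering that precedes this step in our pipeline is not reproduced here.

```python
def shared_genes(gene_lists):
    """Return the genes present in every list, sorted alphabetically."""
    gene_sets = [set(genes) for genes in gene_lists]
    if not gene_sets:
        return []
    return sorted(set.intersection(*gene_sets))


# Toy example with three hypothetical datasets.
toy_gene_lists = [
    ["CD8A", "GZMB", "PDCD1"],
    ["GZMB", "CD8A"],
    ["CD8A", "TOX", "GZMB"],
]
common = shared_genes(toy_gene_lists)
```

Restricting every count matrix to such a common gene set is what allows the datasets to be concatenated without missing columns.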

Please note our annotations of clinical outcome (response), as elaborated in our manuscript. These annotations were converted from the original RECIST criteria or pathological response provided by each study into 'R' / 'NR'. In the breast cancer datasets of Bassez et al., no clinical outcome was provided, only patient-level annotations of clonal expansion ('E' / 'NE'). Therefore, this study was not considered in analyses of clinical outcome.

To analyze the full list of genes for each dataset, as well as single cells without scTCRseq, please refer to the original study of each dataset.
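When pooling datasets for outcome analyses, cells without an 'R' / 'NR' label need to be set aside. The following is a minimal sketch, assuming the metadata has been exported as a list of dictionaries and that the outcome is stored under a key named `response`; both the key name and the helper `cells_with_outcome` are hypothetical, so please check the actual column names in each h5ad file.

```python
OUTCOME_LABELS = {"R", "NR"}


def cells_with_outcome(cells, column="response"):
    """Keep only cells whose outcome label is 'R' or 'NR'."""
    return [cell for cell in cells if cell.get(column) in OUTCOME_LABELS]


# Toy metadata: the 'E' / 'NE' labels stand for clonal expansion, not outcome.
toy_cells = [
    {"cell": "c1", "response": "R"},
    {"cell": "c2", "response": "NR"},
    {"cell": "c3", "response": "E"},
    {"cell": "c4", "response": "NE"},
]
kept = cells_with_outcome(toy_cells)
```

With this filter, cells labeled 'E' / 'NE' are dropped from outcome comparisons but remain available for other analyses.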

Technical & study-specific information

General technical information -

Please note that the scTCRseq data of each dataset were originally provided in different formats. We therefore rearranged each file so that it can be loaded with the 'scirpy.io.read_10x_vdj()' function. This rearrangement resulted in columns that did not originally exist in all datasets and were filled with either null or 'True' values for compatibility purposes.
Please note that scirpy considers only productive TCRs and ignores the 'full_length' column provided in the scTCRseq data. Therefore null values in the 'full_length' column will not affect the scTCRseq analysis using scirpy.
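The kind of rearrangement described above can be sketched as follows. This is an illustration under stated assumptions, not our exact procedure: the column list follows the usual 10x contig annotation layout, the choice of which columns receive 'True' (`is_cell`, `high_confidence`, `productive`) is an assumption, and the input field names and helper names are hypothetical.

```python
import csv
import io

# Column layout assumed from the usual 10x contig annotation CSV.
TENX_COLUMNS = [
    "barcode", "is_cell", "contig_id", "high_confidence", "length", "chain",
    "v_gene", "d_gene", "j_gene", "c_gene", "full_length", "productive",
    "cdr3", "cdr3_nt", "reads", "umis", "raw_clonotype_id", "raw_consensus_id",
]


def to_10x_rows(records):
    """Convert CDR3-only records into rows with every 10x column present."""
    rows = []
    for index, record in enumerate(records, start=1):
        row = {column: "" for column in TENX_COLUMNS}  # empty string = null
        row["barcode"] = record["barcode"]
        row["chain"] = record["chain"]
        row["cdr3"] = record["cdr3"]
        row["cdr3_nt"] = record.get("cdr3_nt", "")
        row["contig_id"] = f"{record['barcode']}_contig_{index}"
        # Assumed filler values so that the contigs are not discarded on load.
        for flag in ("is_cell", "high_confidence", "productive"):
            row[flag] = "True"
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with the 10x header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=TENX_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_rows = to_10x_rows(
    [{"barcode": "AAACCTGAG-1", "chain": "TRB", "cdr3": "CASSLGQGYEQYF"}]
)
example_csv = rows_to_csv(example_rows)
```

In this sketch the 'full_length' column is left empty, matching the null values mentioned above.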
For datasets having both tumor and blood samples, we provided both tissue types in the same file with annotations for the tissue type of origin for each single cell.
The datasets are ready for further integration according to your own pipeline, or according to our pipeline as described in our manuscript.
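For datasets that combine tumor and blood in one file, the tissue annotation makes it possible to look for clones present in both compartments. The sketch below is one possible way to do this, assuming per-cell records with the hypothetical keys `patient`, `clone_id` and `tissue`; the actual metadata column names and the clonotype definition should be taken from the files and from our manuscript.

```python
from collections import defaultdict


def tissues_per_clone(cells):
    """Map each (patient, clone) pair to the set of tissues it was seen in."""
    tissues = defaultdict(set)
    for cell in cells:
        tissues[(cell["patient"], cell["clone_id"])].add(cell["tissue"])
    return tissues


def shared_clones(cells, first="tumor", second="blood"):
    """Return (patient, clone) pairs detected in both tissues, sorted."""
    return sorted(
        key
        for key, seen in tissues_per_clone(cells).items()
        if first in seen and second in seen
    )


toy_cells = [
    {"patient": "P1", "clone_id": "c1", "tissue": "tumor"},
    {"patient": "P1", "clone_id": "c1", "tissue": "blood"},
    {"patient": "P1", "clone_id": "c2", "tissue": "tumor"},
    {"patient": "P2", "clone_id": "c1", "tissue": "blood"},
]
shared = shared_clones(toy_cells)
```

Clones are keyed by patient here so that identical clone identifiers from different patients are never counted as shared.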

Luoma et al. (HNSCC) -

For convenience, treatment timepoint for each blood sample is annotated as originally provided by the authors, considering the three timepoints described in the original manuscript (B1, B2 & B3).

Krishna et al. (RCC) -

CDR3 was originally provided only based on amino-acid sequence.
The original count matrix was provided following exclusion of mitochondrial genes, while retaining cells with <20% of mitochondrial gene-count. Because of the absence of these genes, the percent of mitochondrial gene-count in this dataset was set to zero.
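As a small illustration of why this value is zero, the sketch below computes the percent of mitochondrial counts for one cell from gene names and counts. The function name `pct_counts_mt` is hypothetical, and the "MT-" prefix is the usual naming convention for human mitochondrial genes; with no such genes in the matrix, the result is zero for every cell.

```python
def pct_counts_mt(gene_names, counts, prefix="MT-"):
    """Percent of a cell's counts that come from mitochondrial genes."""
    total = sum(counts)
    if total == 0:
        return 0.0
    mt_counts = sum(
        count
        for gene, count in zip(gene_names, counts)
        if gene.upper().startswith(prefix)
    )
    return 100.0 * mt_counts / total


with_mt = pct_counts_mt(["MT-CO1", "CD8A"], [10, 30])
without_mt = pct_counts_mt(["CD8A", "GZMB"], [5, 5])
```

This means that a mitochondrial QC threshold applied to this dataset during integration will not remove any cells.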

Yost et al. (SCC & BCC) -

scTCRseq originally included only CDR3 sequences (amino-acid and nucleotide sequence) without additional components.
BCC and SCC datasets were provided in separate files.

Bassez et al. (BC) -

scTCRseq originally included only CDR3 sequences (amino-acid and nucleotide sequence) without additional components.
This study contained two different cohorts that were provided in separate files.
No clinical outcome was provided; instead, the annotations for patient-level clonal expansion are included as provided by the authors.
Please note that this study included HER2-positive, ER-positive and triple-negative breast cancer patients. Please refer to the original study for more information.

Shiao et al. (TNBC) -

Please note that several patients have more than one "post" sample, with varying treatment regimens. Please refer to the original paper for more information.

Citation

If you use the datasets from this website in your research, please consider citing our paper together with the original study of each dataset.

The figures used under the "Home" and "Methods" tabs were created with BioRender.com using a paid license.
