BioXpress Pipeline v-5.0 README

Last Updated August 2021 by Ned Cauley

Description

The BioXpress pipeline takes raw count data from TCGA studies for both Primary Tumor and Normal Tissue samples and performs differential expression analysis.

The TCGA studies included in the BioXpress pipeline are (by tissue):

Bladder - TCGA-BLCA (Bladder urothelial carcinoma)
Breast - TCGA-BRCA (Breast invasive carcinoma)
Colorectal - TCGA-COAD (Colon adenocarcinoma), TCGA-READ (Rectum adenocarcinoma)
Esophageal - TCGA-ESCA (Esophageal carcinoma)
Head and Neck - TCGA-HNSC (Head and Neck squamous cell carcinoma)
Kidney - TCGA-KICH (Kidney Chromophobe), TCGA-KIRP (Kidney renal papillary cell carcinoma), TCGA-KIRC (Kidney renal clear cell carcinoma)
Liver - TCGA-LIHC (Liver hepatocellular carcinoma)
Lung - TCGA-LUAD (Lung adenocarcinoma), TCGA-LUSC (Lung squamous cell carcinoma)
Prostate - TCGA-PRAD (Prostate adenocarcinoma)
Stomach - TCGA-STAD (Stomach adenocarcinoma)
Thyroid - TCGA-THCA (Thyroid carcinoma)
Uterine - TCGA-UCEC (Uterine Corpus Endometrial Carcinoma)

Running the Pipeline

To run the BioXpress pipeline, download the scripts from the HIVE Lab GitHub repo (GW HIVE Backend Code Repository). If running BioXpress on a HIVE Lab server (such as glygen-vm-dev), place the scripts in your user folder, /home/$yourusername/. Data and other output from the scripts are stored in /data/projects/bioxpress/.

Pipeline Overview

Step 1: Downloader

The downloader step uses sample sheets obtained from the [GDC Data Portal](https://portal.gdc.cancer.gov/repository) to download raw RNA-Seq counts for Primary Tumor and Normal Tissue samples from all available TCGA studies.
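
As a rough illustration of how a sample sheet can drive the download, the sketch below reads a tab-separated sample sheet and keeps only the rows for the two sample types of interest. It is not the pipeline's downloader script. The column names (`File ID`, `Project ID`, `Sample Type`) and the label `Solid Tissue Normal` for Normal Tissue are assumptions about the sample sheet export, the function name `select_files` is hypothetical, and the example rows are made up.

```python
import csv
import io

# Assumption: the sample sheet labels Normal Tissue as "Solid Tissue Normal".
WANTED_SAMPLE_TYPES = {"Primary Tumor", "Solid Tissue Normal"}


def select_files(sample_sheet, wanted=WANTED_SAMPLE_TYPES):
    """Return (file_id, project_id, sample_type) for each wanted row.

    `sample_sheet` is any open text stream containing a tab-separated
    sample sheet with a header row. Column names are assumed, not
    taken from the pipeline's own scripts.
    """
    reader = csv.DictReader(sample_sheet, delimiter="\t")
    selected = []
    for row in reader:
        if row["Sample Type"] in wanted:
            selected.append(
                (row["File ID"], row["Project ID"], row["Sample Type"])
            )
    return selected


# Made-up rows, for illustration only.
EXAMPLE_SHEET = (
    "File ID\tFile Name\tProject ID\tSample Type\n"
    "id-0001\ta.counts.gz\tTCGA-BLCA\tPrimary Tumor\n"
    "id-0002\tb.counts.gz\tTCGA-BLCA\tSolid Tissue Normal\n"
    "id-0003\tc.counts.gz\tTCGA-BLCA\tMetastatic\n"
)

if __name__ == "__main__":
    for file_id, project_id, sample_type in select_files(io.StringIO(EXAMPLE_SHEET)):
        print(file_id, project_id, sample_type)
```

With a real sample sheet, the same function could be called on a file opened with `open(path, newline="")`. The selected file IDs would then be passed to whatever download tool the pipeline uses.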

Index for downloader: