When you are trying to understand a new analysis method, a good strategy is to
Today we will go through steps 1 and 2. For an introduction to RNAseq analysis we will discuss this Galaxy workflow. The steps involving read mapping, transcript quantification, and read counting are usually done on a Unix computer using the command line or an interface to a Unix computer such as Galaxy. We will go through these steps in the tutorial and then in a later lab on a Unix virtual machine or the MGHPCC. Today in lab we will work on differential expression analysis in R.
Bioconductor is an open source, open development software project for genomic data analysis. It is based primarily on the R programming language. There are nearly 1,000 R software packages available and an active user community. R/Bioconductor is one of the primary mechanisms for publishing new genomic data analysis tools. We will use numerous Bioconductor packages in this course. Bioconductor is also available for cloud computing on Amazon and other platforms.
You can install Bioconductor by typing in the R console
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.11")Individual packages that are not part of the BioConductor core can be installed using
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(name_of_package)This lab will go through differential expression analysis in R using DESeq2 along with other Bioconductor and core R packages.
The data used in this workflow is an RNA-Seq experiment of airway smooth muscle cells treated with dexamethasone, a synthetic glucocorticoid steroid with anti-inflammatory effects. Glucocorticoids are used, for example, in asthma patients to prevent or reduce inflammation of the airways. In the experiment, four primary human airway smooth muscle cell lines were treated with 1 micromolar dexamethasone for 18 hours. For each of the four cell lines, we have a treated and an untreated sample. The reference for the experiment is:
Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. “RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells.” PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
To install the packages and data needed to complete this tutorial on your computer
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rnaseqGene")
# I also needed to install the following on my computer
# (You will realize this if you get an error message that the corresponding library will not load)
BiocManager::install("airway")
BiocManager::install("tximeta")
BiocManager::install("DESeq2")
BiocManager::install("Gviz")
BiocManager::install("sva")
BiocManager::install("RUVSeq")
BiocManager::install("fission")We will go step wise through a Bioconductor workflow vignette RNA-seq workflow: gene-level exploratory analysis and differential expression. We will start by opening the html file associated with the workflow and downloading the R script to our computer.
tidyverse packages ggplot2 and dyplr, but also makes use of core R plotting packages.