PCA: biomedical data visualization in R
This module demonstrates how multivariate analysis methods help with visualizing complex biomedical data in R, using RNA-seq as an example. The tutorial covers the following topics:
- The need for dimensionality reduction techniques for high-throughput biomedical data such as RNA-seq
- The utility of principal component analysis for biomedical data visualization in R
- Use of the R coding environment to improve the visualization and interpretation of analysis results
The course is divided into 3 major components:
- Analysis of RNA-seq data: Preparing the Gene Expression Matrix
- Principal Component Analysis and how to execute it on T-BioInfo
- Expanding PCA Visualization using R
Objective: We begin with the context for principal component analysis (PCA), discussing how it reduces dimensionality to make complex data easier to visualize. We will use the T-BioInfo platform to process raw RNA-seq data and then run PCA in the data mining section of the platform. We will then examine the output plot and learn how to run the available R script on your own computer and understand its code. Finally, we will extend the basic functions of this script to gain further insight into the samples and gene expression patterns in this dataset.
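To give a feel for what running PCA on a gene expression matrix looks like in R, here is a minimal sketch. It is not the script provided by the T-BioInfo platform: the expression matrix is simulated, and the gene names, sample names, and group shift are all hypothetical. It uses only base R's `prcomp()`, and the choice to scale each gene to unit variance is an assumption made for this example.

```r
# Simulated stand-in for a gene expression matrix (hypothetical data):
# 200 genes in rows, 6 samples in columns.
set.seed(1)
n_genes <- 200
expr <- matrix(rnorm(n_genes * 6, mean = 8, sd = 1), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", 1:6)))

# Shift the first 50 genes in samples 4-6 to mimic a group difference.
expr[1:50, 4:6] <- expr[1:50, 4:6] + 4

# prcomp() expects observations in rows, so transpose to samples x genes.
# Centering and scaling per gene is a choice made for this sketch.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of each sample on the first two components.
scores <- pca$x[, 1:2]

print(round(var_explained[1:2], 3))
print(scores)
```

Each row of `scores` is one sample reduced from 200 gene measurements to two coordinates, which is what a PCA scatter plot displays.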
In this course, you will learn about the T-BioInfo platform and the RStudio IDE (Integrated Development Environment). After testing the initial script on a sample dataset with standard configurations, we will explore several analysis and visualization packages to make visual improvements. This exercise will be useful for anyone seeking to improve their data analysis and visualization skills. We will walk through all the steps needed to create and improve PCA scatter plots and to understand the analysis results.
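As an illustration of the kind of visual improvement meant here, the sketch below colors samples by group, labels each point, and reports the percentage of variance on each axis. It uses base R graphics only, not the packages covered in the course, and the data, the `control`/`treated` group labels, and the colors are hypothetical.

```r
# Simulated expression matrix (hypothetical): 200 genes x 6 samples.
set.seed(1)
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:6)))

# Hypothetical sample annotation: three samples per group.
group <- factor(rep(c("control", "treated"), each = 3))

# PCA on samples (rows) x genes (columns).
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)

# Axis labels that report the percentage of variance per component.
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
axis_labels <- sprintf("PC%d (%.1f%% variance)", 1:2, pct[1:2])

# One color per group, looked up by group name.
cols <- c(control = "steelblue", treated = "firebrick")

plot(pca$x[, 1], pca$x[, 2],
     col = cols[as.character(group)], pch = 19, cex = 1.5,
     xlab = axis_labels[1], ylab = axis_labels[2],
     main = "PCA of samples")

# Label each point with its sample name and add a legend for the groups.
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.8)
legend("topright", legend = names(cols), col = cols, pch = 19, bty = "n")
```

Putting the variance percentages in the axis labels lets a reader judge how much of the dataset's structure the two plotted components actually capture.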
Prerequisites: If you are not familiar with RNA-seq data, gene expression, and the type of information they can offer, we recommend completing the Transcriptomics 1 course, an online course that provides a detailed explanation of RNA-seq data and basic analysis steps. Our example uses public-domain data. You are encouraged to explore the dataset and read the associated publication; both are described in detail in the project Modeling Precision Medicine.
- Lectures: 10
- Quizzes: 0
- Duration: 50 hours
- Skill level: All levels
- Language: English
- Students: 203
- Certificate: Yes
- Assessments: Yes
Preparing Data for Analysis
Principal Component Analysis and Visualization
Expanding PCA Visualization using R