Unsupervised Learning

Principal component analysis (PCA) and K-means clustering

Prerequisites

BioNT Applied Machine Learning for Biological Data
- Module 1: Python Numpy and Pandas

Participants should already have the skills introduced in the lessons listed above, or equivalent experience.

Time

2 hours and 30 minutes

Objectives

Demonstrate the use of unsupervised learning for drug sensitivity analysis.
Work through an example PCA and K-means clustering workflow on a test dataset (drug sensitivity patterns across patients) for patient stratification.

Note

ML use-case

Drug sensitivity scores: 50 drugs and 25 patients
Unsupervised learning (PCA and clustering) analysis will:
1. Transform the high-dimensional drug sensitivity data into a lower-dimensional dataset that captures the most significant variance and patterns
2. Group patients into distinct strata based on similarities in their overall drug sensitivity patterns
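
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lesson notebook's code: it runs on synthetic data with the same shape as the use-case (25 patients by 50 drugs), and the two simulated patient groups, the choice of two principal components, `k=2`, and the helper name `kmeans` are all assumptions made for the example. The notebook may well use a library such as scikit-learn instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for the lesson's table,
# 25 patients (rows) x 50 drugs (columns), with two simulated patient groups.
n_patients, n_drugs = 25, 50
group = np.array([0] * 12 + [1] * 13)
group_profiles = rng.normal(0.0, 1.0, size=(2, n_drugs))
X = group_profiles[group] + rng.normal(0.0, 0.3, size=(n_patients, n_drugs))

# Step 1: PCA via singular value decomposition of the standardized data.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained = S**2 / np.sum(S**2)                   # variance ratio per component
n_components = 2                                  # ASSUMPTION: keep PC1 and PC2
scores = U[:, :n_components] * S[:n_components]   # patient coordinates in PC space


# Step 2: K-means (Lloyd's algorithm) on the PCA scores.
def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: plain K-means with random initial centroids."""
    local_rng = np.random.default_rng(seed)
    centroids = points[local_rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(scores, k=2)           # ASSUMPTION: two strata

print("Explained variance ratio (PC1, PC2):", explained[:n_components].round(3))
print("Cluster label per patient:", labels)
```

Clustering on the PCA scores rather than on all 50 drug columns is a design choice made here for illustration: it keeps the distance calculation in the low-dimensional space that carries most of the variance. With real data, the number of components and clusters would need to be chosen from the data, for example by inspecting the explained variance ratios.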

Dataset

Imputed Drug Sensitivities:
- This data was imputed for TCGA-BRCA patients based on a model trained on cancer cell line gene expression and corresponding in vitro drug response measurements
Source: Cancer drug sensitivity prediction from routine histology images

Download the test dataset

Notebook

Download the notebook
