Complete Machine Learning Workflow
Prerequisites
BioNT Applied Machine Learning for Biological Data
Module 1: Python Numpy and Pandas
Module 2: Classification: Logistic regression; Tree-based methods; Metrics for classification evaluation
Participants should already have the skills introduced in the lessons listed above, or equivalent skills.
Time
2 hours
Objectives
Demonstrate a complete classification workflow on a cancer dataset (expanding on the previous hands-on session)
Work through an example logistic regression workflow that uses the glioma test dataset to classify glioma sub-types
ML use-case
The ML use-case is the one described in the Classification hands-on session
The example logistic regression workflow uses the 20 most frequently mutated genes and 3 clinical features to classify (grade) gliomas
It demonstrates the following key techniques:
Data exploration and handling missing data
Scaling
Cross-validation
Hyper-parameter tuning with GridSearch
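The techniques listed above can be combined in a single scikit-learn pipeline. The following is a minimal sketch, not the workflow used in the session: the data is synthetic, the column names (`age_at_diagnosis`, `gene_a`, `gene_b`), the median imputation strategy, and the grid of `C` values are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical columns, NOT the glioma dataset).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age_at_diagnosis": rng.normal(50, 15, n),
    "gene_a": rng.integers(0, 2, n).astype(float),  # 1 = mutated, 0 = not mutated
    "gene_b": rng.integers(0, 2, n).astype(float),
})
# Binary target loosely driven by age and gene_a, plus noise.
y = (X["age_at_diagnosis"] + 20 * X["gene_a"] + rng.normal(0, 10, n) > 60).astype(int)

# Spike in some missing values, mimicking the nulls added for demonstration.
X.loc[rng.choice(n, 10, replace=False), "age_at_diagnosis"] = np.nan

# Hold out a test set that is never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation and scaling live inside the pipeline, so they are re-fitted
# on the training part of every cross-validation fold (no data leakage).
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed grid: inverse regularisation strength of the logistic regression.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Mean CV accuracy of best model:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Keeping the imputer and scaler inside the `Pipeline` is a design choice worth noting: `GridSearchCV` then fits them only on the training folds, so the cross-validated scores are not inflated by information from the validation folds.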
Dataset
The dataset is the one described in the Classification hands-on session
Features:
The 20 most frequently mutated genes
3 clinical features: gender, age at diagnosis, race
Target variable (i.e., the dependent or response variable): glioma grade class
0 = “LGG”
1 = “GBM”
Several additional columns and rows containing null values have been spiked into the original dataset for demonstration purposes
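As a rough illustration of how spiked-in nulls might be found and removed, and how the target could be encoded as 0/1, here is a small pandas sketch. The toy table and its column names (`Grade`, `Age_at_diagnosis`, `IDH1`, `spiked_col`) are hypothetical and do not reproduce the actual dataset; dropping incomplete rows is only one possible strategy, and imputation is another.

```python
import numpy as np
import pandas as pd

# Toy table with hypothetical column names (NOT the real glioma dataset).
df = pd.DataFrame({
    "Grade": ["LGG", "GBM", "LGG", "GBM", None],
    "Age_at_diagnosis": [51.3, 38.7, np.nan, 62.0, 45.1],
    "IDH1": [1, 0, 1, 0, 1],          # 1 = mutated, 0 = not mutated
    "spiked_col": [np.nan] * 5,       # a column containing only nulls
})

# Explore: count missing values per column.
null_counts = df.isna().sum()
print(null_counts)

# Drop columns that are entirely null, then rows with any remaining null.
cleaned = df.dropna(axis=1, how="all")
cleaned = cleaned.dropna(axis=0, how="any").copy()

# Encode the target as described above: 0 = "LGG", 1 = "GBM".
cleaned["Grade"] = cleaned["Grade"].map({"LGG": 0, "GBM": 1}).astype(int)
print(cleaned)
```

Checking `df.isna().sum()` before and after cleaning is a quick way to confirm that no nulls remain before the data is passed to a model.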