Data Wrangling & Machine Learning for Interdisciplinary Users

This tutorial consists of courses on data mining techniques, machine learning, and artificial intelligence for users from the following fields:

  • Education

  • Health and Medicine

  • Language

  • Law

  • Economics

  • Maths and Natural Science

  • Social Sciences & Psychology

Data Wrangling

Data Understanding and Knowledge Discovery Process

Basic concepts, definitions, data types and formats.

What should we teach:

  • Introduction to Programming: R programming, Python.

  • Introduction to Version Control Systems: Git and GitHub.

  • Introduction to Productivity Tools: Installing R, RStudio, Git.

  • The data, problems, and tools that data analysts use

  • Common data storage systems

Data Collection (Data Gathering)

Gather data from various sources, such as databases, spreadsheets, text files, web scraping, or APIs.
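
As a minimal sketch of gathering data from two kinds of sources, the example below parses a CSV text and a JSON payload shaped like a web API response, then combines them into one list of records. The data, field names, and helper functions are hypothetical and exist only for illustration; the text is held in memory so the example runs without any files or network access.

```python
import csv
import io
import json

# Hypothetical CSV content; in practice this would be read from a file.
CSV_TEXT = """student_id,course,score
1,Law,78
2,Economics,85
3,Law,91
"""

# Hypothetical JSON payload, shaped like a typical web API response.
JSON_TEXT = '{"results": [{"student_id": 4, "course": "Education", "score": 66}]}'


def read_csv_records(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Extract the list of records from a JSON document."""
    return json.loads(text)["results"]


csv_rows = read_csv_records(CSV_TEXT)
json_rows = read_json_records(JSON_TEXT)

# CSV values arrive as strings, so convert types before combining sources.
records = [
    {
        "student_id": int(row["student_id"]),
        "course": row["course"],
        "score": int(row["score"]),
    }
    for row in csv_rows
] + json_rows

print(len(records), "records collected")
```

Converting types at the point of collection keeps records from different sources comparable in later steps.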

Data Inspection and Cleaning

Examine the raw data to understand its structure, size, and quality. Check for missing values, duplicates, outliers, and any anomalies that may affect the analysis. Address data quality issues by:

  • Handling missing values: Decide whether to remove, impute, or use default values for missing data.

  • Removing duplicates: Identify and remove duplicate records if they exist.

  • Correcting errors: Fix data entry errors and inconsistencies.

  • Standardizing data: Ensure consistency in data formats (e.g., date formats, units of measurement).

  • Handling outliers: Decide how to deal with outliers based on domain knowledge and analysis goals.
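
The cleaning steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a recommended pipeline: the records are invented, median imputation is only one of the possible choices for missing values, and the two date formats handled are assumed to be the only ones present.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records: one missing age, one duplicate, mixed date formats.
raw = [
    {"id": 1, "age": 34, "visit": "2023-01-15"},
    {"id": 2, "age": None, "visit": "15/02/2023"},
    {"id": 3, "age": 51, "visit": "2023-03-01"},
    {"id": 3, "age": 51, "visit": "2023-03-01"},  # exact duplicate
]


def remove_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace missing values in `field` with the median of observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def standardize_date(text):
    """Convert YYYY-MM-DD or DD/MM/YYYY into ISO format (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


clean = impute_median(remove_duplicates(raw), "age")
clean = [{**r, "visit": standardize_date(r["visit"])} for r in clean]

for row in clean:
    print(row)
```

Duplicates are removed before imputation so that repeated records do not distort the median.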

Data Transformation

Transform the data to make it suitable for analysis:

  • Data encoding: Convert categorical data into numerical format (e.g., one-hot encoding).

  • Scaling and normalization: Standardize numerical variables to have similar scales.

  • Feature engineering: Create new features or variables that provide more information for analysis.

  • Aggregation: Summarize data at different levels (e.g., group by category, time period).

  • Pivot and reshape data: Rearrange data to facilitate analysis (e.g., pivot tables).

  • Date and time manipulation: Extract relevant information from date and time columns.

  • Data Reduction: Reduce the dimensionality of the data by selecting relevant variables or features for analysis. Dimensionality reduction techniques like Principal Component Analysis (PCA) can be employed.
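
Three of the transformations listed above (one-hot encoding, scaling, and aggregation) can be sketched as follows. The rows and function names are hypothetical, and z-score standardization is used here as one common way to put numerical variables on a similar scale.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows with one categorical and two numerical variables.
rows = [
    {"field": "Law", "hours": 10.0, "score": 70.0},
    {"field": "Economics", "hours": 20.0, "score": 80.0},
    {"field": "Law", "hours": 30.0, "score": 90.0},
]


def one_hot(rows, column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted({r[column] for r in rows})
    return [
        {f"{column}_{c}": int(r[column] == c) for c in categories}
        for r in rows
    ]


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def group_mean(rows, key, value):
    """Aggregate: mean of `value` within each group defined by `key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r[value])
    return {k: mean(v) for k, v in groups.items()}


encoded = one_hot(rows, "field")
scaled_hours = standardize([r["hours"] for r in rows])
mean_score_by_field = group_mean(rows, "field", "score")

print(encoded[0])
print(scaled_hours)
print(mean_score_by_field)
```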

Data Analysis

Aim: Make sense of your data to extract meaningful insights.

  • Exploratory Data Analysis (How to explore data relationships)

  • Descriptive Analysis (What happened)

  • Diagnostic Analysis (Why it happened)

  • Predictive Analysis (What will happen)

  • Prescriptive Analysis (How can we make it happen)
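
As a small illustration of the descriptive and exploratory steps, the sketch below summarizes one variable and measures the linear relationship between two variables with the Pearson correlation coefficient. The study-hours and exam-score values are invented for the example.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical paired observations.
study_hours = [2.0, 4.0, 6.0, 8.0, 10.0]
exam_scores = [55.0, 60.0, 70.0, 80.0, 85.0]


def describe(values):
    """Descriptive analysis: basic summary statistics for one variable."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Exploratory analysis: Pearson correlation between two variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


summary = describe(exam_scores)
r = pearson(study_hours, exam_scores)

print(summary)
print(f"correlation between hours and scores: {r:.3f}")
```

A correlation close to 1 indicates a strong positive linear relationship; it does not by itself explain why the relationship exists, which is the task of diagnostic analysis.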

Machine Learning

By the end of these courses, you will be ready to:

  • Build machine learning models in Python using popular machine learning libraries NumPy and scikit-learn.

  • Build and train supervised machine learning models for prediction and binary classification tasks, including linear regression and logistic regression.

  • Build and train a neural network with TensorFlow to perform multi-class classification.

  • Apply best practices for machine learning development so that your models generalize to data and tasks in the real world.

  • Build and use decision trees and tree ensemble methods, including random forests and boosted trees.

  • Use unsupervised learning techniques, including clustering and anomaly detection.

  • Build recommender systems with a collaborative filtering approach and a content-based deep learning method.

  • Build a deep reinforcement learning model.

Supervised Machine Learning: Regression and Classification

What you’ll learn:

  • Build machine learning models in Python using popular machine learning libraries NumPy & scikit-learn

  • Build & train supervised machine learning models for prediction & binary classification tasks, including linear regression & logistic regression

    • Introduction to Machine Learning

    • Classification

    • Regression
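
To make the two supervised tasks concrete, here is a minimal sketch of linear regression (fitted with the closed-form least-squares solution) and logistic regression (fitted with batch gradient descent), each with a single feature. It is written in plain Python so that it runs without installing anything, whereas the course itself names NumPy and scikit-learn. The function names, toy data, learning rate, and epoch count are assumptions chosen for illustration.

```python
from math import exp


def fit_linear(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    variance = sum((a - mx) ** 2 for a in x)
    slope = covariance / variance
    return slope, my - slope * mx


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the log loss for one feature."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        w -= lr * sum(e * xi for e, xi in zip(errors, x)) / n
        b -= lr * sum(errors) / n
    return w, b


# Regression: toy data lying exactly on the line y = 2x + 1.
slope, intercept = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")

# Binary classification: hypothetical pass/fail labels by hours studied.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
predictions = [int(sigmoid(w * h + b) >= 0.5) for h in hours]
print(predictions)
```

Regression predicts a number, while classification predicts a category by thresholding a predicted probability at 0.5.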

Unsupervised Learning, Recommenders, Reinforcement Learning

What you’ll learn:

  • Use unsupervised learning techniques, including clustering and anomaly detection

  • Build recommender systems with a collaborative filtering approach and a content-based deep learning method

  • Build a deep reinforcement learning model

    • Unsupervised Learning

    • Recommender systems

    • Deep reinforcement learning model
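
Clustering can be illustrated with a short sketch of k-means (Lloyd's algorithm), which alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The points and the starting centroids below are hypothetical; the starting centroids are fixed by hand so the result is reproducible, whereas practical implementations usually choose them randomly.

```python
from math import dist


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps until centroids stop moving."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Update step: move the centroid to the mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, [points[0], points[3]])

print(labels)
print(centroids)
```

No labels are supplied to the algorithm; the grouping is discovered from the distances between points alone, which is what distinguishes it from the supervised methods.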

Advanced Learning Algorithms

What you’ll learn:

  • Build and train a neural network with TensorFlow to perform multi-class classification

  • Apply best practices for machine learning development so that your models generalize to data and tasks in the real world

  • Build and use decision trees and tree ensemble methods, including random forests and boosted trees

    • Neural networks

    • Decision trees

    • Deep learning
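
The core step of building a decision tree is choosing where to split a feature. The sketch below finds the single threshold that minimizes the weighted Gini impurity, as CART-style trees do; a full tree would apply the same step recursively to each resulting subset. The ages, risk labels, and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical data: patient ages and an assigned risk category.
ages = [22, 25, 31, 47, 52, 60]
risk = ["low", "low", "low", "high", "high", "high"]

threshold, impurity = best_split(ages, risk)
print(f"split at age <= {threshold}, weighted impurity = {impurity}")
```

Tree ensembles such as random forests and boosted trees combine many trees built from this kind of split.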