Machine Learning Overview: Supervised, Unsupervised, and Key Concepts
Machine learning fundamentals are tested on every data science exam. Here are the core concepts and where beginners get confused.
Supervised vs unsupervised learning
The fundamental distinction in machine learning is whether you have labelled training data.
# Supervised Learning: labelled training data (X → y)
#   Classification: predict a category
#   - Logistic Regression, Decision Trees, Random Forest, SVM, Neural Networks
#   - Examples: spam detection, disease diagnosis, image classification
#   Regression: predict a continuous value
#   - Linear Regression, Ridge/Lasso, Random Forest, XGBoost
#   - Examples: house price prediction, sales forecasting

# Unsupervised Learning: no labels, find structure in data
#   Clustering: group similar data points
#   - K-Means, DBSCAN, Hierarchical
#   - Examples: customer segmentation, anomaly detection
#   Dimensionality Reduction: reduce features
#   - PCA, t-SNE, UMAP
#   - Examples: visualisation, preprocessing

# Semi-supervised: small labelled + large unlabelled data
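The distinction is easiest to see side by side. Here is a minimal sketch using scikit-learn on synthetic data (the dataset, model choices, and parameters are illustrative assumptions, not part of any exam syllabus):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic 2-D data: 3 well-separated groups (illustrative only)
X, y = make_blobs(n_samples=300, centers=3, random_state=42)

# Supervised: labels y are available, so we can train a classifier on (X, y)
clf = LogisticRegression().fit(X, y)
print("Classifier train accuracy:", clf.score(X, y))

# Unsupervised: pretend the labels don't exist and look for structure in X alone
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster sizes:", np.bincount(km.labels_))
```

Same data, two problems: the classifier needs `y`; K-Means never sees it and recovers the groups from geometry alone.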
Train, validation, and test sets
Splitting data correctly is critical for honest model evaluation.
from sklearn.model_selection import train_test_split
X, y = features, labels
# Train/test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train/validation/test (60/20/20)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
# Rules:
# Train: model learns from this
# Validation: tune hyperparameters, select model
# Test: final, unbiased evaluation (touch ONCE at the end)
Overfitting, underfitting, and bias-variance tradeoff
These concepts explain why ML models fail to generalise to new data.
# Overfitting:
# - Model too complex, memorises training data
# - Low training error, high test error
# - Symptoms: perfect training accuracy, poor test accuracy
# - Fix: more data, regularisation, simpler model, dropout

# Underfitting:
# - Model too simple, can't capture patterns
# - High training error AND high test error
# - Fix: more features, more complex model, less regularisation

# Bias-Variance Tradeoff:
# Bias: error from wrong assumptions (underfitting)
# Variance: sensitivity to training data fluctuations (overfitting)
# High bias → underfitting (e.g., linear model for non-linear data)
# High variance → overfitting (e.g., deep decision tree)
# Goal: find the sweet spot
# - Cross-validation helps estimate generalisation error
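You can see the overfitting symptom directly by comparing a deep decision tree with a shallow one. This is a sketch on synthetic data (the dataset and depths are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# None = grow until leaves are pure (high variance); 3 = simpler (higher bias)
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")

# Cross-validation gives a more stable estimate of generalisation error
# than a single train/test split
cv_scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=42),
                            X, y, cv=5)
print("5-fold CV accuracy:", round(cv_scores.mean(), 2))
```

The unrestricted tree scores (near-)perfectly on the training set but worse on the test set — the classic overfitting gap the exam question is asking about.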
Common algorithms overview
Know when to use which algorithm — a common exam and interview topic.
# Linear Regression
# - Continuous output, linear relationship assumption
# - Fast, interpretable, good baseline

# Logistic Regression
# - Binary/multi-class classification
# - Outputs probabilities, interpretable

# Decision Tree
# - Classification and regression
# - Interpretable, but prone to overfitting

# Random Forest
# - Ensemble of decision trees (bagging)
# - Reduces variance, handles missing values
# - Less interpretable than single tree

# Gradient Boosting (XGBoost, LightGBM)
# - Builds trees sequentially, each corrects previous errors
# - Often best for tabular data
# - Can overfit if not tuned

# K-Nearest Neighbours (KNN)
# - Non-parametric, lazy learning
# - Slow at prediction, sensitive to scale
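A quick way to build intuition for these tradeoffs is to cross-validate several of the classifiers above on one dataset. A minimal sketch, assuming a synthetic dataset and default-ish settings (all illustrative, not a benchmark):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # KNN is distance-based and sensitive to scale, so standardise first
    "KNN (scaled)": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f}")
```

Note the KNN pipeline: scaling is bundled with the model so it is refit inside each CV fold, which avoids leaking test-fold statistics into training.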
Exam tip
Overfitting vs underfitting is the most common ML exam question. Remember: overfitting = low train error, high test error (too complex); underfitting = high both (too simple). The test set should NEVER be used for model selection.
Think you're ready? Prove it.
Take the free Data Science readiness test. Get a score, topic breakdown, and your exact weak areas.
Take the free Data Science test →
Free · No sign-up · Instant results