What Is Semi-Supervised Learning

Author: Jason Brownlee

Semi-supervised learning is a learning problem that involves a small number of labeled examples and a large number of unlabeled examples.

Learning problems of this type are challenging because neither supervised nor unsupervised learning algorithms can make effective use of a mixture of labeled and unlabeled data. As such, specialized semi-supervised learning algorithms are required.

In this tutorial, you will discover a gentle introduction to the field of semi-supervised learning for machine learning.

After completing this tutorial, you will know:

  • Semi-supervised learning is a type of machine learning that sits between supervised and unsupervised learning.
  • Top books on semi-supervised learning designed to get you up to speed in the field.
  • Additional resources on semi-supervised learning, such as review papers and APIs.

Let’s get started.

What Is Semi-Supervised Learning
Photo by Paul VanDerWerf, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Semi-Supervised Learning
  2. Books on Semi-Supervised Learning
  3. Additional Resources

Semi-Supervised Learning

Semi-supervised learning is a type of machine learning.

It refers to a learning problem (and algorithms designed for the learning problem) that involves a small portion of labeled examples and a large number of unlabeled examples from which a model must learn and make predictions on new examples.

… dealing with the situation where relatively few labeled training points are available, but a large number of unlabeled points are given, it is directly relevant to a multitude of practical problems where it is relatively expensive to produce labeled data …

— Page xiii, Semi-Supervised Learning, 2006.

As such, it is a learning problem that sits between supervised learning and unsupervised learning.

Semi-supervised learning (SSL) is halfway between supervised and unsupervised learning. In addition to unlabeled data, the algorithm is provided with some supervision information – but not necessarily for all examples. Often, this information will be the targets associated with some of the examples.

— Page 2, Semi-Supervised Learning, 2006.

We require semi-supervised learning algorithms when working with data where labeling examples is challenging or expensive.

Semi-supervised learning has tremendous practical value. In many tasks, there is a paucity of labeled data. The labels y may be difficult to obtain because they require human annotators, special devices, or expensive and slow experiments.

— Page 9, Introduction to Semi-Supervised Learning, 2009.

The sign of an effective semi-supervised learning algorithm is that it can achieve better performance than a supervised learning algorithm fit only on the labeled training examples.

Semi-supervised learning algorithms are generally able to clear this low bar.

… in comparison with a supervised algorithm that uses only labeled data, can one hope to have a more accurate prediction by taking into account the unlabeled points? […] in principle the answer is ‘yes.’

— Page 4, Semi-Supervised Learning, 2006.
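The comparison described above can be sketched in a few lines. This is a minimal illustration, not a benchmark: the synthetic dataset, the choice of 50 labeled examples, and the use of scikit-learn's SelfTrainingClassifier with a logistic regression base model are all assumptions made for the example. Unlabeled rows are marked with -1, which is the convention scikit-learn uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic binary classification problem (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=1)

# Keep labels for only 50 training rows; mark the rest as unlabeled (-1).
rng = np.random.RandomState(1)
labeled_idx = rng.choice(len(y_train), size=50, replace=False)
y_semi = np.full_like(y_train, -1)
y_semi[labeled_idx] = y_train[labeled_idx]
labeled = y_semi != -1

# Baseline: supervised model fit on the labeled rows only.
baseline = LogisticRegression().fit(X_train[labeled], y_train[labeled])
baseline_acc = baseline.score(X_test, y_test)

# Semi-supervised: self-training also makes use of the unlabeled rows.
ssl_model = SelfTrainingClassifier(LogisticRegression())
ssl_model.fit(X_train, y_semi)
ssl_acc = ssl_model.score(X_test, y_test)

print(f"Supervised baseline accuracy: {baseline_acc:.3f}")
print(f"Self-training accuracy:       {ssl_acc:.3f}")
```

Whether the semi-supervised model actually beats the baseline depends on the data and the algorithm, so it is worth running this kind of check on your own problem rather than assuming an improvement.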

Finally, semi-supervised learning may be framed as either an inductive or a transductive learning task.

Generally, inductive learning refers to a learning algorithm that learns from labeled training data and generalizes to new data, such as a test dataset. Transductive learning refers to learning from labeled training data and generalizing to available unlabeled (training) data. Both types of learning tasks may be performed by a semi-supervised learning algorithm.

… there are two distinct goals. One is to predict the labels on future test data. The other goal is to predict the labels on the unlabeled instances in the training sample. We call the former inductive semi-supervised learning, and the latter transductive learning.

— Page 12, Introduction to Semi-Supervised Learning, 2009.
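The two goals can be seen side by side in a small sketch. It assumes scikit-learn's LabelPropagation model and a synthetic two-moons dataset with 10 labeled points, all chosen only for illustration. The fitted model's transduction_ attribute holds labels for the training sample (the transductive goal), while predict() labels new points (the inductive goal).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

# Training sample: 200 points, of which only 10 keep their labels.
X_train, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)
rng = np.random.RandomState(0)
y_train = np.full(200, -1)  # -1 marks an unlabeled example
labeled_idx = np.concatenate([
    rng.choice(np.where(y_true == 0)[0], size=5, replace=False),
    rng.choice(np.where(y_true == 1)[0], size=5, replace=False),
])
y_train[labeled_idx] = y_true[labeled_idx]

model = LabelPropagation(kernel="knn", n_neighbors=7)
model.fit(X_train, y_train)

# Transductive: labels inferred for the unlabeled rows of the training sample.
transductive_labels = model.transduction_[y_train == -1]

# Inductive: labels predicted for new points not seen during fitting.
X_new, _ = make_moons(n_samples=50, noise=0.1, random_state=1)
inductive_labels = model.predict(X_new)

print(transductive_labels.shape)
print(inductive_labels.shape)
```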

If you are new to the idea of transduction vs. induction, the following tutorial has more information:

Now that we are familiar with semi-supervised learning at a high level, let’s take a look at top books on the topic.

Books on Semi-Supervised Learning

Semi-supervised learning is a new and fast-moving field of study, and as such, there are very few books on the topic.

There are perhaps two key books on semi-supervised learning that you should consider if you are new to the topic; they are:

  • Semi-Supervised Learning, 2006.
  • Introduction to Semi-Supervised Learning, 2009.

Let’s take a closer look at each in turn.

Semi-Supervised Learning, 2006

The book “Semi-Supervised Learning” was published in 2006 and was edited by Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien.

This book provides a large number of chapters, each written by top researchers in the field.

It is designed to take you on a tour of the field of research including intuitions, top techniques, and open problems.

The full table of contents is listed below.

Table of Contents

  • Chapter 01: Introduction to Semi-Supervised Learning
  • Part I: Generative Models
    • Chapter 02: A Taxonomy for Semi-Supervised Learning Methods
    • Chapter 03: Semi-Supervised Text Classification Using EM
    • Chapter 04: Risks of Semi-Supervised Learning
    • Chapter 05: Probabilistic Semi-Supervised Clustering with Constraints
  • Part II: Low-Density Separation
    • Chapter 06: Transductive Support Vector Machines
    • Chapter 07: Semi-Supervised Learning Using Semi-Definite Programming
    • Chapter 08: Gaussian Processes and the Null-Category Noise Model
    • Chapter 09: Entropy Regularization
    • Chapter 10: Data-Dependent Regularization
  • Part III: Graph-Based Methods
    • Chapter 11: Label Propagation and Quadratic Criterion
    • Chapter 12: The Geometric Basis of Semi-Supervised Learning
    • Chapter 13: Discrete Regularization
    • Chapter 14: Semi-Supervised Learning with Conditional Harmonic Mixing
  • Part IV: Change of Representation
    • Chapter 15: Graph Kernels by Spectral Transforms
    • Chapter 16: Spectral Methods for Dimensionality Reduction
    • Chapter 17: Modifying Distances
  • Part V: Semi-Supervised Learning in Practice
    • Chapter 18: Large-Scale Algorithms
    • Chapter 19: Semi-Supervised Protein Classification Using Cluster Kernels
    • Chapter 20: Prediction of Protein Function from Networks
    • Chapter 21: Analysis of Benchmarks
  • Part VI: Perspectives
    • Chapter 22: An Augmented PAC Model for Semi-Supervised Learning
    • Chapter 23: Metric-Based Approaches for Semi-Supervised Regression and Classification
    • Chapter 24: Transductive Inference and Semi-Supervised Learning
    • Chapter 25: A Discussion of Semi-Supervised Learning and Transduction

I highly recommend reading this book cover to cover if you are starting out in this field.

Introduction to Semi-Supervised Learning, 2009

The book “Introduction to Semi-Supervised Learning” was published in 2009 and was written by Xiaojin Zhu and Andrew Goldberg.

This book is aimed at students, researchers, and engineers just getting started in the field.

The book is a beginner’s guide to semi-supervised learning. It is aimed at advanced undergraduates, entry-level graduate students and researchers in areas as diverse as Computer Science, Electrical Engineering, Statistics, and Psychology.

— Page xiii, Introduction to Semi-Supervised Learning, 2009.

It’s a shorter read than the above book and a great introduction.

The full table of contents is listed below.

Table of Contents

  • Chapter 01: Introduction to Statistical Machine Learning
  • Chapter 02: Overview of Semi-Supervised Learning
  • Chapter 03: Mixture Models and EM
  • Chapter 04: Co-Training
  • Chapter 05: Graph-Based Semi-Supervised Learning
  • Chapter 06: Semi-Supervised Support Vector Machines
  • Chapter 07: Human Semi-Supervised Learning
  • Chapter 08: Theory and Outlook

I also recommend this book if you’re just starting out for a quick review of the key elements of the field.

Other Books

There are some additional books on semi-supervised learning that you might also like to consider; they are:

Have you read any of the above books?
What did you think?

Did I miss your favorite book?
Let me know in the comments below.

Additional Resources

There are additional resources that may be helpful when getting started in the field of semi-supervised learning.

I would recommend reading some review papers.

Some examples of good review papers on semi-supervised learning include:

In this paper, we provide a comprehensive overview of deep semi-supervised learning, starting with an introduction to the field, followed by a summarization of the dominant semi-supervised approaches in deep learning.

An Overview of Deep Semi-Supervised Learning, 2020.

It is also a good idea to try out some of the algorithms.

The scikit-learn Python machine learning library provides a few graph-based semi-supervised learning algorithms that you can try:
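Two graph-based estimators in the sklearn.semi_supervised module are LabelPropagation and LabelSpreading. As a rough sketch of how they might be tried, the example below fits both on the iris dataset after hiding about 80 percent of the labels, then checks how well each one recovers the hidden labels. The dataset, the fraction of hidden labels, and the knn kernel settings are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

X, y_true = load_iris(return_X_y=True)

# Hide about 80% of the labels; -1 marks an unlabeled example.
rng = np.random.RandomState(42)
y = np.copy(y_true)
unlabeled = rng.rand(len(y)) < 0.8
y[unlabeled] = -1

results = {}
for Model in (LabelPropagation, LabelSpreading):
    model = Model(kernel="knn", n_neighbors=7)
    model.fit(X, y)
    # Accuracy of the inferred labels on the rows whose labels were hidden.
    results[Model.__name__] = np.mean(
        model.transduction_[unlabeled] == y_true[unlabeled])

for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Both classes share the same fit/predict interface, so swapping one for the other, or changing the kernel, only requires changing the constructor call.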

The Wikipedia article may also provide some useful links for further reading:

Summary

In this tutorial, you discovered a gentle introduction to the field of semi-supervised learning for machine learning.

Specifically, you learned:

  • Semi-supervised learning is a type of machine learning that sits between supervised and unsupervised learning.
  • Top books on semi-supervised learning designed to get you up to speed in the field.
  • Additional resources on semi-supervised learning, such as review papers and APIs.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

The post What Is Semi-Supervised Learning appeared first on Machine Learning Mastery.