Information Gain for Dummies

What is Information Gain?

Information gain measures how much the uncertainty about the outcome drops when we split a dataset into subsets based on a specific attribute.

In other words, it helps us understand how useful a particular feature is in predicting the outcome of a dataset.

Information gain is most commonly used when building decision trees, a popular machine learning model that organizes decisions about the input data into a tree-like structure.

Entropy: The Key to Understanding Information Gain

To fully comprehend information gain, it’s crucial to understand the concept of entropy.

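Entropy is a measure of the uncertainty or randomness in a dataset. If p_i is the proportion of examples belonging to class i, then

Entropy = –(p_1 × log2 p_1 + p_2 × log2 p_2 + … + p_k × log2 p_k)

A dataset containing only one class has an entropy of 0, while a two-class dataset split 50/50 has an entropy of 1, the maximum for two classes. In machine learning, our goal is often to reduce this uncertainty by finding patterns in the data that help us make accurate predictions.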
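As a rough illustration, entropy can be computed directly from a list of class labels. The sketch below is not taken from any particular library. The function name `entropy` is made up for this example, and it assumes base-2 logarithms (entropy measured in bits), which is what gives an evenly mixed two-class dataset an entropy of 1.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # p * log2(1 / p) is the same as -p * log2(p), written to avoid a "-0.0" result.
    return sum((n / total) * log2(total / n) for n in counts.values())


print(entropy(["apple"] * 15 + ["orange"] * 15))  # evenly mixed -> 1.0
print(entropy(["apple"] * 10))                    # one class only -> 0.0
```

The more lopsided the class proportions, the closer the result gets to 0.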

Calculating Information Gain

Information gain is calculated by comparing the entropy of the dataset before and after splitting it based on a specific attribute. Here’s a step-by-step guide:

  1. Calculate the entropy of the entire dataset (Initial Entropy).
  2. Split the dataset based on a specific attribute.
  3. Calculate the entropy of each subset created in step 2.
  4. Calculate the weighted average of the entropies from step 3, weighting each subset by its share of the data (Weighted Entropy).
  5. Subtract the Weighted Entropy from the Initial Entropy to find the Information Gain.

Information Gain = Initial Entropy – Weighted Entropy

The higher the information gain, the better the attribute is at reducing the uncertainty in the dataset.
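The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `entropy` and `information_gain` are hypothetical, entropy is assumed to be in bits (base-2 logarithm), and the "yes"/"no" labels are made-up toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(parent, subsets):
    """Initial entropy of `parent` minus the size-weighted entropy of `subsets`."""
    total = len(parent)
    # Step 4: weight each subset's entropy by its share of the data.
    weighted = sum(len(s) / total * entropy(s) for s in subsets if s)
    # Step 5: subtract the weighted entropy from the initial entropy.
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4  # toy labels, not from the article

# A split that separates the classes perfectly removes all uncertainty.
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))  # 1.0

# A split that leaves each subset as mixed as the parent gains nothing.
print(information_gain(parent, [["yes", "yes", "no", "no"]] * 2))  # 0.0
```

The two calls mark the extremes: any real attribute falls somewhere between a gain of 0 and the initial entropy.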

A Simple Example

Imagine we have a dataset of 30 fruits, with 15 apples and 15 oranges. We want to predict if a fruit is an apple or an orange based on two attributes: color (red or orange) and shape (round or elongated).

  1. Calculate the Initial Entropy: In our dataset, there’s a 50% chance of a fruit being an apple and a 50% chance of it being an orange. The initial entropy is therefore 1 (maximum uncertainty).
  2. Split the dataset based on color: We now have two subsets:
    • Red fruits: 12 apples and 3 oranges.
    • Orange fruits: 3 apples and 12 oranges.
  3. Calculate the entropy of each subset:
    • Red fruits entropy: –(0.8 × log2 0.8 + 0.2 × log2 0.2) ≈ 0.72
    • Orange fruits entropy: –(0.2 × log2 0.2 + 0.8 × log2 0.8) ≈ 0.72
  4. Calculate the Weighted Entropy: The weighted entropy is the average of the subset entropies, weighted by the proportion of elements in each subset. Each subset holds 15 of the 30 fruits, so the weighted entropy is 0.5 × 0.72 + 0.5 × 0.72 = 0.72.
  5. Calculate the Information Gain: Information Gain = 1 (Initial Entropy) – 0.72 (Weighted Entropy) = 0.28.

So, splitting the dataset based on color results in an information gain of 0.28. We would then repeat this process for the shape attribute and compare the information gains.

The attribute with the highest information gain would be chosen as the first decision node in our decision tree.
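The color split can be recomputed from the counts given in the example. The snippet below is only a sketch: the helper `entropy` and the variable names are invented for illustration, and entropy is assumed to be in bits. The article does not list the counts for the shape split, so only color is evaluated here; shape would be handled the same way once its counts are known.

```python
from math import log2


def entropy(counts):
    """Entropy, in bits, from a list of class counts such as [apples, oranges]."""
    total = sum(counts)
    return sum((n / total) * log2(total / n) for n in counts if n > 0)


whole = [15, 15]  # 15 apples, 15 oranges
by_color = {"red": [12, 3], "orange": [3, 12]}  # [apples, oranges] per color

initial = entropy(whole)
n_fruits = sum(whole)
# Weight each subset's entropy by the fraction of fruits it contains.
weighted = sum(sum(c) / n_fruits * entropy(c) for c in by_color.values())
gain = initial - weighted

print(f"initial entropy:  {initial:.2f}")   # 1.00
print(f"weighted entropy: {weighted:.2f}")  # 0.72
print(f"information gain: {gain:.2f}")      # 0.28
```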


Understanding information gain is essential in the world of machine learning, especially when working with decision trees.

By breaking down the concept into smaller, more manageable parts, we hope this article has made information gain more accessible and easier to grasp.

Just remember that information gain is all about reducing uncertainty in the dataset, and you’ll be well on your way to mastering this essential concept.
