Mutual Information — ML Feature Selection
What is mutual information?
Information gain measures the predictive power of a feature on an outcome (target).
https://en.wikipedia.org/wiki/Mutual_information
The mutual information of X and Y is the expected value, over the joint distribution, of the log ratio between the joint probability p(x, y) and the product of the marginals p(x)p(y); it is zero exactly when X and Y are independent.
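As a concrete illustration, the definition can be computed by hand for a small made-up joint distribution (the numbers below are purely illustrative):

import numpy as np

# I(X; Y) = sum over x, y of p(x, y) * log( p(x, y) / (p(x) * p(y)) )
p_xy = np.array([[0.4, 0.1],   # illustrative joint probabilities p(x, y)
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)         # marginal p(x)
p_y = p_xy.sum(axis=0)         # marginal p(y)

mi = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))
print(mi)   # > 0 because X and Y are dependent; 0 would mean independence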
Uses:
It is commonly used in the construction of decision trees from a training dataset, by evaluating the information gain for each variable, and selecting the variable that maximizes the information gain, which in turn minimizes the entropy and best splits the dataset into groups for effective classification.
Information gain can also be used for feature selection, by evaluating the gain of each variable in the context of the target variable. In this slightly different usage, the calculation is referred to as mutual information between the two random variables.
Information gain is the reduction in entropy or surprise by transforming a dataset and is often used in training decision trees.
Information gain is calculated by comparing the entropy of the dataset before and after a transformation.
Mutual information measures the statistical dependence between two variables and is the name given to information gain when applied to variable selection.
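A minimal sketch of that before/after entropy comparison (the labels and the candidate split below are made up for illustration):

import numpy as np

def entropy(labels):
    # Shannon entropy of a label array, in bits
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                         # labels before the split
left, right = np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])   # labels in the two groups after the split

gain = entropy(y) - (len(left) / len(y)) * entropy(left) - (len(right) / len(y)) * entropy(right)
print(gain)   # information gain = reduction in entropy produced by the split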
Difference between Correlation and Mutual Information
Correlation analysis provides a quantitative means of measuring the strength of a linear relationship between two vectors of data.
Mutual information is essentially a measure of how much “knowledge” one can gain about one variable by knowing the value of another.
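For example (a made-up, purely nonlinear relationship), correlation can be near zero while mutual information is clearly positive:

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 1000)
y = x ** 2                                            # nonlinear dependence

print(np.corrcoef(x, y)[0, 1])                        # near 0: no linear relationship
print(mutual_info_regression(x.reshape(-1, 1), y))    # clearly > 0: strong dependence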
Python Implementation:
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.feature_selection import SelectKBest, SelectPercentile
Step 1 — Get the Ranking of the Features
1. For classification, use mutual_info_classif
mi is the array of mutual information scores it returns
2. Convert the array to a DataFrame
3. Plot the features with the highest mutual information… the higher the value, the more predictive the feature (see the sketch below)
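A minimal sketch of these three steps, assuming a pandas DataFrame X_train and a target y_train (the variable names are illustrative, not from the article):

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# 1. Mutual information of each feature with the target
mi = mutual_info_classif(X_train, y_train)

# 2. Convert the array to a pandas object indexed by feature name
mi = pd.Series(mi, index=X_train.columns).sort_values(ascending=False)

# 3. Plot the features with the highest mutual information
mi.head(10).plot.bar(title='Top 10 features by mutual information')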
The bar plot shows the 10 features with the highest mutual information scores.
Step 2 — Get the Top Selected Features
The top 10 features chosen by SelectKBest are very similar to the ranking above.
Transform the data with the fitted SelectKBest object to remove the unselected features, as sketched below.
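A minimal sketch of this step, again with illustrative X_train/X_test/y_train names:

from sklearn.feature_selection import SelectKBest, mutual_info_classif

sel = SelectKBest(mutual_info_classif, k=10).fit(X_train, y_train)   # fit on the training set only
selected = X_train.columns[sel.get_support()]                        # names of the kept features

X_train_sel = sel.transform(X_train)   # drop the unselected features
X_test_sel = sel.transform(X_test)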
Regression Data
Mutual information regression only applies to numerical features, so create a dataset containing only numeric values.
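With pandas this could look like the following (assuming X_train/X_test DataFrames):

X_train_num = X_train.select_dtypes(include='number')   # keep only numeric columns
X_test_num = X_test.select_dtypes(include='number')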
The regression implementation is exactly the same; we just use mutual_info_regression instead.
Step 1 — Get the Highest Ranking Features
You can plot the features with the highest mutual information… the higher the value, the more predictive the feature.
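The ranking step mirrors the classification case, just with the regression scorer (a sketch, assuming a numeric X_train and continuous y_train):

import pandas as pd
from sklearn.feature_selection import mutual_info_regression

mi = pd.Series(mutual_info_regression(X_train, y_train), index=X_train.columns)
mi.sort_values(ascending=False).head(10).plot.bar(title='Top features by mutual information')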
Step 2 — Select Among the Highest Ranking Features; for regression we use SelectPercentile
sel_ = SelectPercentile(mutual_info_regression, percentile=10).fit(X_train, y_train)
Transform (do not refit) X_train and X_test with the fitted selector to remove all the unselected features.
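Continuing from the sel_ object fitted above (variable names are illustrative):

selected = X_train.columns[sel_.get_support()]   # features that survive the 10th-percentile cut

X_train_sel = sel_.transform(X_train)   # transform, not fit, so train and test keep the same columns
X_test_sel = sel_.transform(X_test)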
Other Recommendations
Univariate statistical tests (the ANOVA F-test and chi2) are a recommended alternative to mutual information.
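A possible way to apply that recommendation with scikit-learn's univariate scorers (a sketch; X_train_nonneg is a hypothetical non-negative version of the data, since chi2 requires non-negative features):

from sklearn.feature_selection import SelectKBest, chi2, f_classif

sel_anova = SelectKBest(f_classif, k=10).fit(X_train, y_train)    # ANOVA F-test for classification
sel_chi2 = SelectKBest(chi2, k=10).fit(X_train_nonneg, y_train)   # chi2 needs non-negative features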