Anakin
Nov 14, 2020

Mutual Information — ML Feature Selection

What is mutual information?

Information gain measures the predictive power of a feature on an outcome (the target).

https://en.wikipedia.org/wiki/Mutual_information

Mutual information I(X; Y) is not a simple sum of probabilities; it is the expected log-ratio of the joint distribution of X and Y to the product of their marginals:

I(X; Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]

It equals zero exactly when X and Y are independent.
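As a minimal sketch, here is that formula computed directly with NumPy on a made-up 2×2 joint distribution (the probabilities are purely illustrative):

import numpy as np

# Made-up joint distribution p(x, y) over two binary variables
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y)

# I(X; Y) = sum over x, y of p(x, y) * log(p(x, y) / (p(x) * p(y)))
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))
print(mi)  # ~0.19 nats; 0 would mean X and Y are independent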

Where it is used:

It is commonly used in the construction of decision trees from a training dataset, by evaluating the information gain for each variable, and selecting the variable that maximizes the information gain, which in turn minimizes the entropy and best splits the dataset into groups for effective classification.

Information gain can also be used for feature selection, by evaluating the gain of each variable in the context of the target variable. In this slightly different usage, the calculation is referred to as mutual information between the two random variables.

Information gain is the reduction in entropy or surprise by transforming a dataset and is often used in training decision trees.

Information gain is calculated by comparing the entropy of the dataset before and after a transformation.
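For instance, here is a minimal sketch (with made-up labels) of computing the information gain of a single binary split as the parent entropy minus the weighted entropies of the children:

import numpy as np

def entropy(labels):
    # Shannon entropy, in bits, of an array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical class labels before the split, and in the two child nodes
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([0, 0, 0, 1])
right = np.array([0, 1, 1, 1])

weighted_children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
ig = entropy(y) - weighted_children
print(ig)  # ~0.19 bits gained by this split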

Mutual information measures the statistical dependence between two variables, and is the name given to information gain when it is applied to variable selection.

Difference between Correlation and Mutual Information

Correlation analysis provides a quantitative means of measuring the strength of a linear relationship between two vectors of data.

Mutual information is essentially a measure of how much knowledge one can gain about one variable by knowing the value of another.
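A quick way to see the difference: a quadratic relationship has near-zero correlation but clearly positive mutual information. A sketch with synthetic data:

import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, 1000)
y = x ** 2  # fully dependent on x, but not linearly

print(np.corrcoef(x, y)[0, 1])                      # ~0: correlation misses the relationship
print(mutual_info_regression(x.reshape(-1, 1), y))  # well above 0: MI detects it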

Python Implementation:

from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.feature_selection import SelectKBest, SelectPercentile

Step 1 — Get the Ranking of the Features

1. For classification, use mutual_info_classif.

mutual_info_classif returns an array, mi, with the mutual information of each feature.

2. Convert the array to a pandas Series or DataFrame indexed by the feature names.

3. Plot the features with the highest mutual information; the higher the value, the more predictive the feature. All three steps are sketched below.
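A minimal sketch of these three steps, assuming X_train is a pandas DataFrame of numeric features and y_train holds the class labels:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_classif

mi = mutual_info_classif(X_train, y_train)           # 1. mi is the array of MI scores
mi = pd.Series(mi, index=X_train.columns)            # 2. attach the feature names
mi.sort_values(ascending=False).head(10).plot.bar()  # 3. plot the top 10
plt.ylabel('Mutual information')
plt.show()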

[Figure: bar plot of the top 10 features by mutual information]

Step 2 — Get the Top Selected Features

The top 10 features selected by SelectKBest are very similar to this ranking.

Transform the data with the fitted SelectKBest to remove the unselected features, as sketched below.
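A sketch, reusing the X_train and y_train from above (and assuming a matching X_test):

from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 10 features with the highest mutual information
sel = SelectKBest(mutual_info_classif, k=10).fit(X_train, y_train)
print(X_train.columns[sel.get_support()])  # names of the selected features

# Transform both splits so only the selected features remain
X_train = sel.transform(X_train)
X_test = sel.transform(X_test)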

Regression Data

Mutual information applies only to numerical features, so create a dataset with only numeric values.
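With pandas this can be as simple as the following sketch, where data is assumed to be a DataFrame and 'target' is a hypothetical target column:

# Keep only the numeric columns before computing mutual information
numeric_data = data.select_dtypes(include=['number'])
X = numeric_data.drop(columns=['target'])  # 'target' is a placeholder name
y = numeric_data['target']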

The regression implementation is exactly the same; we just use:

Step 1 — Get the Highest-Ranking Features

mutual_info_regression

You can plot the highest-information features… the higher the value, the more predictive the feature. A sketch:
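This mirrors the classification version, assuming X_train and y_train now come from a regression dataset with a continuous target:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_regression

mi = mutual_info_regression(X_train, y_train)  # MI of each feature with the continuous target
mi = pd.Series(mi, index=X_train.columns)
mi.sort_values(ascending=False).plot.bar(figsize=(10, 4))
plt.ylabel('Mutual information')
plt.show()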

Step 2 — Select Among the Highest-Ranking Features; for regression we use SelectPercentile

sel_ = SelectPercentile(mutual_info_regression, percentile=10).fit(X_train, y_train)

Transform (do not refit) X_train and X_test with the fitted selector to remove the unselected features:
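Continuing the sketch, with the sel_ fitted above and a matching X_test assumed to exist:

# transform only -- the selector was already fitted on the training data
X_train = sel_.transform(X_train)
X_test = sel_.transform(X_test)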
