Logistic regression is one of the regression techniques often used in classification and prediction. In this article, we will discuss the basics of logistic regression. Additionally, we will implement logistic regression using the sklearn module in Python.
What is Logistic Regression?
Logistic regression is one of the machine learning techniques used to estimate the probability of an event occurring. The event can be a value or set of values belonging to a certain class or evaluating to a certain value.
Logistic regression is employed in machine learning applications for classification and prediction tasks. In logistic regression, we train a logistic function that indicates if a set of independent variables evaluates to one value of the dependent variable or another. Here, the number of values that the dependent variable takes is discrete and finite. Therefore, logistic regression works as a classification algorithm.
We use logistic regression in creating machine learning applications that are used for classification tasks such as cancer detection, checking if a person is obese or not, etc.
Types of Logistic Regression
Depending on the number of values the independent variable can take, we can divide logistic regression into the following types.
Binomial Logistic Regression: In binomial logistic regression, the dependent variable can take only two values.
Multinomial Logistic Regression: In multinomial logistic regression, the dependent variable can take three or more values. The values are not ordered.
Ordinal Logistic Regression: In ordinal logistic regression, the dependent variable can take three or more values. However, the values have an order in this case.
Logistic Regression vs Linear Regression
This machine learning technique is similar to linear regression with few differences. In logistic regression, the dependent variable can take only a finite number of values. Generally, the dependent variable in logistic regression consists of only two values i.e. 0 and 1, True and False, etc.
Another difference between Linear regression and Logistic regression is that we derive the equation of a line in linear regression. However, we train a sigmoid function or logistic function in logistic regression.
Assumptions of Logistic Regression
Logistic regression assumes that the dependent variable is categorical in nature. Therefore, if the dataset contains continuous values, you need to define ranges and change the dependent variables to categorical values while data cleaning.
Additionally, logistic regression assumes that there is no multi-collinearity in the dataset. It means that the independent variables must not be highly correlated with each other. If there are highly correlated variables in the dataset, you need to drop some of the attributes from the dataset while data processing to get better results.
Now that we have discussed the basics of logistic regression, let us now discuss the implementation of logistic regression using the sklearn module in Python.
Logistic regression using sklearn in Python
To implement logistic regression using sklearn in Python, consider the following dataset.
Here, we have been given a dataset with the height and weight of people as independent variables. The dependent variable Obesity can take two values. If the person is obese, the value for Obesity is 1. Otherwise, the value for Obesity is 0.
To implement logistic regression using the sklearn module in Python, we will use three functions from the sklearn module.
LogisticRegression()function is used to create an empty logistic regression model. The
LogisticRegression()function, after execution returns a
LogisticRegressionobject. The function uses a seed number for randomization. Therefore, we can pass the value 0 to the
- To train the logistic regression model, we will use the
fit()method, when invoked on a
LogisticRegressionobject, takes a list of vectors containing the independent variables and a list containing dependent variables as its second input argument. After execution, it returns a trained
- To predict values using the independent variables, we will use the
predict()method, when invoked on a trained logistic regression model, takes a list of vectors containing the independent variables. After execution, it returns the output values.
Logistic Regression Implementation in Python
Before proceeding with the implementation of the logistic regression model, we need to prepare the dataset that we can feed into the
- If we have N independent variables
X1, X2, X3, X4, X5 to XN,we will create a list of tuples where each tuple contains N elements. The tuple at position i in the dataset should contain the values
X1i, X2i, X3i, X4i, X5i, ....., XNi. Thus, the tuple at position i in the list of tuples will represent the
ithentry in the dataset.
- To create the list of tuples from the lists of each attribute, we will use the
zip()method will take each list
X1, X2, X3, X4, X5 till XNas its input argument. After execution, it will return the list of tuples.
- Once we get the list of tuples containing independent variables, we will use the
LogisticRegression()function to create an empty logistic regression model.
- After that, we will train the empty model using the
fit()method, when invoked on a LogisticRegression object, takes the list of tuples containing independent variables as its first input argument and the list of dependent variables as its second input argument. After execution, the
fit()method returns a trained logistic regression model.
- Once we get the trained logistic regression model, we will use the
predict()method to predict the value of the dependent variable for a set of independent variables. The
predict()method takes a list of independent variables as input arguments and returns a list of predicted values for dependent variables.
You can implement the logistic regression algorithm using the sklearn module and predict values as shown in the following example.
from sklearn.linear_model import LogisticRegression
print("The input values are:",input_values)
print("The output values are:",output_values)
The input values are: [(180, 90), (180, 72)]
The output values are: [1 0]
In the above code, we have first created a list of
obesity values from the given dataset. Then, we have zipped the lists to create a list of tuples
X that has to be given as training data to the
fit() method. The list
X contains independent variables and the list
obesity contains dependent variables. After that, we have followed the approached discussed before implementing the code.
Here, you can observe that the trained machine learning model gives a value 1 for obesity when we pass 180 as height and 90 for weight notifying that the given person is obese. On the other hand, when we give 72 as the weight for the same height we get the output that the person is not obese.
In this article, we have discussed the basics of logistic regression with the algorithm, examples and its types. We have also discussed the implementation of logistic regression using the sklearn module in Python.