Logistic Regression Using sklearn Module in Python
Logistic regression is one of the regression techniques often used in classification and prediction. In this article, we will discuss the basics of logistic regression. Additionally, we will implement logistic regression using the sklearn module in Python.
What is Logistic Regression?
Logistic regression is one of the machine learning techniques used to estimate the probability of an event occurring. The event can be a value or set of values belonging to a certain class or evaluating to a certain value.
Logistic regression is employed in machine learning applications for classification and prediction tasks. In logistic regression, we train a logistic function that indicates if a set of independent variables evaluates to one value of the dependent variable or another. Here, the number of values that the dependent variable takes is discrete and finite. Therefore, logistic regression works as a classification algorithm.
We use logistic regression in creating machine learning applications that are used for classification tasks such as cancer detection, checking if a person is obese or not, etc.
Types of Logistic Regression
Depending on the number of values the independent variable can take, we can divide logistic regression into the following types.
Binomial Logistic Regression: In binomial logistic regression, the dependent variable can take only two values.
Multinomial Logistic Regression: In multinomial logistic regression, the dependent variable can take three or more values. The values are not ordered.
Ordinal Logistic Regression: In ordinal logistic regression, the dependent variable can take three or more values. However, the values have an order in this case.
Logistic Regression vs Linear Regression
This machine learning technique is similar to linear regression with few differences. In logistic regression, the dependent variable can take only a finite number of values. Generally, the dependent variable in logistic regression consists of only two values i.e. 0 and 1, True and False, etc.
Another difference between Linear regression and Logistic regression is that we derive the equation of a line in linear regression. However, we train a sigmoid function or logistic function in logistic regression.
Assumptions of Logistic Regression
Logistic regression assumes that the dependent variable is categorical in nature. Therefore, if the dataset contains continuous values, you need to define ranges and change the dependent variables to categorical values while data cleaning.
Additionally, logistic regression assumes that there is no multi-collinearity in the dataset. It means that the independent variables must not be highly correlated with each other. If there are highly correlated variables in the dataset, you need to drop some of the attributes from the dataset while data processing to get better results.
Now that we have discussed the basics of logistic regression, let us now discuss the implementation of logistic regression using the sklearn module in Python.
Logistic regression using sklearn in Python
To implement logistic regression using sklearn in Python, consider the following dataset.
Height | Weight | Obesity |
120 | 40 | 0 |
110 | 40 | 1 |
130 | 55 | 1 |
180 | 72 | 0 |
175 | 68 | 0 |
150 | 68 | 1 |
150 | 64 | 0 |
140 | 60 | 1 |
140 | 58 | 0 |
Here, we have been given a dataset with the height and weight of people as independent variables. The dependent variable Obesity can take two values. If the person is obese, the value for Obesity is 1. Otherwise, the value for Obesity is 0.
To implement logistic regression using the sklearn module in Python, we will use three functions from the sklearn module.
- The
LogisticRegression()
function is used to create an empty logistic regression model. TheLogisticRegression()
function, after execution returns aLogisticRegression
object. The function uses a seed number for randomization. Therefore, we can pass the value 0 to therandom_state
parameter. - To train the logistic regression model, we will use the
fit()
method. Thefit()
method, when invoked on aLogisticRegression
object, takes a list of vectors containing the independent variables and a list containing dependent variables as its second input argument. After execution, it returns a trainedLogisticRegression
object. - To predict values using the independent variables, we will use the
predict()
method. Thepredict()
method, when invoked on a trained logistic regression model, takes a list of vectors containing the independent variables. After execution, it returns the output values.
Logistic Regression Implementation in Python
Before proceeding with the implementation of the logistic regression model, we need to prepare the dataset that we can feed into the LogisticRegression()
function.
- If we have N independent variables
X1, X2, X3, X4, X5 to XN,
we will create a list of tuples where each tuple contains N elements. The tuple at position i in the dataset should contain the valuesX1i, X2i, X3i, X4i, X5i, ....., XNi
. Thus, the tuple at position i in the list of tuples will represent theith
entry in the dataset. - To create the list of tuples from the lists of each attribute, we will use the
zip()
method. Thezip()
method will take each listX1, X2, X3, X4, X5 till XN
as its input argument. After execution, it will return the list of tuples. - Once we get the list of tuples containing independent variables, we will use the
LogisticRegression()
function to create an empty logistic regression model. - After that, we will train the empty model using the
fit()
method. Thefit()
method, when invoked on a LogisticRegression object, takes the list of tuples containing independent variables as its first input argument and the list of dependent variables as its second input argument. After execution, thefit()
method returns a trained logistic regression model. - Once we get the trained logistic regression model, we will use the
predict()
method to predict the value of the dependent variable for a set of independent variables. Thepredict()
method takes a list of independent variables as input arguments and returns a list of predicted values for dependent variables.
You can implement the logistic regression algorithm using the sklearn module and predict values as shown in the following example.
from sklearn.linear_model import LogisticRegression
heights= [120,110,130,180,175,150,150,140,140]
weights=[40,40,55,72,68,68,64,60,58]
obesity=[0,1,1,0,0,1,0,1,0]
X=list(zip(heights,weights))
regression_model=LogisticRegression()
input_values=[(180,90),(180,72)]
print("The input values are:",input_values)
output_values=regression_model.predict(input_values)
print("The output values are:",output_values)
Output:
The input values are: [(180, 90), (180, 72)]
The output values are: [1 0]
In the above code, we have first created a list of heights
, weights
, and obesity
values from the given dataset. Then, we have zipped the lists to create a list of tuples X
that has to be given as training data to the fit()
method. The list X
contains independent variables and the list obesity
contains dependent variables. After that, we have followed the approached discussed before implementing the code.
Here, you can observe that the trained machine learning model gives a value 1 for obesity when we pass 180 as height and 90 for weight notifying that the given person is obese. On the other hand, when we give 72 as the weight for the same height we get the output that the person is not obese.
Conclusion
In this article, we have discussed the basics of logistic regression with the algorithm, examples and its types. We have also discussed the implementation of logistic regression using the sklearn module in Python.
To learn more about programming, you can read this article on dynamic role based authorization using ASP.net. You can also read this article on user activity logging using Asp.net.