Logistic Regression Using R Tutorial

Why Logistic Regression?

Logistic regression is yet another technique borrowed by machine learning from the field of statistics. It is a powerful statistical method for modelling a binomial outcome with one or more explanatory variables.

The linear regression model assumes that the response variable Y is quantitative. In many situations, however, the response variable is instead qualitative.

For example, eye colour is qualitative, taking on values such as blue, brown or green. Qualitative variables are often referred to as categorical.

Here we study approaches for predicting qualitative responses, a process known as classification. Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class. At the same time, the methods used for classification often first predict the probability of each category of the qualitative variable and use those probabilities as the basis for the classification; in this sense, they also behave like regression models.

In this blog, I have used the Titanic dataset, which is available on Kaggle (https://www.kaggle.com/c/titanic/data). The dataset includes 11 predictors and a response variable.
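As a starting point, the data can be loaded as follows. This is a minimal sketch: the file name `train.csv`, the data frame name `titanic`, and the choice of which columns to convert to factors are my assumptions, not taken from the original post.

```r
# Read the Kaggle Titanic training file (assumed to be saved locally as train.csv)
titanic <- read.csv("train.csv", stringsAsFactors = FALSE)

# Quick look at the 11 predictors and the Survived response
str(titanic)

# Treat the response and the categorical predictors as factors
titanic$Survived <- factor(titanic$Survived)
titanic$Pclass   <- factor(titanic$Pclass)
titanic$Sex      <- factor(titanic$Sex)
titanic$Embarked <- factor(titanic$Embarked)
```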

Performing Logistic Regression in R

[Screenshots: logistic regression code and model output in R]
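The model fit shown in the screenshots can be sketched with `glm()`. The object names and the exact predictor set are my assumptions (the original code is not reproduced here); it assumes the Titanic data is loaded into a data frame `titanic`.

```r
# Fit a logistic regression; family = binomial gives the logit link
fit_full <- glm(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked,
                data = titanic, family = binomial)

# Coefficient estimates, standard errors, z-values and p-values
summary(fit_full)
```

The p-values in the `summary()` output are what the interpretation below is based on.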

Interpretation:

Clearly, we can see that predictors like Parch, Fare and Embarked are statistically insignificant, since their p-values are greater than 0.05. Dropping these predictors from the model therefore gives a better result.
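Dropping those predictors amounts to refitting with a smaller formula. A sketch, assuming the data frame `titanic` and (for the AIC comparison) a full fit stored as `fit_full`:

```r
# Refit after dropping the insignificant predictors (Parch, Fare, Embarked)
fit_reduced <- glm(Survived ~ Pclass + Sex + Age + SibSp,
                   data = titanic, family = binomial)
summary(fit_reduced)

# Compare the two fits; a lower AIC indicates the preferable model
AIC(fit_full, fit_reduced)
```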


As we can see, the accuracy for this model is 82.41 per cent, which seems good.
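The accuracy can be computed from a confusion matrix on the training data. This is a sketch assuming a fitted model object `fit_reduced` such as the reduced model above; note that `glm()` drops rows with missing values (e.g. missing Age), so the actual labels are taken from the model frame rather than the raw data.

```r
# Predicted survival probabilities on the data used in the fit
probs <- predict(fit_reduced, type = "response")

# Classify at the conventional 0.5 threshold
pred <- ifelse(probs > 0.5, 1, 0)

# Actual labels for the rows that were kept after NA removal
actual <- fit_reduced$model$Survived

# Confusion matrix and overall accuracy
table(Predicted = pred, Actual = actual)
mean(pred == actual)
```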
GitHub Repository for Sample Data and Code
