# Stepwise Regression

Multiple regression models sometimes suffer from problems like **multicollinearity** and **increased complexity** of data collection and model maintenance, due to a large number of variables. In this article, we will learn how we can use stepwise regression to overcome these challenges.

Our objective is to build regression models that are as complete and realistic as possible. On one hand, we want every variable that is even remotely related to the dependent variable to be included; on the other hand, we want to include as few variables as possible.

Theory and experience give us some direction as to which variables should be included in the regression model. Moreover, manually filtering through and comparing regression models can be tedious.

Luckily, several approaches exist for automatically performing feature selection, that is, for identifying those variables that result in superior regression results. These traditional approaches to determining a subset of the predictor variables are collectively called **variable selection**.

The **three** main approaches to variable selection are as follows:

**Forward Selection**

The forward selection method begins with no candidate variables in the model. Then, at each step, we select the variable that most improves the fit (for example, the one with the most significant F-statistic or the largest reduction in the residual sum of squares), and we keep adding variables until no remaining variable yields a significant improvement.

Forward selection is mostly used when a large group of variables exists. For example, suppose there are more than fifty variables in a data set. A reasonable approach would be to obtain the best "n" variables by forward selection and then apply an all-possible-regressions algorithm to that subset. This procedure is also a good choice when multicollinearity is a problem.
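As a minimal sketch of forward selection, assuming the `train` data frame used in the implementation section below, base R's `step()` function can grow a model upward from the intercept (note that `step()` adds variables by AIC rather than by p-value):

```r
# forward selection: start from an intercept-only model and add one
# variable at a time until AIC no longer improves
null_model <- lm(medv ~ 1, data = train)  # no predictors
full_model <- lm(medv ~ ., data = train)  # defines the search scope
fwd_model  <- step(null_model,
                   scope = formula(full_model),
                   direction = "forward",
                   trace = FALSE)
```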

**Backward Selection**

The backward selection method, or backward elimination method, begins with all the variables that are believed to be potentially significant. Then, at each step, we attempt to eliminate the most insignificant variable. This process continues until no insignificant variables remain, i.e., until no further variable can be deleted without a statistically significant loss of fit.
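A corresponding sketch of backward elimination, under the same assumptions (the `train` data frame defined later, and AIC as the dropping criterion):

```r
# backward elimination: start from the full model and drop the least
# useful variable at each step until AIC stops improving
full_model <- lm(medv ~ ., data = train)
bwd_model  <- step(full_model, direction = "backward", trace = FALSE)
```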

**Stepwise Selection**

Stepwise regression is a combination of forward and backward selection. This method of variable selection starts with no predictor variables and then sequentially adds the new variables that lead to a reduction in the sum of squared errors. At the same time, predictors already in the model that have become insignificant as a result of the inclusion of later variables are removed at subsequent stages.

The process carries on until an equilibrium point is reached: no significant reduction in the sum of squared residuals can be gained by adding a variable to the regression, and a significant increase in the sum of squared residuals would arise if any variable were removed from it.

**Implementation Of Stepwise Regression In R**

There are various packages for performing stepwise regression. Here I have shown **two** methods. We begin by loading the necessary packages:

```r
library(Metrics)  # assumed: provides the mape() function used below
library(caret)
library(leaps)
library(MASS)
```
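The code below fits models on a `train` set and evaluates them on a `test` set, with **medv** (median house value) as the 14th column. A minimal sketch of such a setup, assuming the Boston housing data that ships with **MASS**:

```r
# assumed setup: an 80/20 train/test split of the Boston housing data
data(Boston)  # included in the MASS package; medv is column 14
set.seed(123)
idx   <- sample(nrow(Boston), size = floor(0.8 * nrow(Boston)))
train <- Boston[idx, ]
test  <- Boston[-idx, ]
```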

**FIRST METHOD**

The first method uses the **stepAIC()** function from the **MASS** package, which chooses the best model by AIC (the Akaike Information Criterion). We use the option **direction = "both"** for stepwise regression; we can also perform forward or backward selection by choosing **"forward"** or **"backward"** respectively. We first fit the model with all the predictor variables and save it in **model_lm**:

```r
model_lm <- lm(medv ~ ., data = train)
```

This is the full multiple linear regression model, and the summary given below shows that **model_lm** contains all the predictor variables in the data set.

Checking the summary of the model:

```r
summary(model_lm)
```

Fitting the stepwise regression model and saving it in **model_stepAIC**:

```r
model_stepAIC <- stepAIC(model_lm, direction = "both", trace = FALSE)
```

From the summary of the stepwise regression model, it is clear that only the highly significant variables are retained in the updated model.

```r
summary(model_stepAIC)
```

Now we predict the value of **medv**, the house price, and save the predictions in **Pred_model_stepAIC** (the 14th column of `test` is **medv** itself, so we drop it before predicting):

```r
Pred_model_stepAIC <- predict(model_stepAIC, test[, -14])
```

Finally, we calculate the mean absolute percentage error (MAPE) of these predictions and save it in **mape_stepAIC**.
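A minimal sketch of this step, assuming the `mape()` function from the **Metrics** package (the same call used for the second method below):

```r
# MAPE of the stepAIC model's predictions on the test set
mape_stepAIC <- mape(test$medv, Pred_model_stepAIC)
mape_stepAIC
```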

**SECOND METHOD**

In the second method, I have used the **caret** package, with the option **method = "leapSeq"**, to carry out the stepwise regression. The methods for fitting forward and backward selection instead are given below:

- **"leapForward"**, for forward selection
- **"leapBackward"**, for backward elimination

We use **10-fold cross-validation** to estimate the RMSE, MAE, and other error metrics. These statistical error estimates are used to compare the models and to automatically choose the best one.

We also specify the tuning parameter **nvmax**, which is the maximum number of predictors that can be present in the model. Here **nvmax** varies from 1 to 10, and the function searches for the best number of predictor variables to be finally incorporated into the model.

```r
set.seed(123)
train.control <- trainControl(method = "cv", number = 10)
```

Saving the model in **model_caret**:

```r
model_caret <- train(medv ~ ., data = train,
                     method = "leapSeq",
                     tuneGrid = data.frame(nvmax = 1:10),
                     trControl = train.control)
summary(model_caret)
```

```r
model_caret$results   # RMSE, MAE, etc. for each value of nvmax
plot(model_caret)     # plot RMSE against the number of predictors
model_caret$bestTune  # the nvmax value with the lowest RMSE
```

So the best number of predictor variables, the one that minimizes the RMSE, is **4**.
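To see which predictors that best model actually uses, one can, as a sketch, pull the coefficients of the 4-variable model from the underlying **leaps** `regsubsets` object that **leapSeq** produces:

```r
# coefficients of the best 4-predictor model found by leapSeq
coef(model_caret$finalModel, id = 4)
```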
Now, predicting the values of house price with the caret model and saving them in **Pred_model_caret**:

```r
Pred_model_caret <- predict(model_caret, test[, -14])
```

Calculating and saving the value of MAPE in **mape_caret**:

```r
mape_caret <- mape(test$medv, Pred_model_caret)
mape_caret
```
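For reference, the value returned by `Metrics::mape()` is the mean absolute percentage error expressed as a fraction; a hand-rolled equivalent under that assumption:

```r
# mean absolute percentage error: mean(|actual - predicted| / |actual|)
mape_manual <- mean(abs(test$medv - Pred_model_caret) / abs(test$medv))
```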

**ADVANTAGES AND DISADVANTAGES**

**Advantages of Stepwise Regression:**

- It is faster than most other automatic model-selection methods.
- It manages the predictor variables in the regression model, inserting them into or eliminating them from the model according to their significance.

**Disadvantages of Stepwise Regression:**

- Smaller datasets may result in higher sum squared residuals.
- The method adds or removes variables in a particular order, so we end up with one specific combination of predictors, and that combination may not be the one closest to reality.

To know more about the other methods, see the links given below: