The post Multivariate Adaptive Regression Splines appeared first on StepUp Analytics.

**What is a non-parametric regression?**

In most popular regression techniques, such as generalized linear models (GLM) and multiple linear regression (LM), the dependent variable is hypothesized to depend linearly on the predictor variables, and the values of the dependent variable are predicted from the values of the independent predictor variables. For example:

**House_Prices = Constant + 2.5*No_of_rooms + 2.0*No_of_floors +3.2*Total_Area**

Here the independent variables are the number of rooms, the number of floors and the total area of the house. The dependent variable is the house price. The numerical values multiplying the independent variables are the **regression coefficients**. The higher the value of a regression coefficient, the greater the influence of that independent variable on the dependent variable. If the variables are scaled properly, direct comparisons between these coefficients are meaningful and useful.
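To make the arithmetic concrete, here is the example model evaluated in Python for one hypothetical house. The intercept value (50) is made up for illustration; the article does not give one.

```python
# Hypothetical house: 3 rooms, 2 floors, total area 120 units.
constant = 50.0  # illustrative intercept; not from the article
price = constant + 2.5 * 3 + 2.0 * 2 + 3.2 * 120
print(price)  # 445.5
```

Each coefficient simply scales its predictor's contribution to the predicted price.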

The example given above is a parametric regression, which assumes a relationship between the dependent and independent variables prior to the regression modeling. **Nonparametric regression**, by contrast, does not make such assumptions about the relationship between the variables. Instead, it constructs the relationship from coefficients and so-called **basis functions** that are determined from the data itself.

The MARSplines algorithm performs such a regression while also searching for **nonlinearities** in the data, which helps maximize the predictive accuracy of the model. The MARSplines technique is therefore a step forward for problems where the relationship between the predictors and the dependent variable is difficult to establish.

The MARSplines model equation is:

**y = B0 + Σ (m = 1 to M) Bm·hm(X)**

In this equation the output vector y is predicted as a function of the predictor variables X. The constant term is B0, Bm is the coefficient associated with the m-th term, and hm(X) is a function from a set of one or more basis functions.
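The equation can be read as: start from the constant and add the weighted basis-function values. A minimal Python sketch with a hypothetical model (the coefficients and the single knot at 10 are made up for illustration):

```python
def mars_predict(x, b0, terms):
    """Evaluate a MARS-style model: b0 + sum of coef * hinge basis values.

    terms: list of (coef, sign, knot); each basis is max(0, sign * (x - knot)).
    """
    return b0 + sum(c * max(0.0, s * (x - t)) for c, s, t in terms)

# Hypothetical model: two mirrored hinges sharing the knot t = 10.
model = [(0.5, +1, 10.0), (1.5, -1, 10.0)]
print(mars_predict(14.0, 3.0, model))  # 5.0  (3 + 0.5 * 4)
print(mars_predict(8.0, 3.0, model))   # 6.0  (3 + 1.5 * 2)
```

Only one member of a mirrored pair is nonzero on each side of the knot, which is what lets the fitted function bend there.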

**How does the MARSplines algorithm work?**

The MARSplines method partitions the input data into regions, each with its own regression equation. The partitioning is done with **hinge** (or **rectifier**) **functions**, which take the form:

**max(0, x − t) and max(0, t − x)**

where t is a constant called a **knot**. The knot is what allows the model to capture a nonlinear relationship.

A single hinge function introduces one kink and so attains only a simple nonlinearity; in practice, the model combines a large number of hinge functions to approximate complex nonlinear relationships.
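The hinge pair itself is easy to sketch. In Python (illustrative; knot value chosen arbitrarily):

```python
def hinge_pair(x, t):
    """Mirrored hinge (rectifier) basis functions sharing the knot t."""
    return max(0.0, x - t), max(0.0, t - x)

# Around the knot t = 5, exactly one member of the pair is active on each side.
print(hinge_pair(7.0, 5.0))  # (2.0, 0.0)
print(hinge_pair(3.0, 5.0))  # (0.0, 2.0)
print(hinge_pair(5.0, 5.0))  # (0.0, 0.0)
```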

The model is built in two passes:

1. FORWARD PASS

2. BACKWARD PASS

**Forward Pass**

In this step we first build the model with just the intercept term. The algorithm then adds basis functions in pairs, one pair per step, until the prediction error stops improving. The two basis functions in a pair are mirror images of each other (they share the same knot). Each new basis function added to the model consists of a term already in the model multiplied by a new hinge function. Because basis functions are added one pair after another, always taking the locally best choice, the forward pass is a **greedy algorithm**.
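The greedy search can be sketched numerically. This Python sketch is simplified on purpose: it handles a single predictor, candidate knots are just the observed x values, and new hinge pairs are added directly rather than multiplied by existing terms as full MARS does.

```python
import numpy as np

def forward_pass(x, y, n_pairs=2):
    """Simplified greedy forward pass: at each step, add the mirrored
    hinge pair whose knot gives the largest drop in residual SSE."""
    basis = [np.ones_like(x)]                # start with just the intercept
    for _ in range(n_pairs):
        best_sse, best_t = None, None
        for t in x:                          # candidate knots: observed x values
            cols = basis + [np.maximum(0, x - t), np.maximum(0, t - x)]
            B = np.column_stack(cols)
            coef, *_ = np.linalg.lstsq(B, y, rcond=None)
            sse = float(np.sum((y - B @ coef) ** 2))
            if best_sse is None or sse < best_sse:
                best_sse, best_t = sse, t
        basis += [np.maximum(0, x - best_t), np.maximum(0, best_t - x)]
    B = np.column_stack(basis)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef, B

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.where(x < 5, x, 5.0) + rng.normal(0, 0.1, size=50)  # kink at x = 5
coef, B = forward_pass(x, y)
sse = float(np.sum((y - B @ coef) ** 2))
baseline = float(np.sum((y - y.mean()) ** 2))
print(sse < baseline)  # the hinge model fits far better than a flat mean
```

On this toy data the search places a knot near the true kink at x = 5, which is exactly the behavior the forward pass is designed to produce.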

**Backward Pass**

The backward pass removes the least significant basis functions from the model, one at a time, until the best submodel is found.

Nonparametric models exhibit a high degree of flexibility that may ultimately result in overfitting. This flexibility leads the model to lose accuracy when it is presented with a new dataset. To combat this, the backward pass, also known as the "**pruning pass**", limits the complexity of the model by reducing the number of its basis functions.
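The pruning idea can be sketched in Python. This is illustrative only: the GCV score below uses a simplified effective-parameter count (`penalty`, the column layout, and the toy data are all assumptions, not the earth package's exact formula).

```python
import numpy as np

def gcv(y, B, coef, penalty=2.0):
    """Simplified GCV score: penalized average squared residual."""
    n, m = B.shape
    sse = float(np.sum((y - B @ coef) ** 2))
    c = m + penalty * (m - 1)            # effective number of parameters
    return sse / n / (1 - c / n) ** 2

def backward_pass(y, B):
    """Greedily drop one basis column at a time (never the intercept,
    column 0), keeping the column subset with the best GCV seen."""
    def score(cols):
        M = B[:, cols]
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        return gcv(y, M, coef)
    cols = list(range(B.shape[1]))
    best_cols, best = cols[:], score(cols)
    while len(cols) > 1:
        s, drop = min((score([c for c in cols if c != d]), d) for d in cols[1:])
        cols.remove(drop)
        if s < best:
            best_cols, best = cols[:], s
    return best_cols

rng = np.random.default_rng(1)
n = 40
B = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = 2 * B[:, 1] + rng.normal(0, 0.1, size=n)   # column 2 is pure noise
print(backward_pass(y, B))  # the noise column should be pruned away
```

Dropping a useless term barely raises the residual error but lowers the complexity penalty, so the GCV score improves, which is exactly why pruning combats overfitting.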

**Implementation of MARSplines in R**

The MARSplines algorithm is available in the R package **earth**, which we install with:

**install.packages("earth")**

We then load the package to use the function **earth()**:

**library(earth)**

I have used a dataset known as **Boston**, which is present in the **MASS** package, to show a comparison between **MARSplines** and other regression and penalized regression techniques.

**library(MASS)**

**data <- Boston**

Splitting the dataset in two parts

**train <- data[1:400,]**

**test <- data[401:506,]**

We fit the model and save the model in** Fit**

**Fit <- earth(medv ~ ., data = train)**

We call the summary of the model to see the values of the parameters

**summary (Fit)**

To see the importance of the input variables, we use the function **evimp()**:

**evimp(Fit)**

You can read more details about the **GCV** (Generalized Cross-Validation) criterion that earth reports.

We predict the house prices for the test dataset and save them in **Predictions**

**Predictions <- predict(Fit, test)**

Lastly, to see the accuracy of the model on the test dataset, we use the function **rmse()**:

**library(Metrics)**

**rmse(test$medv, Predictions)**

**CONCLUSION**

From the rmse value it can be concluded that the MARSplines algorithm works very well on regression problems. We can also say that the model has **worked better than multiple linear regression and the penalized regression techniques** **like ridge, lasso and elastic net**.

[Note: Compare the accuracy of MARS with the accuracies of the penalized techniques from my previous article: https://stepupanalytics.com/lasso-and-elastic-net-regression/]

You can also use the MARSplines algorithm with **mars()** from the **mda** package: https://cran.r-project.org/web/packages/mda/mda.pdf

**NOTE:**

Two important features of the MARSplines algorithm:

- It can be **applied to multiple dependent variables**. The algorithm determines a common set of basis functions in the predictors but estimates different coefficients for each dependent variable.
- Because MARSplines can handle multiple dependent variables, it is easy to apply the algorithm to classification problems as well.


The post Lasso And Elastic Net Regression appeared first on StepUp Analytics.

Ridge regression, for example, shrinks the coefficients towards zero but never sets any of them exactly to zero, so it does not perform feature selection. In this article, I introduce two methods, lasso and elastic net regression, which deal with these issues very well and perform both variable selection and regularization.

**Lasso** (or **least absolute shrinkage and selection operator**) is a regression analysis method that uses L1 regularization, penalizing the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models. Lasso differs from ridge regression in that it uses absolute values in the penalty function instead of squares. This form of penalty causes some of the parameter estimates to turn out exactly zero. Hence, much like the best subset selection method, lasso performs variable selection among the given n variables.

The tuning parameter lambda is chosen by cross-validation. When lambda is small, the result is essentially the least squares (OLS) estimates. As lambda increases, shrinkage occurs and the coefficients of the less important features shrink to zero, removing some features altogether.

So, a major advantage of lasso is that it is a combination of both shrinkage and selection of variables. In cases of a very large number of features, lasso allows us to efficiently find the sparse model that involves a small subset of the features.

The cost function is the residual sum of squares plus the L1 penalty:

**Cost = ∑(yᵢ − ŷᵢ)² + λ∑|βⱼ|**

where the second term is the L1 regularization.

The method was proposed by Professor Robert Tibshirani, then at the University of Toronto, Canada. He said, "The Lasso minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint, it tends to produce some coefficients that are exactly 0 and hence gives interpretable models".

In his article titled *Regression Shrinkage and Selection via the Lasso*, Tibshirani discusses this technique with respect to various other statistical models such as subset selection and ridge regression. He goes on to say that "lasso can even be extended to generalized regression models and tree-based models. In fact, this technique provides possibilities for even conducting statistical estimations."

Traditional methods like cross-validation and stepwise regression for handling overfitting and performing feature selection work well with a small set of features, but penalized regression techniques are a great alternative when we are dealing with a large set of features.

Lasso was originally formulated for least squares models and this simple case reveals a substantial amount about the behavior of the estimator, including its relationship to ridge regression and best subset selection and the connections between lasso coefficient estimates and so-called soft thresholding. It also reveals that (like standard linear regression) the coefficient estimates need not be unique if covariates are collinear. [Source: Wikipedia]
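The "soft thresholding" connection mentioned above can be made concrete: for an orthonormal design, each lasso coefficient is the OLS estimate passed through the soft-thresholding operator. A minimal Python sketch (illustrative, not part of the original article):

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; set it exactly to zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Large OLS estimates are shrunk; small ones are zeroed out (feature selection).
print(soft_threshold(3.0, 1.0))   # 2.0
print(soft_threshold(-0.5, 1.0))  # 0.0
print(soft_threshold(-3.0, 1.0))  # -2.0
```

The hard zero inside the [-lam, lam] band is exactly why lasso produces sparse models while ridge, whose shrinkage is multiplicative, does not.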

Though originally defined for least squares, lasso regularization is easily extended to a wide variety of statistical models including generalized linear models, generalized estimating equations, proportional hazards models, and M-estimators, in a straightforward fashion. Lasso’s ability to perform subset selection relies on the form of the constraint and has a variety of interpretations including in terms of geometry, Bayesian statistics, and convex analysis. [Source: Wikipedia]

As discussed above, lasso can set coefficients to zero, while ridge regression, which appears superficially similar, cannot. This is due to the difference in the shape of the constraint boundaries in the two cases.

From the figure, one can see that the constraint region of lasso regression is a rotated square whose corners lie on the axes, while the constraint region of ridge regression is a sphere, which is rotationally invariant and therefore has no corners. The elliptical contours of the least-squares objective are likely to touch the lasso constraint region at a corner, where some components of the coefficient vector are identically zero. For the sphere, no point on the boundary is distinguished from the others, so the contours are unlikely to contact it at a point where some components are exactly zero.

In the case of ML, both ridge regression and Lasso find their respective advantages. Both these techniques tackle overfitting, which is generally present in a realistic statistical model. It all depends on the computing power and data available to perform these techniques on statistical software. Ridge regression is faster compared to lasso but then again lasso has the advantage of completely reducing unnecessary parameters in the model.

One important limitation of lasso regression is that, for grouped variables, the lasso fails to do grouped selection. It tends to select one variable from a group and ignore the others.

Elastic-net is a mix of **both L1 and L2 regularizations**. A penalty is applied to the sum of the absolute values and to the sum of the squared values:

**Penalty = λ[α·∑|βⱼ| + (1 − α)/2·∑βⱼ²]**

Lambda is a shared penalization parameter, while alpha sets the ratio between L1 and L2 regularization in the Elastic Net. Hence, we expect a hybrid behavior between L1 and L2 regularization. Though coefficients are still cut to zero, the cut is less abrupt than with lasso penalization alone. The alpha hyper-parameter lies between 0 and 1 and controls how much L2 or L1 penalization is used. The usual approach to optimizing the lambda hyper-parameter is through cross-validation, by minimizing the cross-validated mean squared prediction error, but in elastic net regression the optimal lambda also depends on the alpha hyper-parameter.
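How alpha blends the two penalties can be sketched numerically. The scaling below follows glmnet's parameterization, lambda·[alpha·L1 + (1 − alpha)/2·L2]; other texts scale the terms slightly differently, so treat the exact constants as an assumption:

```python
def elastic_net_penalty(beta, lam, alpha):
    """Blend of L1 and L2 penalties; alpha = 1 is lasso, alpha = 0 is ridge."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2)

beta = [1.0, -2.0, 0.0]
print(elastic_net_penalty(beta, lam=0.5, alpha=1.0))  # 1.5  (pure L1)
print(elastic_net_penalty(beta, lam=0.5, alpha=0.0))  # 1.25 (pure L2)
```

Intermediate alpha values interpolate smoothly between the two extremes, which is the hybrid behavior described above.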

This article takes a cross-validated approach that uses the grid search to find the optimal alpha hyper-parameter while also optimizing the lambda hyper-parameter for the data set.

In my previous article, I used the **glmnet **package to show the ridge regression in R. In this article, I have used the **caret** package for better comparison between the techniques.

Loading the **MASS** package to get the data set

**library(MASS)**

**data <- Boston**

Splitting the dataset in training and testing data

**train <- data[1:400,]**

**test <- data[401:506,]**

Setting up a grid range of lambda values

**lambda <- 10^seq(-3, 3, length = 100)**

Loading the required libraries

**library(tidyverse)**

**library(caret)**

**library(glmnet)**

**library(Metrics)**

We fit the **ridge regression model** on the training data using k fold cross validation

**set.seed(123)**

**ridge <- train(medv ~ ., data = train, method = "glmnet", trControl = trainControl(method = "cv", number = 10), tuneGrid = expand.grid(alpha = 0, lambda = lambda))**

**plot(ridge$finalModel, xlab = "L2 Norm")**

Displaying the regression coefficients below

**coef(ridge$finalModel, ridge$bestTune$lambda)**

We save the predicted values of the response variable in a vector **prediction_ridge**

**prediction_ridge <- predict(ridge, test)**

Saving the RMSE, SSE and MAPE values in **Accuracy_ridge**

**Accuracy_ridge <- data.frame(RMSE = rmse(test$medv, prediction_ridge), SSE = sum((test$medv - prediction_ridge)^2), MAPE = mape(test$medv, prediction_ridge))**

The only difference between the R code used for ridge and lasso regression is that for lasso regression, we need to specify the argument **alpha = 1** instead of **alpha = 0** (for ridge regression).

Now executing the **Lasso Regression**

**set.seed(123)**

**lasso <- train(medv ~ ., data = train, method = "glmnet", trControl = trainControl(method = "cv", number = 10), tuneGrid = expand.grid(alpha = 1, lambda = lambda))**

**plot(lasso$finalModel, xlab = "L1 Norm")**

If we look at the plot, the x-axis is the *maximum permissible value the L1 norm can take*. So when we have a small L1 norm, we have a lot of regularization. Therefore, an L1 norm of zero gives an empty model, and as you increase the L1 norm, variables will “enter” the model as their coefficients take non-zero values.

Displaying the regression coefficients below

**coef(lasso$finalModel, lasso$bestTune$lambda)**

We save the predicted values of the response variable in a vector **prediction_lasso**

**prediction_lasso <- predict(lasso, test)**

Saving the RMSE, SSE and MAPE values in **Accuracy_lasso**

**Accuracy_lasso <- data.frame(RMSE = rmse(test$medv, prediction_lasso), SSE = sum((test$medv - prediction_lasso)^2), MAPE = mape(test$medv, prediction_lasso))**

The **elastic net regression model** does not require us to specify particular values of lambda and alpha. We use the **caret** package to automatically select the best tuning parameters: **caret** tests a range of possible alpha and lambda values, then selects the best combination, resulting in a final model that is an elastic net model.

Now executing the **Elastic Net Regression**

**set.seed(123)**

**elasticnet <- train(medv ~ ., data = train, method = "glmnet", trControl = trainControl(method = "cv", number = 10), tuneLength = 10)**

Displaying the regression coefficients below

**coef(elasticnet$finalModel, elasticnet$bestTune$lambda)**

We save the predicted values of the response variable in a vector **prediction_elasticnet**

**prediction_elasticnet <- predict(elasticnet, test)**

Saving the RMSE, SSE and MAPE values in **Accuracy_elasticnet**

**Accuracy_elasticnet <- data.frame(RMSE = rmse(test$medv, prediction_elasticnet), SSE = sum((test$medv - prediction_elasticnet)^2), MAPE = mape(test$medv, prediction_elasticnet))**

We finally bring the RMSE, SSE and MAPE values of the three regression techniques together in a dataframe **Accuracy**:

**Accuracy <- rbind(Ridge = Accuracy_ridge, Lasso = Accuracy_lasso, ElasticNet = Accuracy_elasticnet)**

Here both lasso and elastic net regression do a great job of feature selection in addition to shrinkage. On the other hand, lasso achieves poorer accuracy in this example because there is a high **degree of collinearity in the features**. Further, the lasso solution is underdetermined when the number of predictors exceeds the number of observations, while ridge regression can handle this.

From our example we see that the penalized regression models performed much better than the multiple linear regression model. It can also be said that lasso regression performs better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performs well in all of these scenarios.


The post A Refresher on Regression Analysis appeared first on StepUp Analytics.

However, if we are given the optimum combinations of these predictor variables, we can build a model for the crop yield and use it to predict the required crop yield.

While examining a patient, the dosage is set keeping in mind his other illnesses and previous medical records such as blood sugar level, cholesterol, eyesight, etc. Here dosage can be considered as some dependent variable and his other illnesses, medical records are considered as independent variables.

Such a relationship, when a dependent variable needs to be measured considering all other independent variables is expressed through terms like correlation and regression.

In simple terms, regression helps us to predict or analyze relationships between two or more variables. The factor being predicted is known as a dependent variable and the factors that are used to predict the values of the dependent variable are called independent variables.

Regression analysis is used to do the same. For example, you might guess that there is a connection between how much you eat and how much you weigh, regression analysis can help you quantify that. Regression analysis will give us an equation for a graph so that we can make predictions about our data.

In statistics, some random numbers lying in a table make little sense to us. To make sense out of it, we can use regression and obtain some inferences about the future performance of the given random variable.

Suppose you’re a sales manager trying to predict next month’s numbers. You know that dozens, perhaps even hundreds, of factors, from the weather to a competitor’s promotion to the rumor of a new and improved model, can impact the number.

Perhaps people in your organization even have a theory about what will have the biggest effect on sales. “Trust me. The more rain we have, the more we sell.” “Six weeks after the competitor’s promotion, sales jump.”

Regression analysis is a way of mathematically sorting out which of those variables does indeed have an impact. It answers the questions: Which factors matter most? Which can we ignore? How do those factors interact with each other? And, perhaps most importantly, how certain are we about all of these factors?

The best way to understand linear regression is to relive the experience of childhood. If you ask a fifth-grade child to arrange people in his class in increasing order of weight, without asking them their weights, the child would likely look at (visually analyze) the height and build of the classmates and arrange them using a combination of these visible parameters.

The child has actually figured out that height and build would be correlated to the weight by a linear relationship. This is the linear regression in real life!

In simple terms, simple linear regression is predicting the value of a variable Y (the dependent variable) based on some variable X (the independent variable), provided there is a linear relationship between the variables X and Y.

If there is more than one independent variable, then we can predict the value of Y using Multiple Linear Regression. For example, when we predict rent based on square feet alone, we can use simple linear regression, but when we predict the rent based on square feet and the age of the building, we will use multiple linear regression.

The linear relationship between the two variables can be represented by a straight line, called the **regression line**.

Now to determine if there is a linear relationship between two variables, we can simply plot the scatter plot (plotting of the coordinates (x,y) on a graph) of variable Y with variable X. If the plotted points are randomly scattered then it can be inferred that the variables are not related.

**There is a linear relationship between the variables.**

If there are points lying in a straight line, then there exists a linear relationship between the variables.

After drawing a straight line through the points plotted, we will find that not all the points lie on the line. This happens because the line that we have drawn may not be the best fit and the points plotted are probabilistic, i.e., our observations are approximate.

But, when there exists a linear relationship between X and Y, then we can plot more than one line through these points. How do we know which one is the best fit?

To help us choose the line of best fit, we use the method of least squares.

**Least Squares**

**Y = 𝑏𝑜 + 𝑏1X + e**

This is the mathematical relationship between the variables X and Y where,

X is the independent variable

Y is the dependent variable

𝑏𝑜 is the intercept of the regression line

𝑏1 is the slope of the regression line

e is the error, or deviation of the observed value of Y from the fitted value

Here, e is the difference between the ith observed value and the ith calculated value. This error could be positive or negative. We have to minimize the error sum of squares to get the line of best fit. On minimizing the error sum of squares, we obtain the values of 𝑏𝑜 and 𝑏1 using the two normal equations

**∑𝑦𝑖 = n𝑏𝑜 + 𝑏1∑𝑥𝑖**

and

**∑𝑥𝑖𝑦𝑖 = 𝑏𝑜∑𝑥𝑖 + 𝑏1∑𝑥𝑖²**

Then, we find the values of 𝑦𝑖 for the given values of 𝑥𝑖 and plot the line of best fit.
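Solving the two normal equations above gives the familiar closed forms for the slope and intercept. A small Python sketch, with perfectly linear toy data chosen so the answer is obvious:

```python
def least_squares(xs, ys):
    """Solve the normal equations for intercept b0 and slope b1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope
    b0 = my - b1 * mx              # intercept
    return b0, b1

# Data on the exact line y = 2x, so we expect b0 = 0 and b1 = 2.
b0, b1 = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
print(b0, b1)  # 0.0 2.0
```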

**R Code for Simple Linear Regression**

We use the *lm()* function to create a relationship model (between the predictor and the response variable). The basic syntax for *lm()* function is:

lm(formula,data)

formula – the symbol for presenting the relationship between x and y

data – the data on which the formula will be applied

**For example:**

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)

So, the final code becomes:

#Load Train and Test datasets
#Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict Output
predicted <- predict(linear, x_test)

**For example:**

# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)

Multiple regression analysis is almost the same as simple linear regression. The only difference between the two is the number of independent (or predictor) variables used in the regression.

- Simple linear regression analysis uses a single X variable for each dependent Y variable. For example (𝑥𝑖, 𝑦𝑖).
- Multiple regression uses multiple X variables for each dependent variable Y. For example (𝑥1, 𝑥2, 𝑥3, 𝑦𝑖).

For example, if we want to find out if weight, height, and age of the people explain the variance in their cholesterol levels, then multiple regression will come to our rescue. We may take weight, height, and age as independent variables 𝑥1, 𝑥2 𝑎𝑛𝑑 𝑥3 and cholesterol as our dependent variable.

**Assumptions:**

- Regression residuals (or error term) must be normally distributed.
- A linear relationship is assumed between the dependent and the independent variable.
- The error terms are homoscedastic and approximately rectangular shaped.
- The independent variables are not too highly correlated with each other.

There are three major uses of multiple regression analysis. First, it can be used to forecast effects or impacts of changes in the future, i.e., it helps us to understand how much will the dependent variable change when we change the independent variables. For example, we can use multiple regression to find how much GPA is expected to increase (or decrease) for every one point increase (or decrease) in IQ.

Also, it can be used to identify the strength of the effect that the independent variable has on a dependent variable.

Lastly, multiple linear regression analysis predicts future values. It can be used to get point estimates. For example, to know what factors affect the crop yield the most, multiple regression analysis can be used.

First, we plot scatter plots of every independent variable against the dependent variable. These scatter plots help us understand the direction and correlation among the variables.

In the first plot, we see a positive correlation between the dependent and the independent variable.

Whereas, in the second plot, we see an arch-like curve. This indicates that a regression line might not be the best way to explain the data, even if the correlation between them is positive.

The second step of multiple linear regression is to formulate the model, i.e. to assume that the variables X1, X2 and X3 have a causal influence on variable Y and that their relationship is linear.

The last step is to fit the regression line.

The multiple linear regression equation is given as:

**Y = 𝑏𝑜 + 𝑏1𝑥1 + 𝑏2𝑥2 + … + 𝑏𝑚𝑥𝑚 + e**

Proceeding in the same way as above, we find the constants 𝑏𝑜, 𝑏1, …, 𝑏𝑚 and then obtain the values of 𝑦𝑖 for the given values of 𝑥𝑖.

Then, we plot the corresponding coordinates and draw the lines of best fit for each combination of independent and dependent variables.

Here, 𝑏0 is the intercept and 𝑏1, …, 𝑏𝑚 are regression coefficients. They can be interpreted the same way as a slope. Thus, if 𝑏𝑖 = 2.5, it would indicate that Y will increase by 2.5 units if Xi increases by 1 unit.

The larger 𝑏𝑖 is (on properly scaled variables), the more strongly Y is related to Xi; otherwise, it is less strongly related.

**R Code for Multiple Linear Regression**

The code for multiple regression is similar to that of simple linear regression. We consider the following example to understand the code

Consider the data set “mtcars” available in the R environment. It gives a comparison between different car models in terms of mileage per gallon (mpg), cylinder displacement(“disp”), horsepower(“hp”), weight of the car(“wt”) and some more parameters.

The goal of the model is to establish the relationship between “mpg” as a response variable with “disp”,”hp” and “wt” as predictor variables. We create a subset of these variables from the mtcars dataset for this purpose.

input <- mtcars[,c("mpg","disp","hp","wt")]
# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)
# Show the model.
print(model)
# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)

Using the coefficient values, we create the mathematical equation

Y = a + Xdisp·x1 + Xhp·x2 + Xwt·x3

We can use the regression equation created above to predict the mileage when a new set of values for displacement, horsepower and weight is provided.

For a car with disp = 221, hp = 102 and wt = 2.91, the predicted mileage is:

Y = 37.15+(-0.000937)*221+(-0.0311)*102+(-3.8008)*2.91 = 22.7104
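The same arithmetic, checked in Python with the coefficients quoted above:

```python
# Coefficients as printed by the article's lm() fit (rounded values).
a, Xdisp, Xhp, Xwt = 37.15, -0.000937, -0.0311, -3.8008

# New car: disp = 221, hp = 102, wt = 2.91.
mpg = a + Xdisp * 221 + Xhp * 102 + Xwt * 2.91
print(round(mpg, 4))  # 22.7104
```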


The post Beginner to Advance level – Steps to Make Regression Model appeared first on StepUp Analytics.

In this article, we will learn the steps to make a regression model. In the previous article of this series, we learned how to calculate the values of the coefficients and how to test the slope coefficients with a hypothesis test.

Let us continue where we left off.

Here in this article, we will learn about:

- ANOVA
- Coefficient of Determination

Let’s start with ANOVA:

A basic idea of ANOVA, that of partitioning variation, is fundamental to experimental statistics. ANOVA belies its name in that it is not concerned with analyzing variances as such but rather with analyzing the variation in means.

There are two types of ANOVA:

- One way ANOVA
- Two way ANOVA

I have explained one-way and two-way ANOVA separately.

Now let’s discuss **Coefficient Of Determination**

The coefficient of determination, denoted R² (and pronounced "R-squared"), is the ratio of the regression sum of squares to the total sum of squares:

R² = SS(reg)/SS(t)
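The partitioning-of-variation idea and this ratio can be verified numerically. A small Python sketch (toy data; for a least-squares fit with an intercept, SS(t) = SS(reg) + SS(res) holds exactly):

```python
# Fit a least-squares line and partition the total variation.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
     sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx
fits = [b0 + b1 * x for x in xs]

ss_t = sum((y - my) ** 2 for y in ys)            # total sum of squares
ss_reg = sum((f - my) ** 2 for f in fits)        # regression sum of squares
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fits))  # residual sum of squares

print(round(ss_t, 6) == round(ss_reg + ss_res, 6))  # True
print(round(ss_reg / ss_t, 4))                      # R^2, close to 1 here
```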

- *R²* is a statistic that gives some information about the goodness of fit of a model.
- *R²*, the coefficient of determination, measures how strong the relationship between the dependent and independent variables is.
- *R²* lies in [0, 1].
- An *R²* of 1 indicates that the model explains 100% of the variability between the variables.
- An *R²* of 0.8 means the model explains 80% of the variability between the variables.
- An *R²* of 0 indicates that there is no linear relationship between the variables.
- *R²* does not tell you that the independent variable is the cause of changes in the dependent variable.
- *R²* does not tell you whether the correct regression model was used.

R² never decreases when an extra regressor variable is added, so we cannot depend on R² alone to compare models.

Is there another way to measure the goodness of fit of a model? Yes: the Adjusted R².

The above properties for R² and Adjusted R² will remain the same.

- The adjusted *R*^{2} can be negative, and its value will always be less than or equal to that of *R*^{2}.
- The adjusted *R*^{2} increases only when the increase in *R*^{2} (due to the addition of a new regressor variable) is larger than would be expected by chance.

The adjusted *R*^{2} is defined as

Adjusted *R*^{2} = 1 − (1 − *R*^{2}) · (*n* − 1)/(*n* − *p* − 1)

where

- *p* is the total number of regressor variables in the model (not including the constant term)
- *n* is the sample size.

Adjusted *R*^{2} can also be written as

Adjusted *R*^{2} = 1 − (SS(res)/df_{e}) / (SS(t)/df_{t})

where

- df_{t} = *n* − 1 is the total degrees of freedom of the estimate of the population variance of the dependent variable.
- df_{e} = *n* − *p* − 1 is the degrees of freedom of the regression model, i.e. of the estimate of the underlying population error variance.

Next is Model Adequacy checking, Multicollinearity and selecting significant explanatory variables.

We will discuss these remaining topics in the next article of this series. Till then, if you have any doubt or suggestion please feel free to shoot me an email on khanirfan.khan21@gmail.com or mention in the comment.



The post Beginner to Advance level: Steps to Make Regression Model appeared first on StepUp Analytics.

You must have heard about regression models many times, but you might not have heard about the techniques for solving or building a regression model step by step.

First, we will talk about **Simple Linear Regression**: a model in which a single regressor (x) has a linear relationship with a response variable (y).

We all (who have an idea about regression) know the linear regression equation:

**Y = β(0) + β(1)*x + ε**

For a given x, the corresponding observation Y consists of the value β(0) + β(1)*x plus the error ε.

We now make some assumptions about the model.

**Assumptions:**

- **ε(i)** is a random variable with **mean** zero and **variance σ^2** [σ^2 is unknown], i.e. **E(ε(i)) = 0 and V(ε(i)) = σ^2** {E denotes expectation}
- **ε(i)** and **ε(j)** are uncorrelated for **i ≠ j**, so **cov(ε(i), ε(j)) = 0**. Here uncorrelated means independent of each other.
- **ε(i)** is a normally distributed random variable with mean zero and variance **σ^2**, i.e. ε(i) ~ N(0, σ^2).

**Assumptions in terms of Y:** the same assumptions can be restated for Y directly; simply replace ε(i) with Y. The mean becomes **E(Y) = β(0) + β(1)*x**, and the variance stays the same, **σ^2**.

Next, we discuss the **least squares estimation of the parameters**.

**Least Squares Estimation (LSE):**

- The parameters **β(0), β(1)** are unknown and must be estimated from the sample data: (x1,y1), (x2,y2), …, (xn,yn)
- The line fitted by **LSE** is the one that makes the sum of squares of all **vertical discrepancies** as **small as possible**.
- We estimate **β(0), β(1)** so that the sum of the squared differences between the observations Y(i) and the fitted line is minimized; this minimum is **SS(Res)**.
- The least squares estimators of **β(0) and β(1)**, denoted **β_0ˆ, β_1ˆ**, must satisfy the following two equations. Equations 1 and 2 are called the normal equations, and they have a unique solution.

So, the estimators **β_0ˆ, β_1ˆ** are the solution of the equations

**∑(Y(i) − β_0ˆ − β_1ˆ*x(i)) = 0** —–(1)

**∑(Y(i) − β_0ˆ − β_1ˆ*x(i))*x(i) = 0** —-(2)

Solving these two normal equations gives the closed-form solution:

**β_1ˆ = S(xy)/S(xx) = [∑(x(i) − mean(x))*(y(i) − mean(y))] / ∑(x(i) − mean(x))^2**

**β_0ˆ = mean(y) − β_1ˆ*mean(x)**

**Above we have calculated the parameters using the least squares estimator.**
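The least squares estimators can be computed directly from the data. Here is a minimal Python sketch (made-up example data, for illustration only):

```python
# Closed-form least squares fit of y = b0 + b1*x.
# Illustrative sketch with made-up data; no external libraries required.

def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S(xy) = sum((x_i - x_bar) * (y_i - y_bar)); S(xx) = sum((x_i - x_bar)^2)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx             # slope estimate: beta_1_hat = S(xy) / S(xx)
    b0 = y_bar - b1 * x_bar      # intercept estimate: beta_0_hat = mean(y) - b1*mean(x)
    return b0, b1

# Made-up example data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
```

For these numbers, S(xx) = 10 and S(xy) = 19.9, so the fitted slope is 1.99 and the intercept is 0.05.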

We have not discussed the benefits of using **LSE**. In one line, the most important benefit is that under the assumptions above the least squares estimators are the **best linear unbiased estimators (BLUE)** of the parameters, i.e. they have the smallest variance among all linear unbiased estimators.

**Properties of the Least Squares Estimators:-**

- The sum of residuals in any regression model that contains an intercept β_0 is always 0:

**∑e(i) = ∑(y(i)−y_iˆ) = 0**

- **∑y(i) = ∑(y_iˆ)**, meaning the observed and fitted values have the same total.
- **∑x(i)*e(i) = 0** (the residuals are orthogonal to the regressor)
- **∑y_iˆ*e(i) = 0** (the residuals are orthogonal to the fitted values)
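These properties are easy to check numerically. A minimal Python sketch (made-up data; the fit uses the closed-form estimators β_1ˆ = S(xy)/S(xx), β_0ˆ = mean(y) − β_1ˆ*mean(x)):

```python
# Numerically check the residual properties of a least squares fit:
#   sum(e_i) = 0, sum(x_i * e_i) = 0, sum(yhat_i * e_i) = 0.
# Illustrative sketch with made-up data.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx                 # beta_1_hat
b0 = y_bar - b1 * x_bar          # beta_0_hat

y_hat = [b0 + b1 * xi for xi in x]           # fitted values
e = [yi - yh for yi, yh in zip(y, y_hat)]    # residuals

sum_e = sum(e)                                          # ~0 up to rounding
sum_xe = sum(xi * ei for xi, ei in zip(x, e))           # ~0 up to rounding
sum_yhat_e = sum(yh * ei for yh, ei in zip(y_hat, e))   # ~0 up to rounding
```

All three sums come out as zero up to floating-point rounding, exactly as the properties predict.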

**Statistical properties of the LS estimators:-**

- Both β_0ˆ and β_1ˆ are unbiased estimators of β_0 and β_1 respectively, which means **E(β_0ˆ) = β_0** and **E(β_1ˆ) = β_1**.
- β_0ˆ and β_1ˆ are linear combinations of the observations y(i):

**β_1ˆ = [∑(x(i)-mean(x))*(y(i)-mean(y))]/∑(x(i)-mean(x))^2**

**= [∑(x(i)-mean(x))*y(i)]/∑(x(i)-mean(x))^2**

**β_1ˆ = ∑c(i)*y(i), where c(i) = (x(i)-mean(x))/S(xx)**

Similarly we can write **β_0ˆ** as a linear combination:

**β_0ˆ = mean(y) − β_1ˆ*mean(x)**

**= (1/n)∑y(i) − β_1ˆ*mean(x)**

taking the value of β_1ˆ from the equation above.

**Note:- I am not going to prove these results here; if you need the proofs, please message me at irrfankhann29@gmail.com and I will personally send my documents.**

Similarly we can calculate the **variance** of **β_0ˆ and β_1ˆ**:

**V(β_1ˆ) = σ^2/S(xx) : where S(xx) = ∑[x(i) − mean(x)]^2**

**V(β_0ˆ) = σ^2[(1/n) + {mean(x)^2}/S(xx)]**

**NOTE:- MS(res) = SS(res)/(n−2), defined below, is an unbiased estimator of σ^2.**

**Estimation of σ^2:-** An estimate of σ^2 is obtained from the residual sum of squares:

**SS(res) = S(yy) − (β_1ˆ)^2*S(xx) : where S(yy) = ∑[y(i) − mean(y)]^2**

**σ^2ˆ = MS(res) = SS(res)/(n−2)**
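As a minimal numerical sketch (made-up data), the estimate SS(res)/(n−2) can be computed directly from the residuals, and the shortcut formula SS(res) = S(yy) − (β_1ˆ)^2*S(xx) can be checked against it:

```python
# Estimate sigma^2 as MS(res) = SS(res) / (n - 2), and check the shortcut
# SS(res) = S(yy) - b1^2 * S(xx) against the direct residual definition.
# Illustrative sketch with made-up data.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Direct definition: sum of squared residuals around the fitted line
ss_res_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Shortcut formula
ss_res_shortcut = s_yy - b1 ** 2 * s_xx
sigma2_hat = ss_res_direct / (n - 2)   # MS(res): unbiased estimator of sigma^2
```

Both routes give the same SS(res), which is a useful sanity check when working by hand.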

We now have the coefficient estimates and the sums of squares (residual, regression and total). Using these we can test the **null hypothesis** H0: β_1 = 0, based on a **t-test** or a **z-test**.

The t and z test statistics are calculated using the following formulas.

Usually the variance σ² is unknown; if σ² is unknown, we use the t-test:

**t = (β_1ˆ − β_1)/√(MS_res/S(xx))**

if |t| > t[(α/2), (n-2)] we reject the null hypothesis.

∴ |t| is the calculated value and t[(α/2), (n-2)] is the tabulated value.

And when σ² is known we use the z-test:

**z = (β_1ˆ − β_1)/√(σ²/S(xx)), which follows the standard normal distribution N(0,1)**

if |z| > z(α/2) we reject the null hypothesis.

∴ |z| is the calculated value and z(α/2) is the tabulated value.
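A minimal Python sketch of both test statistics (made-up data; the "known" σ² used for the z-test is a hypothetical value, and in practice the t critical value t[(α/2), (n−2)] comes from a t-table):

```python
# Test H0: beta_1 = 0. t-test when sigma^2 is unknown (via MS_res);
# z-test when sigma^2 is known. Made-up data; sigma2_known below is a
# hypothetical value for illustration only.
from statistics import NormalDist

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
ms_res = (s_yy - b1 ** 2 * s_xx) / (n - 2)   # estimate of sigma^2

# t = (b1_hat - beta_1) / sqrt(MS_res / S(xx)), with beta_1 = 0 under H0.
t_stat = b1 / (ms_res / s_xx) ** 0.5
# Compare |t_stat| with the tabulated t[(alpha/2), (n-2)] value from a t-table.

# z-test, only valid when sigma^2 is actually known
sigma2_known = 0.05                            # hypothetical known variance
z_stat = b1 / (sigma2_known / s_xx) ** 0.5
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)    # z(alpha/2) for alpha = 0.05
reject_h0 = abs(z_stat) > z_crit
```

Here `NormalDist().inv_cdf` from the standard library supplies the z critical value (about 1.96 at α = 0.05), so no lookup table is needed for the z-test.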

Now we have all the values needed to build the ANOVA table, which we will describe in the next article, so stay tuned.

For queries or related docs/notes, please shoot me an email at khanirfan.khan21@gmail.com.


The post R FUNCTIONS FOR REGRESSION ANALYSIS appeared first on StepUp Analytics.

Here are some helpful R functions for regression analysis, grouped by their goal. The name of the package is given in parentheses.

__Linear model__

**Anova:** Anova Tables for Linear and Generalized Linear Models (car)

**anova:** Compute an analysis of variance table for one or more linear model fits (stats)

**coef:** is a generic function which extracts model coefficients from objects returned by modelling functions. coefficients is an alias for it (stats)

**coeftest:** Testing Estimated Coefficients (lmtest)

**confint:** Computes confidence intervals for one or more parameters in a fitted model. The base has a method for objects inheriting from class “lm” (stats)

**deviance:** Returns the deviance of a fitted model object (stats)

**effects:** Returns (orthogonal) effects from a fitted model, usually a linear model. This is a generic function, but currently only has methods for objects inheriting from classes “lm” and “glm” (stats)

**fitted:** is a generic function which extracts fitted values from objects returned by modelling functions. fitted.values is an alias for it (stats)

**formula:** provides a way of extracting formulae which have been included in other objects (stats)

**linear.hypothesis:** Test Linear Hypothesis (car)

**lm:** is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (stats)

**model.matrix:** creates a design matrix (stats)

**predict:** Predicted values based on the linear model object (stats)

**residuals:** is a generic function which extracts model residuals from objects returned by modelling functions (stats)

**summary.lm:** summary method for class “lm” (stats)

**vcov:** Returns the variance-covariance matrix of the main parameters of a fitted model object (stats)

__Model – Variables selection __

**add1:** Compute all the single terms in the scope argument that can be added to or dropped from the model, fit those models and compute a table of the changes in fit (stats)

**AIC:** Generic function calculating the Akaike information criterion for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n the number of observations) for the so-called BIC or SBC (Schwarz’s Bayesian criterion) (stats)

**Cpplot:** Cp plot (faraway)

**drop1:** Compute all the single terms in the scope argument that can be added to or dropped from the model, fit those models and compute a table of the changes in fit (stats)

**extractAIC:** Computes the (generalized) Akaike An Information Criterion for a fitted parametric model (stats)

**leaps:** Subset selection by ‘leaps and bounds’ (leaps)

**maxadjr:** Maximum Adjusted R-squared (faraway)

**offset:** An offset is a term to be added to a linear predictor, such as in a generalised linear model, with known coefficient 1 rather than an estimated coefficient (stats)

**step:** Select a formula-based model by AIC (stats)

**update.formula:** is used to update model formulae. This typically involves adding or dropping terms, but updates can be more general (stats)

__Diagnostics__

**cookd:** Cook’s Distances for Linear and Generalized Linear Models (car)

**cooks.distance:** Cook’s distance (stats)

**covratio:** covariance ratio (stats)

**dfbeta:** DFBETA (stats)

**dfbetas:** DFBETAS (stats)

**dffits:** DFFITS (stats)

**hat:** diagonal elements of the hat matrix (stats)

**hatvalues:** diagonal elements of the hat matrix (stats)

**influence.measures:** This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models (stats)

**lm.influence:** This function provides the basic quantities which are used in forming a wide variety of diagnostics for checking the quality of regression fits (stats)

**ls.diag:** Computes basic statistics, including standard errors, t- and p-values for the regression coefficients (stats)

**outlier.test:** Bonferroni Outlier Test (car)

**rstandard:** standardized residuals (stats)

**rstudent:** studentized residuals (stats)

**vif:** Variance Inflation Factor (car)


__Graphics__

**ceres.plots:** Ceres Plots (car)

**cr.plots:** Component+Residual (Partial Residual) Plots (car)

**influence.plot:** Regression Influence Plot (car)

**leverage.plots:** Regression Leverage Plots (car)

**panel.car:** Panel Function Coplots (car)

**plot.lm:** Four plots (selectable by which) are currently provided: a plot of residuals against fitted values, a Scale-Location plot of sqrt{| residuals |} against fitted values, a Normal Q-Q plot, and a plot of Cook’s distances versus row labels (stats)

**prplot:** Partial Residual Plot (faraway)

**qq.plot:** Quantile-Comparison Plots (car)

**qqline:** adds a line to a normal quantile-quantile plot which passes through the first and third quartiles (stats)

**qqnorm:** is a generic function the default method of which produces a normal QQ plot of the values in y (stats)

**reg.line:** Plot Regression Line (car)

**scatterplot.matrix:** Scatterplot Matrices (car)

**scatterplot:** Scatterplots with Boxplots (car)

**spread.level.plot:** Spread-Level Plots (car)

__Tests__

**ad.test:** Anderson-Darling test for normality (nortest)

**bartlett.test:** Performs Bartlett’s test of the null that the variances in each of the groups (samples) are the same (stats)

**bgtest:** Breusch-Godfrey Test (lmtest)

**bptest:** Breusch-Pagan Test (lmtest)

**cvm.test:** Cramer-von Mises test for normality (nortest)

**durbin.watson:** Durbin-Watson Test for Autocorrelated Errors (car)

**dwtest:** Durbin-Watson Test (lmtest)

**levene.test:** Levene’s Test (car)

**lillie.test:** Lilliefors (Kolmogorov-Smirnov) test for normality (nortest)

**ncv.test:** Score Test for Non-Constant Error Variance (car)

**pearson.test:** Pearson chi-square test for normality (nortest)

**sf.test:** Shapiro-Francia test for normality (nortest)

**shapiro.test:** Performs the Shapiro-Wilk test of normality (stats)

__Variables transformations__

**box.cox:** Box-Cox Family of Transformations (car)

**boxcox:** Box-Cox Transformations for Linear Models (MASS)

**box.cox.powers:** Multivariate Unconditional Box-Cox Transformations (car)

**box.tidwell:** Box-Tidwell Transformations (car)

**box.cox.var:** Constructed Variable for Box-Cox Transformation (car)

__Ridge regression__

**lm.ridge:** Ridge Regression (MASS)


__Segmented regression__

**segmented:** Segmented relationships in regression models (segmented)

**slope.segmented:** Summary for slopes of segmented relationships (segmented)

__Generalized Least Squares (GLS)__

**ACF.gls:** Autocorrelation Function for gls Residuals (nlme)

**anova.gls:** Compare Likelihoods of Fitted Objects (nlme)

**gls:** Fit Linear Model Using Generalized Least Squares (nlme)

**intervals.gls:** Confidence Intervals on gls Parameters (nlme)

**lm.gls:** fit Linear Models by Generalized Least Squares (MASS)

**plot.gls:** Plot a gls Object (nlme)

**predict.gls:** Predictions from a gls Object (nlme)

**qqnorm.gls:** Normal Plot of Residuals from a gls Object (nlme)

**residuals.gls:** Extract gls Residuals (nlme)

**summary.gls:** Summarize a gls Object (nlme)

__Generalized Linear Models (GLM)__

**family:** Family objects provide a convenient way to specify the details of the models used by functions such as glm (stats)

**glm.nb:** fit a Negative Binomial Generalized Linear Model (MASS)

**glm:** is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution (stats)

**polr: ** Proportional Odds Logistic Regression (MASS)

__Non-linear Least Squares (NLS)__

**nlm:** This function carries out a minimization of the function f using a Newton-type algorithm (stats)

**nls:** Determine the nonlinear least-squares estimates of the nonlinear model parameters and return a class nls object (stats)

**nlscontrol:** Allow the user to set some characteristics of the nls nonlinear least squares algorithm (stats)

**nlsModel:** This is the constructor for nlsModel objects, which are function closures for several functions in a list. The closure includes a nonlinear model formula, data values for the formula, as well as parameters and their values (stats)

__Generalized Non-linear Least Squares (GNLS)__

**coef.gnls:** Extract gnls Coefficients (nlme)

**gnls:** Fit Nonlinear Model Using Generalized Least Squares (nlme)

**predict.gnls:** Predictions from a gnls Object (nlme)

__Loess regression__

**loess:** Fit a polynomial surface determined by one or more numerical predictors, using local fitting (stats)

**loess.control:** Set control parameters for loess fits (stats)

**predict.loess:** Predictions from a loess fit, optionally with standard errors (stats)

**scatter.smooth:** Plot and add a smooth curve computed by loess to a scatter plot (stats)

__Splines regression__

**bs:** B-Spline Basis for Polynomial Splines (splines)

**ns:** Generate a Basis Matrix for Natural Cubic Splines (splines)

**periodicSpline:** Create a Periodic Interpolation Spline (splines)

**polySpline:** Piecewise Polynomial Spline Representation (splines)

**predict.bSpline:** Evaluate a Spline at New Values of x (splines)

**predict.bs:** Evaluate a Spline Basis (splines)

**splineDesign:** Design Matrix for B-splines (splines)

**splineKnots:** Knot Vector from a Spline (splines)

**splineOrder:** Determine the Order of a Spline (splines)

__Robust regression__

**lqs:** Resistant Regression (MASS)

**rlm: ** Robust Fitting of Linear Models (MASS)


__Structural equation models__

**sem:** General Structural Equation Models (sem)

**tsls:** Two-Stage Least Squares (sem)


__Simultaneous Equation Estimation__

**systemfit:** Fits a set of linear structural equations using Ordinary Least Squares (OLS), Weighted Least Squares (WLS), Seemingly Unrelated Regression (SUR), TwoStage Least Squares (2SLS), Weighted Two-Stage Least Squares (W2SLS) or Three-Stage Least Squares (3SLS) (systemfit)

__Partial Least Squares Regression (PLSR) and Principal Component Regression (PCR)__

**biplot.mvr:** Biplots of PLSR and PCR Models (pls)

**coefplot:** Plot Regression Coefficients of PLSR and PCR models (pls)

**crossval:** Cross-validation of PLSR and PCR models (pls)

**cvsegments:** Generate segments for cross-validation (pls)

**kernelpls.fit:** Kernel PLS (Dayal and MacGregor) (pls)

**msc:** Multiplicative Scatter Correction (pls)

**mvr:** Partial Least Squares and Principal Components Regression (pls)

**mvrCv:** Cross-validation (pls)

**oscorespls.fit:** Orthogonal scores PLSR (pls)

**predplot:** Prediction Plots (pls)

**scoreplot:** Plots of Scores and Loadings (pls)

**scores:** Extract Scores and Loadings from PLSR and PCR Models (pls)

**svdpc.fit:** Principal Components Regression (pls)

**validationplot:** Validation Plots (pls)

__Quantile regression__

**anova.rq:** Anova function for quantile regression fits (quantreg)

**boot.rq:** Bootstrapping Quantile Regression (quantreg)

**lprq:** locally polynomial quantile regression (quantreg)

**nlrq:** Function to compute nonlinear quantile regression estimates (quantreg)

**qss:** Additive Nonparametric Terms for rqss Fitting (quantreg)

**ranks:** Quantile Regression Ranks (quantreg)

**rq:** Quantile Regression (quantreg)

**rqss:** Additive Quantile Regression Smoothing (quantreg)

**rrs.test:** Quantile Regression Rankscore Test (quantreg)

**standardize:** Function to standardize the quantile regression process (quantreg)

__Linear and nonlinear mixed effects models__

**ACF:** Autocorrelation Function (nlme)

**ACF.lme:** Autocorrelation Function for lme Residuals (nlme)

**anova.lme:** compare Likelihoods of Fitted Objects (nlme)

**fitted.lme:** Extract lme Fitted Values (nlme)

**fixed.effects:** Extract Fixed Effects (nlme)

**intervals:** Confidence Intervals on Coefficients (nlme)

**intervals.lme:** Confidence Intervals on lme Parameters (nlme)

**lme:** Linear Mixed-Effects Models (nlme)

**nlme:** Nonlinear Mixed-Effects Models (nlme)

**predict.lme:** Predictions from an lme Object (nlme)

**predict.nlme:** Predictions from an nlme Obj (nlme)

**qqnorm.lme:** Normal Plot of Residuals or Random Effects from an lme object (nlme)

**random.effects:** Extract Random Effects (nlme)

**ranef.lme:** Extract lme Random Effects (nlme)

**residuals.lme:** Extract lme Residuals (nlme)

**simulate.lme:** simulate lme models (nlme)

**summary.lme:** Summarize an lme Object (nlme)

**glmmPQL:** fit Generalized Linear Mixed Models via PQL (MASS)

__Generalized Additive Model (GAM)__

**anova.gam:** compare the fits of a number of gam models (gam)

**gam.control:** control parameters for fitting gam models (gam)

**gam:** Fit a generalized additive model (gam)

**na.gam.replace:** a missing value method that is helpful with gams (gam)

**plot.gam:** an interactive plotting function for gams (gam)

**predict.gam:** make predictions from a gam object (gam)

**preplot.gam:** extracts the components from a gam in a plot-ready form (gam)

**step.gam:** stepwise model search with gam (gam)

**summary.gam:** summary method for gam (gam)

__Survival analysis__

**anova.survreg:** ANOVA tables for survreg objects (survival)

**clogit:** Conditional logistic regression (survival)

**cox.zph:** Test the proportional hazards assumption of a Cox regression (survival)

**coxph:** Proportional Hazards Regression (survival)

**coxph.detail:** Details of a cox model fit (survival)

**coxph.rvar:** Robust variance for a Cox model (survival)

**ridge:** ridge regression (survival)

**survdiff:** Test Survival Curve Differences (survival)

**survexp:** Compute Expected Survival (survival)

**survfit:** Compute a survival Curve for Censored Data (survival)

**survreg:** Regression for a parametric survival model (survival)

__Classification and Regression Trees __

**cv.tree:** Cross-validation for Choosing tree Complexity (tree)

**deviance.tree:** Extract Deviance from a tree Object (tree)

**labels.rpart:** Create Split Labels for an rpart Object (rpart)

**meanvar.rpart:** Mean-Variance Plot for an rpart Object (rpart)

**misclass.tree: ** Misclassifications by a Classification tree (tree)

**na.rpart: ** Handles Missing Values in an rpart Object (rpart)

**partition.tree: ** Plot the Partitions of a simple Tree Model (tree)

**path.rpart:** Follow Paths to Selected Nodes of an rpart Object (rpart)

**plotcp:** Plot a Complexity Parameter Table for an rpart Fit (rpart)

**printcp:** Displays CP table for Fitted rpart Object (rpart)

**prune.misclass:** Cost-complexity Pruning of Tree by error rate (tree)

**prune.rpart:** Cost-complexity Pruning of an rpart Object (rpart)

**prune.tree: ** Cost-complexity Pruning of tree Object (tree)

**rpart:** Recursive Partitioning and Regression Trees (rpart)

**rpconvert: ** Update an rpart object (rpart)

**rsq.rpart:** Plots the Approximate R-Square for the Different Splits (rpart)

**snip.rpart:** Snip Subtrees of an rpart Object (rpart)

**solder:** Soldering of Components on Printed-Circuit Boards (rpart)

**text.tree:** Annotate a Tree Plot (tree)

**tile.tree: ** Add Class Barplots to a Classification Tree Plot (tree)

**tree.control: ** Select Parameters for Tree (tree)

**tree.screens: ** Split Screen for Plotting Trees (tree)

**tree: ** Fit a Classification or Regression Tree (tree)

__Beta regression__

**betareg:** Fitting beta regression models (betareg)

**plot.betareg:** Plot Diagnostics for a betareg Object (betareg)

**predict.betareg:** Predicted values from beta regression model (betareg)

**residuals.betareg:** Residuals function for beta regression models (betareg)

**summary.betareg: ** Summary method for Beta Regression (betareg)
