# Multivariate Analysis Of Variance Or MANOVA

**Introduction**

In ANOVA we test whether a categorical independent variable has a statistically significant effect on a single continuous dependent variable, using sums of squares. But ANOVA handles only one dependent variable at a time, while practical problems are often more complex and involve several dependent variables. We could run a separate ANOVA for each dependent variable, but with Multivariate Analysis Of Variance, or MANOVA, we can do this in a single analysis.

So, we can think of MANOVA as a multivariate extension of ANOVA: it tells us how much of the variability of the dependent variables is explained by the independent variables simultaneously.

**Table of Contents**

- Assumptions
- Dataset and descriptions
- Objective
- Visualization using R
- Perform MANOVA in R and compare with ANOVA
- When MANOVA should be used
- Limitation of MANOVA

**Assumptions**

Like ANOVA, MANOVA comes with some assumptions. Before performing MANOVA we have to check whether the following assumptions are satisfied.

- The observations should be drawn independently of each other.
- The dependent variables are continuous in nature and the independent variables are categorical.
- The dependent variables should follow a multivariate normal distribution.
- The population variance-covariance matrices of the groups are the same, i.e. the groups are homogeneous.
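The last two assumptions can be checked in R before fitting the model. The sketch below assumes the add-on packages `mvnormtest` and `heplots` are installed and uses the variable names from the dataset introduced below:

```r
# Sketch: checking MANOVA assumptions (assumes the 'mvnormtest' and
# 'heplots' packages are installed; variable names follow the data dictionary)
library(mvnormtest)   # provides mshapiro.test()
library(heplots)      # provides boxM()

data <- read.delim("MANOVA.txt", header = TRUE)

# Multivariate normality of the dependent variables
# (mshapiro.test() expects variables in rows, hence the transpose)
mshapiro.test(t(cbind(data$Product.yield, data$Optical.density)))

# Homogeneity of the variance-covariance matrices across groups (Box's M test)
boxM(cbind(Product.yield, Optical.density) ~ N.source, data = data)
```

A non-significant p-value in each test is consistent with the corresponding assumption holding; note that Box's M test is itself sensitive to non-normality, so it is best read alongside the normality check.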

**Dataset And Descriptions**

Let us take a sample data set to understand how MANOVA works. The dataset can be downloaded from the given link – **Data**

**Data Dictionary**

The data contain 7 columns (variables) with 120 observations. The variables of interest are "Temperature" (3 levels), "N.source" (2 levels), "Optical.density", and "Product.yield".
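A quick way to confirm this structure after loading the data is to inspect the columns and factor levels directly (a minimal sketch, assuming the file name used later in this article):

```r
# Inspect the variables of interest described in the data dictionary
data <- read.delim("MANOVA.txt", header = TRUE)
str(data)                 # should show 120 observations of 7 variables
table(data$Temperature)   # counts per Temperature level (3 levels expected)
table(data$N.source)      # counts per N.source level (2 levels expected)
```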

**Objective**

Here we want to check whether different Temperatures or different N.sources have a significant effect on Optical density and Product yield. We could perform a separate ANOVA for each response, but with MANOVA this can be done simultaneously.

**Visualization using R**

Now we are going to observe how MANOVA works using graphical representations.

First, the data is loaded into the R environment.

```r
data <- read.delim("MANOVA.txt", header = TRUE)
head(data, n = 10)
data$Temperature <- as.factor(data$Temperature)
data$N.source <- as.factor(data$N.source)  # needed so col = and levels() work below
```

Now we can draw the following plots:

```r
# For N.source
plot(data$Optical.density, data$Product.yield, col = data$N.source, pch = 15,
     main = "Optical density vs Product yield vs N.source",
     xlab = "Optical density", ylab = "Product yield")
legend("topleft", legend = as.character(levels(data$N.source)), fill = 1:2)

# For Temperature
plot(data$Optical.density, data$Product.yield, col = data$Temperature, pch = 15,
     main = "Optical density vs Product yield vs Temperature",
     xlab = "Optical density", ylab = "Product yield")
legend("topleft", legend = as.character(levels(data$Temperature)), fill = 1:3)
```

From the above scatter plots we can see that the N.source groups separate clearly, so N.source appears to explain the variability of Optical density and Product yield, while the Temperature groups overlap and do not differ noticeably.

**Perform MANOVA in R and compare with ANOVA**

MANOVA gives us, in a single test, the same conclusion mathematically that we inferred from the plots.

In R we can perform MANOVA as follows:

```r
summary(manova(cbind(Product.yield, Optical.density) ~ N.source + Temperature,
               data = data))
```

```
             Df  Pillai approx F num Df den Df  Pr(>F)
N.source      1 0.72282  149.944      2    115  <2e-16 ***
Temperature   2 0.04278    1.268      4    232  0.2835
Residuals   116
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

From the output we can conclude, even at the 1% level of significance, that N.source has a significant effect on Product yield and Optical density. On the other hand, at the 5% level of significance, Temperature has no significant effect on Product yield and Optical density.

So, this is how MANOVA gives us a mathematical result for judging whether the effects of some treatments differ significantly in explaining the variability of two or more continuous variables simultaneously.
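By default `summary()` reports Pillai's trace, as in the output above, but R's `summary.manova` can also report the other common multivariate test statistics via its `test` argument:

```r
# Fit once, then request alternative multivariate test statistics
fit <- manova(cbind(Product.yield, Optical.density) ~ N.source + Temperature,
              data = data)
summary(fit, test = "Wilks")             # Wilks' lambda
summary(fit, test = "Hotelling-Lawley")  # Hotelling-Lawley trace
summary(fit, test = "Roy")               # Roy's largest root
```

The four statistics usually agree; Pillai's trace is often preferred because it is the most robust to moderate violations of the assumptions.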

Now, we can obtain the separate ANOVA results as follows:

```r
summary.aov(manova(cbind(Product.yield, Optical.density) ~ N.source + Temperature,
                   data = data))
```

```
 Response Product.yield :
             Df  Sum Sq Mean Sq  F value Pr(>F)
N.source      1 27579.1 27579.1 187.3099 <2e-16 ***
Temperature   2    80.9    40.5   0.2748 0.7602
Residuals   116 17079.6   147.2
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

```
 Response Optical.density :
             Df Sum Sq Mean Sq  F value  Pr(>F)
N.source      1 6.7071  6.7071 193.7043 < 2e-16 ***
Temperature   2 0.1656  0.0828   2.3915 0.09599 .
Residuals   116 4.0166  0.0346
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

These are the results we would get by performing ANOVA separately for each response. The conclusion for N.source is the same, but note that at the 10% level of significance Temperature now shows an effect on Optical density.

So, basically, in ANOVA we check the significance of the predictor variables for each response separately, whereas in MANOVA we analyse the variability of all the response variables across the predictor variables simultaneously.

**When MANOVA should be used**

We should use MANOVA when there are multiple dependent variables which are correlated. Unlike individual ANOVAs, MANOVA can detect small effects that only show up in the joint distribution of the responses: the influence of a factor on the relationship between two dependent variables can be detected, which may be missed if we perform individual ANOVAs.

With individual ANOVAs, the type-I error rate (the chance of rejecting a true null hypothesis) is inflated, because each separate test contributes its own chance of a false rejection. MANOVA tests all response variables simultaneously, keeping the overall error rate at the desired level of significance.
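The inflation is easy to quantify: with k independent tests each run at level alpha, the family-wise error rate is 1 - (1 - alpha)^k. For our two responses:

```r
# Family-wise type-I error when running k separate ANOVAs at alpha = 0.05:
# P(at least one false rejection) = 1 - (1 - alpha)^k
alpha <- 0.05
k <- 2                 # two response variables in our example
1 - (1 - alpha)^k      # 0.0975, nearly double the nominal 5%
```

With more response variables the inflation grows quickly, which is one of the main practical arguments for the single simultaneous MANOVA test.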

**Limitation of MANOVA**

The presence of outliers may increase the type-I error, as MANOVA is sensitive to outliers. The presence of multicollinearity violates an assumption of MANOVA: if the dependent variables are highly correlated, then one can be nearly a linear function of the others, which makes it statistically redundant.