# Functional Programming in R Using the “purrr” Package

If you are here, you are probably familiar with the **R** language. (Ever wondered why it is called ‘R’? It is modeled on an earlier language called **S**, whose name stands for statistical computing, and its authors, **R**oss Ihaka and **R**obert Gentleman, named it **R** after the first letter of their first names.) If you are a beginner, you may now be wondering why you have never heard of the purrr package.

“R is a Functional Programming Language which means R provides a set of tools to create and manipulate functions ”

Hadley Wickham and Lionel Henry are the authors of purrr, which is part of the tidyverse. Functions are objects in R and are treated the same way as vectors: you can create them, assign them to names, and later use them inside other functions. Another important feature is that functions can be passed as arguments to other functions, which is the core of both the apply family and the purrr package.
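As a minimal sketch of what "functions are objects" means in practice, the snippet below assigns a function to a name, stores it in a list, and passes it to another function. The names `square`, `funs`, and `apply_twice` are made up for this illustration and are not part of base R or purrr.

```r
# A function is an ordinary object: it can be assigned to a name ...
square <- function(x) x^2

# ... stored in a list alongside other functions ...
funs <- list(sq = square, root = sqrt)

# ... and passed as an argument to another function.
# apply_twice() is a hypothetical helper written only for this example.
apply_twice <- function(f, x) f(f(x))

squared_twice <- apply_twice(square, 3)  # square(square(3))
root_of_16 <- funs$root(16)              # calls sqrt() through the list
```

Nothing here is specific to purrr; it only relies on the fact that R lets a function travel like any other value.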

What is the point of using the purrr package when we already have the apply family of functions? (By the way, if you are not familiar with the apply family, here’s the link.)

Passing functions as arguments saves R users a lot of hassle: the code becomes shorter and less verbose. Now let’s look at the first function in the purrr package, map.

The syntax of the map function, **map(.x, .f, …)**, is much the same as that of the apply family functions, with the exception of mapply. In mapply, the function argument comes first, followed by the data arguments.

The first argument, **.x**, is always a vector (an atomic vector or a list), and the second, **.f**, is the function that is applied to every element of **.x**. The map functions work on the same logic as the apply family functions.
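To make the difference in argument order concrete, here is a small sketch that runs the same computation through map, lapply, and mapply. The input list `x` is made-up data for illustration.

```r
library(purrr)

x <- list(1:3, 4:6)  # made-up input: two integer vectors

by_map    <- map(x, sum)                       # data first, then the function
by_lapply <- lapply(x, sum)                    # same order as map()
by_mapply <- mapply(sum, x, SIMPLIFY = FALSE)  # function first, then the data
```

All three calls sum each component of `x`; only the position of the function argument changes.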

**Application of Purrr Package in R**

Let’s create a Dataframe:

```r
a <- sample(x = 1:10, size = 6)  # a sample of 6 elements without replacement
b <- rep(x = sample(x = 1:10, size = 3), each = 2)  # rep() repeats each sampled element twice (each = 2)
c <- c("India", "USA", "UK", "Australia", "China", "Canada")
d <- rnorm(n = 6)  # rnorm() draws 6 random numbers from a normal distribution with mean = 0 and sd = 1
# Before R 4.0, data.frame() converted character vectors (here 'c') to factors by default
df <- data.frame(a, b, c, d, stringsAsFactors = FALSE)
df
```

```r
# Load the purrr library
library(purrr)
m_l <- map(df[, c("a", "b", "d")], mean)
# map() iterates over the columns of the data frame
# and applies the mean function to each column
# map() always returns a list!
m_l
```

```r
# Base R's lapply is the closest function for performing the same operation.
l <- lapply(df[, c("a", "b", "d")], mean)
l

# If we had passed the entire df without dropping the character vector 'c',
# both functions (map and lapply) would return a list with the 'c' component as NA.

# If the data argument is a vector:
m_v <- map(a, sqrt)
m_v
# Then map() iterates over each ELEMENT of the vector and applies the function.
```

**The significance of … in a map function**

The **…** (dot dot dot) argument in map functions passes additional arguments on to the function **.f**. The dot (**.**) before x and f makes these argument names highly unlikely to clash with the names of the arguments we pass through **…**. To avoid confusion: the first two arguments belong to the map function, and the rest belong to the function that we are mapping.

```r
# If the data argument is a list:
l <- list(a = a, b = b)
m_l <- map(l, mean)
m_l
# Then map() iterates over each COMPONENT of the list and applies the function.

# Significance of the ... argument in map functions:
df1 <- df
df1$a[1] <- NA
df1$a
m_ddd <- map(df1, mean, na.rm = TRUE)
m_ddd  # The warning message is caused by the character vector 'c' in the data frame.
m_ddd <- map(df1[, c("a", "b", "d")], mean, na.rm = TRUE)  # No warning this time!
```
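One way to see what **…** does is to write the same call with the extra argument spelled out inside an anonymous function. This is a small sketch on made-up data (`scores` is not part of the example above).

```r
library(purrr)

scores <- list(x = c(1, 2, NA), y = c(4, NA, 6))  # made-up data for illustration

# na.rm = TRUE is not an argument of map(); it travels through ... to mean()
via_dots <- map(scores, mean, na.rm = TRUE)

# The same call with the extra argument written out explicitly
via_lambda <- map(scores, function(v) mean(v, na.rm = TRUE))
```

Both forms should give the same list of means; the second is more verbose but makes it obvious which function receives `na.rm`.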

**Why Map over sapply:**

Type-inconsistent (unstable) functions: the type of the returned object depends on the input. sapply is a type-inconsistent function.

All the **purrr** map functions are type-consistent (stable): they either return the type you expect, regardless of the input, or fail with an error.

There is a whole family of map functions, one for each return type:

- map() returns a list
- map_lgl() returns a logical vector
- map_int() returns an integer vector
- map_dbl() returns a double vector
- map_chr() returns a character vector
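To make the contrast with sapply concrete, the sketch below feeds three slightly different inputs to sapply and to map_dbl. The anonymous functions are made up for illustration, and the error is caught with `tryCatch` only so that the script keeps running.

```r
library(purrr)

# sapply() picks its return type based on what the function happens to return
s_vec  <- sapply(1:3, function(i) i * 2)          # numeric vector
s_mat  <- sapply(1:3, function(i) c(i, i * 2))    # matrix with 2 rows and 3 columns
s_list <- sapply(integer(0), function(i) i * 2)   # empty list

# map_dbl() either returns a double vector or stops with an error
m_vec   <- map_dbl(1:3, function(i) i * 2)
m_empty <- map_dbl(integer(0), function(i) i * 2) # still a double vector, of length 0
m_err   <- tryCatch(
  map_dbl(1:3, function(i) c(i, i * 2)),          # each result has length 2, so this fails
  error = function(e) "error"
)
```

With sapply, code that depends on the result has to cope with a vector, a matrix, or a list; with map_dbl there are only two outcomes to handle.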

```r
# The family of map functions
m_dbl <- map_dbl(df[, c("a", "b", "d")], mean)
m_dbl

m_lgl <- map_lgl(df, is.character)
m_lgl

m_chr <- map_chr(df, class)
m_chr

# Choosing an appropriate map function is important:
m_int <- map_int(df[, c("a", "b", "d")], median)
m_int
```

```r
# You see the above error because median() returns a double here (vector 'd' is of type double)!
class(d)
typeof(d)
# All these map functions EITHER return the type that we EXPECT or an ERROR!
# So map_int wouldn't work here!
m_d1 <- map_dbl(df[, c("a", "b", "d")], median)
m_d1
```

**Ways of specifying .f in map functions and shortcuts for subsetting**

```r
# Ways of specifying .f in map functions:
# The function created below is called an anonymous function or lambda function.
# Some functions are needed only once, so creating an anonymous function on the fly saves time.
m1 <- map(b, function(x) x^2)  # conventional way of squaring every element of a vector
m1
# The other cool way of defining an anonymous function is the "formula" shortcut.
m2 <- map(b, ~ (.^2))  # The tilde (~) marks a formula; the dot is a placeholder for the data argument 'x'.
m2
# purrr is a time saver!
```
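As a quick check that these notations are interchangeable, the sketch below squares a made-up vector `v` in three ways: with an ordinary anonymous function, with the formula shortcut using `.`, and with the formula shortcut using `.x`, which purrr also accepts as the placeholder.

```r
library(purrr)

v <- c(2, 4, 6)  # made-up input for illustration

sq_fun   <- map_dbl(v, function(x) x^2)  # ordinary anonymous function
sq_dot   <- map_dbl(v, ~ .^2)            # formula shortcut, "." as the placeholder
sq_dot_x <- map_dbl(v, ~ .x^2)           # formula shortcut, ".x" as the placeholder
```

Which form to use is mostly a matter of taste; the formula shortcut is shorter, while the explicit function is easier to read for people new to purrr.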

```r
# Shortcut for subsetting
l <- list(list(a = a, b = b), list(a = c, b = d))
l
# We can subset all the 'a' elements from a list of lists.

# Using lapply
s_n <- lapply(l, function(x) { x[["a"]] })  # by name
s_n
s_p <- lapply(l, function(x) { x[[1]] })  # by position
s_p

# Using map
l_n <- map(l, "a")  # by name
l_n
l_p <- map(l, 1)  # by position
l_p
```

This matters because these shortcuts become far more useful in a scenario like the following.

Suppose you have built several linear models and want to compare their R-squared values. Save the **models** in a list, then use the shortcuts provided by the map functions to pull the value out of each one.

```r
# head(mtcars)
# Build 3 models using the mtcars dataset to compare R-squared
m <- lm(mpg ~ wt, mtcars)
m1 <- lm(mpg ~ wt + gear, mtcars)  # add an independent variable
m2 <- lm(mpg ~ wt + gear + disp, mtcars)  # add another one to compare the fit
models <- list(m, m1, m2)  # save them in a list
# Piping with purrr saves a lot of time
models %>%
  map(summary) %>%
  map_dbl("r.squared")
```

```r
# purrr's walk
library(purrr)
map(10, ~ (plot(rnorm(.))))
# map() returns a list of results, but plot() returns NULL, so we see NULL printed.
# Drawing the plot is a "side effect"; purrr's walk function is designed for this purpose.
# walk() calls the function for its side effect and returns its input invisibly.
walk(10, ~ (plot(rnorm(.))))
# Now we only see the plot, which is what we needed in the first place.
```
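The same behaviour can be seen without a graphics device. In this sketch the side effect is printing with `cat()`; the vector `messages` is made up for illustration.

```r
library(purrr)

messages <- c("first", "second")  # made-up input for illustration

# walk() calls cat() for its side effect (printing) and returns `messages` invisibly
returned <- walk(messages, ~ cat("Processing", .x, "\n"))

# map() instead collects cat()'s return values, which are all NULL
collected <- map(messages, ~ cat("Processing", .x, "\n"))
```

Because walk hands back its input, it can sit in the middle of a pipeline without disturbing the data flowing through it.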

What if we want a function to iterate over two arguments? map2 and walk2 to the rescue!

The syntax, **map2(.x, .y, .f, …)**, is similar to that of the map function but has an additional argument, **.y**, which is used to iterate over a second object.

The function takes the first element of **.x** as its first argument and the first element of **.y** as its second argument, and so on until the last elements of **.x** and **.y**.
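A small numeric sketch of this pairing, using two made-up vectors `x` and `y` and the double-returning variant `map2_dbl`, with the equivalent mapply call for comparison:

```r
library(purrr)

x <- c(1, 2, 3)     # made-up input
y <- c(10, 20, 30)  # made-up input

# The pairs are (1, 10), (2, 20), (3, 30); .x and .y refer to the current pair
sums <- map2_dbl(x, y, ~ .x + .y)

# The base R equivalent, with the function argument first
same <- mapply(function(a, b) a + b, x, y)
```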

```r
library(purrr)
# map2, mapply, walk and walk2
map2(list(mtcars$gear), list(mtcars$mpg), ~ plot(.x, .y))
# scatter plot of gear vs mpg
```

```r
mapply(function(x, y) { plot(x, y) }, x = list(mtcars$gear), y = list(mtcars$mpg))
# mapply is the closest base R function to map2.
```

```r
# We can get rid of the NULL part by using walk2
walk2(list(mtcars$gear), list(mtcars$mpg), ~ plot(.x, .y))
```

What if we want a function to iterate over three or more objects? Instead of map3, map4, and so on, purrr provides a single function called **pmap** (older material may refer to it as **map_n**).

```r
# pmap (map_n)
# rnorm(n = 6, mean = 1, sd = 2)
# rnorm(n = 4, mean = 2, sd = 1)
# rnorm(n = 2, mean = 1, sd = 1)
# The above calls can be written using pmap as follows:
# Provide the arguments as a list of lists.
n <- list(6, 4, 2)
mu <- list(1, 2, 1)
sd <- list(2, 1, 1)
p <- pmap(list(n, mu, sd), rnorm)  # By default pmap matches the list elements to the function's arguments by position.
pmap(list(mu, n, sd), rnorm)  # gives a different result from 'p'
# So it is always safer to provide the arguments by name in the list.
```
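A minimal sketch of the by-name approach: when the elements of the outer list are named after the arguments of `rnorm` (`n`, `mean`, `sd`), pmap matches them by name, so their order no longer matters. The object names `args_named` and `args_shuffled` are made up for this illustration, and the seed is fixed only so that the two runs can be compared.

```r
library(purrr)

# Name each element after the rnorm() argument it should feed
args_named <- list(
  n    = list(6, 4, 2),
  mean = list(1, 2, 1),
  sd   = list(2, 1, 1)
)

# The same arguments in a different order
args_shuffled <- args_named[c("sd", "n", "mean")]

set.seed(42)
draws_named <- pmap(args_named, rnorm)

set.seed(42)
draws_shuffled <- pmap(args_shuffled, rnorm)
```

With the same seed, both calls should produce the same three vectors, of lengths 6, 4, and 2.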

These are some of the functions in the purrr package. See the purrr documentation for more.