50 R Language and RStudio Tips

Things I Wish I Knew When I Started Out With R

Here are 50 R and RStudio tips that we hope will be useful on your journey with R. Hopefully some of them are new to you and will sharpen your R skills.

To keep up with more of these, follow our page on Facebook, where we post more tips along with blog updates on analytics and data science.

1. Never call as.numeric() directly on a factor to recover its numeric values: it returns the internal integer level codes, not the original numbers. Use as.numeric(as.character(myFactorVar)) instead.
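
A minimal sketch of the pitfall in tip 1, using a made-up factor f whose levels happen to be numbers (the data and variable names are illustrative assumptions):

```r
# Hypothetical example data: a factor whose levels look like numbers.
f <- factor(c("10", "20", "20", "5"))

# Wrong: returns the internal level codes (levels sort as "10", "20", "5").
codes <- as.numeric(f)

# Right: go through the character labels first.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

The codes reflect the position of each value in levels(f), which is why they bear no relation to the numbers the labels spell out.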

2. options(show.error.messages = FALSE) turns off the printing of error messages. It is intended for use with try() or a user-installed error handler.

3. Use file.path() to build file paths. It works independently of the OS platform.

4. mixedsort() from the gtools package sorts strings with embedded numbers so that the numbers end up in numeric order (for example "x2" before "x10"). The regular sort() function does not do this.

5. Use ylim = range(myNumericData) + c(-10, 10) as an argument to plot() to pad the Y axis limits by 10 units on each side. Adding a single number, as in range(myNumericData) + 10, shifts both limits upward instead.

6. Use the las parameter in plot() to customise the orientation of axis labels. Accepted values are {0, 1, 2, 3} for {parallel to the axis, horizontal, perpendicular to the axis, vertical}.

7. Use memory.limit(size = 2500), where size is in MB, to set the maximum memory allocated to R on a Windows machine. From R 4.2.0 onwards memory.limit() is only a stub and no longer changes the limit.

8. Use alarm() to produce a short beep sound at the end of your script to notify that the run has completed.

9. eval(parse(text = "a <- 10")) creates a new variable a and assigns the value 10 to it. It executes a string as if it were an R statement.
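
As a rough illustration of tip 9, the statement can be assembled from pieces before it is parsed and evaluated (stmt is a name introduced here for the example):

```r
# Build an R statement as a string, then parse and evaluate it.
stmt <- paste("a", "<-", 10)   # gives the string "a <- 10"
eval(parse(text = stmt))

# The variable a now exists in the current environment.
print(a)
```

Because any string can be executed this way, it is best kept away from untrusted input.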

10. sessionInfo() reports version information about the current R session and its attached or loaded packages.

11. Compute the number of single-character edits (insertions, deletions and substitutions) needed to turn one word into another with adist(word1, word2).
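
A short sketch of tip 11 with the classic "kitten"/"sitting" pair; the word list in the second call is made up for illustration:

```r
# Edit distance between two words: returned as a 1 x 1 matrix.
d <- adist("kitten", "sitting")
print(d)

# adist() is vectorised: one vector gives all pairwise distances.
words <- c("lazy", "lasso", "lassie")
pairwise <- adist(words)
print(pairwise)
```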

12. options(max.print = 1000000) sets the maximum number of entries (not lines) printed to the console. Increase it if your output is being truncated.

13. For practical and robust anomaly detection in a time series, see Twitter's post introducing its AnomalyDetection R package: https://blog.twitter.com/2015/introducing-practical-and-robust-anomalydetection-in-a-time-series

14. Two R sessions running simultaneously on the same machine are guaranteed to have unique process IDs. Get the ID of the current R session with Sys.getpid().

15. Remove the names (or dimnames) attribute from an R object using the unname() function.

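16. Check whether two R objects are exactly the same with identical(x, y). Use all.equal(x, y) to test whether values are equal up to a small numerical tolerance. Because all.equal() returns a description of the differences rather than FALSE, wrap it in isTRUE() when using it in a condition.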
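
The difference in tip 16 shows up with ordinary floating-point arithmetic. A small sketch (the variable names are chosen for this example):

```r
x <- 0.1 + 0.2
y <- 0.3

# Exact comparison fails because of floating-point rounding.
exactly_same <- identical(x, y)

# Tolerance-based comparison succeeds; isTRUE() guards against the
# character description that all.equal() returns on a mismatch.
nearly_equal <- isTRUE(all.equal(x, y))

print(exactly_same)
print(nearly_equal)
```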

17. Use withTimeout() from the R.utils package to interrupt a function if its run time exceeds a preset limit, so the script can move on to the next step.

18. Use dist() to compute the distance between rows of a matrix.

19. Use diff() to calculate lagged and iterated differences of a numeric vector.
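
To make "lagged" and "iterated" in tip 19 concrete, here is a small sketch on a made-up vector of squares:

```r
x <- c(1, 4, 9, 16)

# First differences: x[i + 1] - x[i]
d1 <- diff(x)

# Lagged differences: x[i + 2] - x[i]
d_lag2 <- diff(x, lag = 2)

# Iterated differences: the difference of the differences
d_iter2 <- diff(x, differences = 2)

print(d1)
print(d_lag2)
print(d_iter2)
```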

20. Turn off scientific notation such as 1e-5 in printed output with options(scipen = 999).
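
A quick sketch of the effect of tip 20, restoring the previous setting afterwards (old, default_fmt and fixed_fmt are names made up for the example):

```r
# Start from the default penalty and remember the previous setting.
old <- options(scipen = 0)
default_fmt <- format(1e-5)

# A large penalty makes R prefer fixed notation.
options(scipen = 999)
fixed_fmt <- format(1e-5)

# Restore whatever was set before.
options(old)

print(default_fmt)
print(fixed_fmt)
```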

21. bagEarth() from the caret package (which builds on earth) fits a bagged MARS (Multivariate Adaptive Regression Splines) model.

22. setClass("myClass") defines a new user-defined S4 class called "myClass" (add a representation to give it slots). Use setAs() to define coercion methods to or from the class.

23. assign("varName", 10) is a convenient way to create many variables programmatically, since the variable name is passed as a string.

24. dim(matrix) returns the number of rows and columns.

25. data.matrix() converts a data frame to a numeric matrix. Factors are replaced by their internal integer codes, and character columns are first converted to factors and then to integers.

26. Use invisible() to return a value without printing it to the console. It is widely used inside functions.

27. cat("\014") clears the R console. This works in RStudio, where it is equivalent to pressing Ctrl+L; other front ends may ignore it.

28. dir("folder path") lists the files in "folder path", much like the dir command at the Windows command prompt.

29. Turn missing values in a factor variable into their own category: first add the level with levels(Var) <- c(levels(Var), "UNKNOWN"), then recode the missing entries with Var[is.na(Var)] <- "UNKNOWN".
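
Extending the levels alone does not recode anything, so tip 29 needs a second step. A minimal sketch on a made-up factor v:

```r
# Hypothetical factor with two missing values.
v <- factor(c("a", NA, "b", NA))

# Step 1: register the new level.
levels(v) <- c(levels(v), "UNKNOWN")

# Step 2: recode the missing entries to that level.
v[is.na(v)] <- "UNKNOWN"

print(table(v))
```

Skipping step 1 would produce a warning and leave the NAs in place, because a factor only accepts values that are among its levels.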

30. Load all required packages in one line: lapply(x, require, character.only = TRUE), where x is a character vector of the required package names.

31. rev(x) reverses the elements of x

32. Use complete.cases() to find the rows that are complete (with no missing values). It returns a logical vector, so subset with df[complete.cases(df), ] to get the rows themselves.
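
A small sketch of tip 32 on an invented data frame (column names and values are assumptions for illustration):

```r
df <- data.frame(
  id    = 1:4,
  score = c(90, NA, 75, 60),
  grade = c("A", "B", NA, "D")
)

# Logical vector: TRUE where a row has no missing values.
keep <- complete.cases(df)

# Keep only the complete rows.
clean <- df[keep, ]

print(keep)
print(clean)
```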

33. Use avNNet() from the caret package (which builds on nnet) to fit a model-averaged neural network.

34. file.remove("filepath") deletes the file at that path. It accepts a vector of paths, so use it with care when deleting multiple files, especially in repetitive tasks.

35. Use ada() from the ada package to fit boosted classification trees.

36. Use unclass() on objects such as those of class "lm" to strip the class and expose the underlying list. This makes it easier to access elements that are not printed.

37. Sort a data frame by two columns together: df[order(df$col1, df$col2), ]
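
Tip 37 in action on a made-up data frame, with a descending variant added as an illustration:

```r
df <- data.frame(
  col1 = c("b", "a", "b", "a"),
  col2 = c(2, 9, 1, 3)
)

# Sort by col1, breaking ties with col2 (both ascending).
sorted <- df[order(df$col1, df$col2), ]

# Same, but col2 descending within each col1 group.
sorted_desc <- df[order(df$col1, -df$col2), ]

print(sorted)
print(sorted_desc)
```

Negating a column only works for numeric columns; order() also has a decreasing argument for other cases.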

38. Convert one N-level factor variable into N binary predictor variables with model.matrix(~ as.factor(Data) + 0).

39. Use seasadj() from the forecast package to de-seasonalize a time series. It takes a decomposition such as the output of stl() or decompose(). http://goo.gl/Oio7s2

40. Use <<- instead of <- to assign to a variable that exists outside the function in which the assignment is made. If no such variable is found in any enclosing environment, one is created in the global environment.
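
A minimal sketch of tip 40, using an invented running total that a function updates in place:

```r
total <- 0

# Hypothetical helper: adds x to the total defined outside the function.
add_to_total <- function(x) {
  total <<- total + x   # <<- modifies the outer variable
  invisible(total)
}

# For contrast: <- only creates a local variable that is discarded.
add_locally <- function(x) {
  total <- total + x
  invisible(total)
}

add_to_total(5)
add_to_total(7)
add_locally(100)

print(total)
```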

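41. As in tip 7, memory.limit(size = desiredSize) sets the memory available to R on Windows (it has no effect from R 4.2.0 onwards). The mem.limits() function once suggested for other platforms is now defunct; on those platforms memory limits are governed by the operating system.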

42. Use file.copy(from = fromFile, to = toFile, overwrite = TRUE) to copy files from within R. It works even between connected servers, provided their file systems are accessible from your machine.

43. Use debugonce() to step through a function in the debugger for a single call only, instead of debug(), which stays active until you call undebug().

44. Convert an R factor variable into a collection of 1/0 binary variables: bins <- model.matrix(~ 0 + varName, data). This is highly useful in regression modelling.
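
A short sketch of the one-column-per-level encoding from tips 38 and 44, on a made-up data frame d with a factor column colour:

```r
d <- data.frame(colour = factor(c("red", "green", "blue", "green")))

# "~ 0 +" drops the intercept so every level gets its own 1/0 column.
bins <- model.matrix(~ 0 + colour, data = d)

print(colnames(bins))
print(bins)
```

Each row contains exactly one 1. Without the 0 term, the first level would be absorbed into an intercept column.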

45. discretize() from the arules package converts continuous variables to categorical ones and offers several convenient split criteria.

46. NROW() is similar to nrow() but also works on a vector, treating it as a 1-column matrix. For vectors it can be used in place of length(); for a data frame, note that length() returns the number of columns instead.
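
The differences in tip 46 are easy to check on a vector and a small, made-up data frame:

```r
v <- 1:5
df <- data.frame(a = 1:3, b = 4:6)

# nrow() gives NULL for a plain vector, NROW() gives its length.
print(nrow(v))
print(NROW(v))

# On a data frame NROW() counts rows, while length() counts columns.
print(NROW(df))
print(length(df))
```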

47. commandArgs() returns the command-line arguments passed to an R script run from the command line. Use commandArgs(trailingOnly = TRUE) to get only the user-supplied arguments. http://bit.ly/1yARCWj

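48. Use attr(myFunc, "AttrName") <<- myVal inside the function to store a value on the function itself, so that it is remembered in the next call. The <<- is needed here: with <-, only a local copy of myFunc is modified.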
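
One way tip 48 might look in practice is a function that counts its own calls. This is a sketch under the assumption that the function lives in the environment where it is called from; countCalls and the "calls" attribute are hypothetical names:

```r
# Hypothetical function that remembers how often it has been called.
countCalls <- function() {
  n <- attr(countCalls, "calls")
  if (is.null(n)) n <- 0

  # <<- updates the function object in the enclosing environment;
  # a plain <- would only change a local copy.
  attr(countCalls, "calls") <<- n + 1

  n + 1
}

first <- countCalls()
second <- countCalls()

print(first)
print(second)
print(attr(countCalls, "calls"))
```

A closure that keeps the counter in its enclosing environment is a common alternative when attaching state to the function object feels too implicit.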

49. Use object.size() to estimate the memory a given R object consumes in bytes.

50. Use ls.str() (over ls()) to see structural details of objects when working on large R projects.
