Statistics for Data Science

Statistics is what turns data into information, and big data calls for data science!

Statistics and probability are important parts of data science, and you need to be comfortable with both before you try your hand at analysing and visualizing data.

To brush up on basic statistics without spending a lot of money on a textbook or a degree, try MOOC platforms such as Udacity, edX, and Coursera. They offer good statistics courses that will be very helpful for you.

You need to know the following topics:

  • Distribution theory: You should be comfortable with numbers and have an idea of which distributions describe data in different scenarios.
  • Fitting: Once you know the distributions, you have to fit them to data.
  • Classical hypothesis testing
  • Markov chains
  • Basic Bayesian thinking & modelling
  • Some old-school stats and probability theory
  • Regression: linear and non-linear regression
  • Machine learning algorithms
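
To make the "fitting" and "classical hypothesis testing" items a little more concrete, here is a minimal sketch using only the Python standard library. The sample is simulated, and the true mean of 52, the null-hypothesis mean of 50, and the helper name z_test are all assumptions made up for this illustration, not part of any course listed here.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 200 simulated draws from a normal distribution
# with mean 52 and standard deviation 10 (values chosen for illustration).
sample = [random.gauss(52, 10) for _ in range(200)]

# Fitting: estimate the mean and standard deviation of a normal
# distribution from the data.
fitted = NormalDist.from_samples(sample)


def z_test(data, mu0):
    """Two-sided z-test of H0: population mean == mu0.

    Uses the normal approximation, which is reasonable for large samples.
    """
    fit = NormalDist.from_samples(data)
    standard_error = fit.stdev / math.sqrt(len(data))
    z = (fit.mean - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = z_test(sample, 50)
print(f"fitted mean = {fitted.mean:.2f}, fitted sd = {fitted.stdev:.2f}")
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the data are unlikely under the assumed mean of 50. For small samples, a t-test would usually be the more appropriate choice.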

To implement these theories and draw conclusions from the output, you need to be familiar with at least one programming language, such as R or Python. For help choosing between R and Python, refer to this link.
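
As a taste of what implementing one of these topics looks like in Python, here is a rough sketch of simple linear regression by ordinary least squares. The function name fit_line and the hours/scores data are hypothetical, invented purely for illustration. In practice you would more likely reach for a statistics library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope estimates how much the score changes per additional hour, and the intercept is the predicted score at zero hours.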

Community Effort of StepUp Analytics
Refer to the statistics blogs and articles contributed on StepUp Analytics: Link

Statistics MOOC for Data Science
Take up the following courses to get started with Statistics.

Course                                            Platform
Introduction to Probability                       edX
Statistical Reasoning                             Stanford
Introduction to Statistics: Descriptive Statistics  edX
Introduction to Statistics: Probability           edX
Introduction to Statistics: Inference             edX
Data Analysis & Statistics                        edX
Intro to Descriptive Statistics                   Udacity
Intro to Inferential Statistics                   Udacity
Data Science Maths Skills                         Coursera – Data Analytics Courses
Statistical Learning                              Stanford
Stanford’s Machine Learning Course                Stanford


Statistics will come easily once you are familiar with the above concepts and their applications.

Probabilistic Modeling

Statistics for Data Science free downloadable e-books
Refer to the following books, as they cover these concepts in detail and include code. They will give you a clearer idea of how to handle statistics in code while working on data science problems.

That being said, I recommend not relying on any single resource. Statistics is far too important to data science. You must master it, and like most things, that is a constant work in progress.

Hope this helps! Happy learning!
