Statistics turns data into information, but big data calls for data science!
Statistics and probability are important parts of data science, and you need to be proficient in them before you try your hand at analysing and visualizing data.
To brush up on basic statistics without spending a lot of money on a textbook or a degree, try MOOC platforms such as Udacity, edX, and Coursera. They offer good statistics courses that will be very helpful.
You need to know the following topics:
- Distribution theory: You should be comfortable with numbers and have an idea of which distribution describes the data in different scenarios.
- Fitting: Once you know the distributions, you have to fit them to data.
- Classical hypothesis testing
- Markov chains
- Basic Bayesian thinking & modelling
- Some old-school stats and probability theory
- Regression: linear and non-linear regression
- Machine learning algorithms
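To make the first few topics concrete, here is a minimal sketch of fitting a normal distribution to data and running a classical hypothesis test, using only Python's standard library. The function names `fit_normal` and `z_test_p_value` and the simulated "heights" data are assumptions made for illustration. The test uses the large-sample z approximation, which is only one of several classical tests you might choose.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def fit_normal(samples):
    """Fit a normal distribution by estimating its mean and standard deviation."""
    return NormalDist.from_samples(samples)


def z_test_p_value(samples, mu0):
    """Two-sided p-value for H0: the population mean equals mu0.

    Uses the large-sample normal approximation; for small samples a
    t-test would be the more usual choice.
    """
    standard_error = stdev(samples) / sqrt(len(samples))
    z = (fmean(samples) - mu0) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    heights = [rng.gauss(170.0, 8.0) for _ in range(500)]  # made-up data

    fitted = fit_normal(heights)
    print(f"fitted mean={fitted.mean:.2f}, stdev={fitted.stdev:.2f}")
    print(f"p-value for H0 mean=170: {z_test_p_value(heights, 170.0):.3f}")
    print(f"p-value for H0 mean=175: {z_test_p_value(heights, 175.0):.3g}")
```

A small p-value suggests the data are unlikely under the hypothesised mean. What counts as "small" is a threshold you pick before looking at the data.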
To implement the theory and draw conclusions from the output, you need to be familiar with at least one programming language, such as R or Python. For which language to choose, R vs Python, refer to this link.
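To give a feel for what implementing these ideas in Python can look like, the sketch below covers three of the topics in plain Python: simple linear regression by ordinary least squares, a Beta-Binomial Bayesian update, and a Markov chain iterated towards its long-run distribution. All function names and the example numbers are assumptions for illustration. In practice you would more likely reach for a library, but the arithmetic is the same.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta(a, b) parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures


def markov_step(distribution, transition):
    """Advance a probability vector by one step of the chain.

    transition[i][j] is the probability of moving from state i to state j.
    """
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def long_run_distribution(transition, steps=200):
    """Approximate the stationary distribution by repeated stepping.

    Assumes the chain actually converges (not true for every chain).
    """
    distribution = [1.0] + [0.0] * (len(transition) - 1)
    for _ in range(steps):
        distribution = markov_step(distribution, transition)
    return distribution


if __name__ == "__main__":
    print("slope, intercept:", fit_line([0, 1, 2, 3], [1, 3, 5, 7]))

    a, b = beta_binomial_update(1, 1, successes=7, failures=3)  # uniform prior
    print("posterior mean:", a / (a + b))

    weather = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical sunny/rainy chain
    print("long-run distribution:", long_run_distribution(weather))
```

Writing these by hand once is a good way to check that you understand what a library call is doing for you.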
Community Effort of StepUp Analytics
Refer to the statistics blogs and articles contributed on StepUp Analytics: Link
Statistics MOOCs for Data Science
Take the following courses to get started with statistics.
|Course|Platform|
|---|---|
|Introduction to Probability|edX|
|Introduction to Statistics: Descriptive Statistics|edX|
|Introduction to Statistics: Probability|edX|
|Introduction to Statistics: Inference|edX|
|Data Analysis & Statistics|edX|
|Intro to Descriptive Statistics|Udacity|
|Intro to Inferential Statistics|Udacity|
|Data Science Math Skills|Coursera|
|Statistics.com – Data Analytics Courses|–|
|Stanford’s Machine Learning Course|Stanford|
Statistics will come easily once you are familiar with the above concepts and their applications.
- Probabilistic Programming and Bayesian Methods for Hackers: GitHub / Tutorials
- Probabilistic Graphical Models: Stanford / Coursera
Statistics for Data Science free downloadable e-books
Refer to the following books, which cover these concepts in detail and include code. They will give you a clearer idea of how to handle statistics alongside coding when working on data science problems.
- An Introduction to Statistical Learning (R focus): Page on usc.edu
- The Elements of Statistical Learning (R focus): Data Mining, Inference, and Prediction. 2nd Edition.
- Think Stats (Python focus): Probability and Statistics for Programmers
That being said, I recommend not relying on any single resource. Statistics is far too important to data science for that. You must master it, and like most things, that is a constant work in progress.
Hope this helps! Happy learning!