Learning Path: R: Master Data Mining Techniques with R
The world is generating data at a very high pace, and everyone wants to gain insights from the huge volume of data coming their way. Data mining provides a way of finding these insights, and R has become the go-to tool for it among data analysts and data scientists. If you’re looking forward to working on complex data mining projects and gaining deeper insights into data, then go for this Learning Path.
Packt's Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.
The highlights of this Learning Path are:
 Practical projects on real-world data mining use cases presented in a very easy-to-understand manner
 One-stop solution to perform spatial data mining, text mining, social media mining, and web mining
Let's get on this data mining journey together! This Learning Path starts with a brief introduction to R and setting up the development environment. Get a firm hold on the fundamentals of R and gradually build your skill level for data science. This Learning Path will then teach you various data mining techniques, showing you how to apply different mining concepts to various statistical and data applications in a wide range of fields. It will help you complete complex data mining cases and guide you through handling issues you might encounter during projects. Moving ahead, you will build your own recommendation engine. You will then implement dimensionality reduction and use it to build a real-world project. You will also be introduced to the concept of neural networks and learn how to apply them for prediction, classification, and forecasting. Finally, you will use ggplot2, plotly, and aspects of geo mapping to create your own data visualization projects.
After completing this Learning Path, you will have a solid understanding of the data mining techniques covered and how to implement them using R in real-world scenarios.
About the Author:
We have combined the best works of the following esteemed authors to ensure that your learning journey is smooth:
Dr. Samik Sen is a theoretical physicist and loves thinking about hard problems. After completing his Ph.D., in which he developed computational methods to solve problems for which no solutions existed, he began thinking about how to tackle math problems while lecturing. He developed algorithms to generate problem sets and solutions and learned how to create video lessons. He has since developed a large Facebook community teaching school math around Ireland, with associated e-learning products and a YouTube channel. He also has a YouTube channel on data science, which provides valuable engagement with people around the world who look at problems from a different perspective.
Pradeepta Mishra is a data scientist, predictive modeling expert, deep learning and machine learning practitioner, and an econometrician. He is currently leading the data science and machine learning practice for Ma Foi Analytics, Bangalore, India. He holds a patent for enhancing planogram design for the retail industry. Pradeepta has published and presented research papers at IIM Ahmedabad, India. He is a visiting faculty member at various leading B-schools and regularly gives talks on data science and machine learning. Pradeepta has spent more than 10 years in his domain and has worked on various projects relating to classification, regression, pattern recognition, time series forecasting, and unstructured data analysis using text mining procedures, spanning domains such as healthcare, insurance, retail and e-commerce, manufacturing, and so on.

1. The Course Overview
This video gives an overview of the entire course.

2. What Is R?
This video introduces the section and gives an overview of the R language.

3. Getting and Setting Up R/RStudio
We need the core programs before we can begin, and in this video we show where to get them.

4. Using RStudio
In this video, we look at where to begin in RStudio so that we can get started.

5. Packages
In this video, you will learn how packages save you from re-solving common problems and how we will work with them in RStudio.

6. A Lot Is the Same
In this video, you will learn how similar R is to other languages.

7. Familiar Programming Building Blocks
In this video, we see more familiar things in R.

8. Putting It All Together
In this video, we are now ready to write programs.

9. Core R Types
In this video, we will look at the R data types that may be new to you.

10. Some Useful Operations
In this video, we will introduce some key commands to study data.

11. More Useful Operations
In this video, we will introduce various commands that help us pick out the elements we are interested in.

12. Titanic
In this video, we will investigate the Titanic dataset to see what it says.

13. Tennis
In this video, we will add value by processing our data.

14. It's Mostly Cleaning Up
In this video, we will download football results from a web page.

15. The Most Widely Used Statistical Package
In this video, we will use R to do some statistics.

16. Distributions
In this video, we will work with distributions using R.

17. Time to Get Graphical
In this video, we will see some of R's graphical power.

18. Plotting to Another Dimension
In this video, we will use the plotting package, ggplot2.

19. Facets
In this video, we will see another plotting technique known as facets.

20. Test Your Knowledge

21. The Course Overview
This video provides an overview of the entire course.

22. What Is Data Mining?
Data mining is the process of deciphering meaningful insights from existing databases and analyzing the results for consumption by business users.

23. Introduction to the R Programming Language
We are going to start with basic programming using R for data management and data manipulation.

24. Data Type Conversion
If the data is not formatted properly, converting one data type to another is straightforward in R.

25. Sorting, Merging, Indexing, and Subsetting Dataframes
When working on a client dataset with a large number of observations, you often need to subset the data based on selection criteria, or to draw a sample with or without replacement.

26. Date and Time Formatting
R's date functions return an object of class Date, which is stored as the number of days since January 1, 1970.
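
As a small illustration of the day-count representation described above, the following base R sketch converts strings to dates and inspects the underlying numbers. The example dates are arbitrary and chosen only for illustration.

```r
# Convert a character string to a Date object
d <- as.Date("1970-01-10")
cls <- class(d)                 # "Date"

# The underlying value counts days since 1970-01-01
days <- as.numeric(d)           # 9

# Subtracting two dates therefore gives a difference in days
gap <- as.numeric(as.Date("2000-01-01") - as.Date("1999-12-25"))  # 7
```

Because dates are plain day counts underneath, comparisons and sorting work on them just as they do on numbers.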

27. Types of Functions
There are two different types of functions in R: user-defined functions and built-in functions.

28. Loop Concepts
Using a loop, a similar task can be performed many times.

29. Applying Concepts
The apply function takes an array, a matrix, or a dataframe as input and returns the result in an array format.
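
A minimal sketch of apply on a small matrix may make this concrete. The matrix values are made up for illustration; the second argument selects rows (1) or columns (2).

```r
# A 2 x 3 matrix filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# MARGIN = 1 applies the function to each row
row_totals <- apply(m, 1, sum)   # 9 12

# MARGIN = 2 applies the function to each column
col_means <- apply(m, 2, mean)   # 1.5 3.5 5.5
```

The same call works on a numeric dataframe, which apply converts to a matrix before looping over it.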

30. String Manipulation
In typical data management, it is important to standardize the text columns or variables in a dataset, because R is case sensitive and treats any discrepancy as a distinct value.
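
The effect of inconsistent text can be sketched with a few base R calls. The city names below are hypothetical, and trimming whitespace is an extra cleaning step assumed here rather than one named in the course description.

```r
# The same city typed three different ways, plus one other city
city <- c("Dublin", "dublin", " DUBLIN", "Cork")

# R sees four distinct values before cleaning
n_raw <- length(unique(city))            # 4

# Trim surrounding whitespace and convert to a single case
clean <- toupper(trimws(city))
n_clean <- length(unique(clean))         # 2
```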

31. NA and Missing Value Management and Imputation Techniques
In the R programming language, missing values are represented as NA. NA is neither a string nor a numeric value; it is an indicator of a missing value.
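
A short sketch of detecting and handling NA follows. The data is made up, and mean imputation is used only as one simple example of an imputation technique; the course may cover others.

```r
# A numeric vector with one missing value
x <- c(4, NA, 10, 6)

# Detect missing values
missing_flags <- is.na(x)            # FALSE TRUE FALSE FALSE

# Most summaries return NA unless missing values are removed
raw_mean <- mean(x)                  # NA
obs_mean <- mean(x, na.rm = TRUE)    # (4 + 10 + 6) / 3

# Simple mean imputation: replace each NA with the observed mean
x_imputed <- ifelse(is.na(x), obs_mean, x)
```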

32. Univariate Data Analysis
To generate univariate statistics about a dataset, we follow two approaches: one for continuous variables and the other for discrete or categorical variables.

33. Bivariate Analysis
Bivariate analysis is the study of the relationship or association between two variables. There are three possible ways of looking at the relationship.

34. Multivariate Analysis
Multivariate analysis is a statistical way of looking at multiple dependent and independent variables and their relationships.

35. Understanding Distributions and Transformation
Understanding probability distributions is important in order to have a clear idea about the assumptions of any statistical hypothesis test.

36. Interpreting Distributions and Variable Binning
Interpretation of the calculated distribution helps in forming a hypothesis.

37. Contingency Tables, Bivariate Statistics, and Checking for Data Normality
Contingency tables are frequency tables built from two or more categorical variables. A frequency table represents one categorical variable, whereas a contingency table cross-tabulates two (or more) categorical variables.
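
The difference between the two kinds of table can be sketched with base R's table function. The small survey dataframe is hypothetical and exists only for this example.

```r
# Hypothetical survey data with two categorical variables
survey <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  buys   = c("yes", "no", "yes", "yes", "no", "no")
)

# Frequency table: counts for one categorical variable
freq <- table(survey$buys)

# Contingency table: counts for each combination of two variables
ct <- table(gender = survey$gender, buys = survey$buys)
```

Printing ct shows one row per gender and one column per answer, with each cell holding the number of matching observations.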

38. Hypothesis Testing
The null hypothesis states that nothing has happened: for example, that the population means are unchanged. The alternative hypothesis states that something different has happened and the population means differ.
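
As one possible illustration, the sketch below compares the means of two small samples with a two-sample t-test. The measurements are invented, and the 0.05 threshold is a conventional choice assumed here, not something specified by the course description.

```r
# Hypothetical measurements from two groups
before <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3)
after  <- c(13.0, 12.9, 13.4, 13.1, 12.8, 13.3)

# Null hypothesis: the two population means are equal
# Alternative hypothesis: the two population means differ
res <- t.test(before, after)   # Welch two-sample t-test by default

# A small p-value is evidence against the null hypothesis
reject_null <- res$p.value < 0.05
```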

39. Non-Parametric Methods
When a training dataset does not conform to a specific probability distribution because it violates that distribution's assumptions, the only option left for analyzing the data is to use non-parametric methods.

40. Introduction to Data Visualization
This video will walk you through the basics of data visualization, along with how to create advanced data visualizations using existing libraries in the R programming language.

41. Visualizing Charts and Geo Mapping
This video will let you explore different kinds of charts and plots and their creation. You'll also be able to use geo mapping.

42. Visualizing Scatterplot, Word Cloud and More
By the end of this video, you will be able to use some data visualization techniques that are widely used for smart data representation.

43. Using plotly
This video will teach you how to take plotting to a new level. Here, you will learn to use the plotly library, which is designed as an interactive, browser-based charting library built on a JavaScript library.

44. Creating Geo Mapping
This video will let you explore geo mapping, a type of chart used by data mining experts when the dataset contains location information.

45. Introduction to Regression
How can you predict the future outcomes of a target variable? Regression is the answer to this. Let's have a brief introduction and understand regression.

46. Linear Regression
This video will let you explore the linear regression model, which can be used to explain the relationship between a single dependent variable and an independent variable.

47. Stepwise Regression Method for Variable Selection
This video will help you understand how to use the stepwise regression method to solve complex regression problems.

48. Logistic Regression
What can we do in scenarios where the variable of interest is categorical in nature, such as whether a customer buys a product, whether a credit card is approved, or whether a tumor is cancerous? Logistic regression is well suited to these cases.

49. Cubic Regression
Let's dive into another form of regression, where the linear regression model is extended with one or two higher-order polynomial terms (squared and cubed) of the predictor.
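
A minimal sketch of a cubic fit with lm is shown below. The data is generated from a known cubic with no noise, purely so the recovered coefficients can be checked; the coefficient values are arbitrary assumptions.

```r
# Synthetic data from an exact cubic: y = 2 + 3x - 0.5x^2 + 0.1x^3
x <- 1:10
y <- 2 + 3 * x - 0.5 * x^2 + 0.1 * x^3

# Linear model extended with squared and cubed terms of the predictor;
# I() makes ^ mean arithmetic power inside the formula
fit <- lm(y ~ x + I(x^2) + I(x^3))

# Estimated intercept and coefficients for x, x^2, and x^3
est <- unname(coef(fit))
```

With noiseless data the estimates match the generating coefficients up to numerical tolerance; real data would add residual error.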

50. Introduction to Market Basket Analysis
Market basket analysis is the study of relationships between products that are purchased together or across a series of transactions.
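
To give a feel for what such relationships look like numerically, the base R sketch below computes support, confidence, and lift for one rule (bread implies milk). These are standard market basket measures but are not named in the course description, and the baskets and the helper function has are hypothetical.

```r
# Five hypothetical transactions
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Hypothetical helper: TRUE for each basket that contains the item
has <- function(item) sapply(baskets, function(b) item %in% b)

# Support: share of baskets containing the item(s)
support_bread <- mean(has("bread"))                  # 4/5
support_milk  <- mean(has("milk"))                   # 4/5
support_both  <- mean(has("bread") & has("milk"))    # 3/5

# Confidence of bread -> milk: share of bread baskets that also have milk
confidence <- support_both / support_bread           # 0.75

# Lift: confidence relative to how common milk is overall
lift <- confidence / support_milk                    # 0.9375
```

A lift below 1, as here, suggests the two items appear together slightly less often than their individual popularity would predict.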

51. Practical Project
Implementing market basket analysis.

52. Test Your Knowledge
Social Network