Data Mining and Predictive Analysis

Data mining is the process of discovering useful patterns and trends in large data sets. Predictive analytics is the process of extracting information from large data sets to make predictions and estimates about future outcomes. Data mining is becoming more widespread every day because it lets companies uncover profitable patterns and trends in their existing databases. uCertify’s Data Mining and Predictive Analysis course gives you hands-on experience in data mining. You will learn which types of analysis uncover the most profitable nuggets of knowledge from the data while avoiding pitfalls that may cost your company millions of dollars.


Test Prep

Features

63+ LiveLab | 63+ Video tutorials | 02:02+ Hours

Why choose TOPTALENT?

Get assistance every step of the way from our Texas-based team, ensuring your training experience is hassle-free and aligned with your goals.
Access an expansive range of over 3,000 training courses with a strong focus on Information Technology, Business Applications, and Leadership Development.
Have confidence in an exceptional 95% approval rating from our students, reflecting outstanding satisfaction with our course content, program support, and overall customer service.
Benefit from being taught by Professionally Certified Instructors with expertise in their fields and a strong commitment to making sure you learn and succeed.

Outline

Lessons 1:
Preface

What is Data Mining? What is Predictive Analytics?
Why is this Course Needed?
Who Will Benefit from this Course?
Danger! Data Mining is Easy to do Badly
“White-Box” Approach
Algorithm Walk-Throughs
Exciting New Topics
The R Zone
Appendix: Data Summarization and Visualization
The Case Study: Bringing it all Together
How the Course is Structured

Lessons 2:
An Introduction to Data Mining and Predictive Analytics

What is Data Mining? What Is Predictive Analytics?
Wanted: Data Miners
The Need For Human Direction of Data Mining
The Cross-Industry Standard Process for Data Mining: CRISP-DM
Fallacies of Data Mining
What Tasks can Data Mining Accomplish?
The R Zone
R References
Exercises

Lessons 3:
Data Preprocessing

Why do We Need to Preprocess the Data?
Data Cleaning
Handling Missing Data
Identifying Misclassifications
Graphical Methods for Identifying Outliers
Measures of Center and Spread
Data Transformation
Min–Max Normalization
Z-Score Standardization
Decimal Scaling
Transformations to Achieve Normality
Numerical Methods for Identifying Outliers
Flag Variables
Transforming Categorical Variables into Numerical Variables
Binning Numerical Variables
Reclassifying Categorical Variables
Adding an Index Field
Removing Variables that are not Useful
Variables that Should Probably not be Removed
Removal of Duplicate Records
A Word About ID Fields
The R Zone
R Reference
Exercises

Lessons 4:
Exploratory Data Analysis

Hypothesis Testing Versus Exploratory Data Analysis
Getting to Know The Data Set
Exploring Categorical Variables
Exploring Numeric Variables
Exploring Multivariate Relationships
Selecting Interesting Subsets of the Data for Further Investigation
Using EDA to Uncover Anomalous Fields
Binning Based on Predictive Value
Deriving New Variables: Flag Variables
Deriving New Variables: Numerical Variables
Using EDA to Investigate Correlated Predictor Variables
Summary of Our EDA
The R Zone
R References
Exercises

Lessons 5:
Dimension-Reduction Methods

Need for Dimension-Reduction in Data Mining
Principal Components Analysis
Applying PCA to the Houses Data Set
How Many Components Should We Extract?
Profiling the Principal Components
Communalities
Validation of the Principal Components
Factor Analysis
Applying Factor Analysis to the Adult Data Set
Factor Rotation
User-Defined Composites
An Example of a User-Defined Composite
The R Zone
R References
Exercises

Lessons 6:
Univariate Statistical Analysis

Data Mining Tasks in Discovering Knowledge in Data
Statistical Approaches to Estimation and Prediction
Statistical Inference
How Confident are We in Our Estimates?
Confidence Interval Estimation of the Mean
How to Reduce the Margin of Error
Confidence Interval Estimation of the Proportion
Hypothesis Testing for the Mean
Assessing The Strength of Evidence Against The Null Hypothesis
Using Confidence Intervals to Perform Hypothesis Tests
Hypothesis Testing for The Proportion
Reference
The R Zone
R Reference
Exercises

Lessons 7:
Multivariate Statistics

Two-Sample t-Test for Difference in Means
Two-Sample Z-Test for Difference in Proportions
Test for the Homogeneity of Proportions
Chi-Square Test for Goodness of Fit of Multinomial Data
Analysis of Variance
Reference
The R Zone
R Reference
Exercises

Lessons 8:
Preparing to Model the Data

Supervised Versus Unsupervised Methods
Statistical Methodology and Data Mining Methodology
Cross-Validation
Overfitting
Bias–Variance Trade-Off
Balancing The Training Data Set
Establishing Baseline Performance
The R Zone
R Reference
Exercises

Lessons 9:
Simple Linear Regression

An Example of Simple Linear Regression
Dangers of Extrapolation
How Useful is the Regression? The Coefficient of Determination, r²
Standard Error of the Estimate, s
Correlation Coefficient r
ANOVA Table for Simple Linear Regression
Outliers, High Leverage Points, and Influential Observations
Population Regression Equation
Verifying The Regression Assumptions
Inference in Regression
t-Test for the Relationship Between x and y
Confidence Interval for the Slope of the Regression Line
Confidence Interval for the Correlation Coefficient ρ
Confidence Interval for the Mean Value of y Given x
Prediction Interval for a Randomly Chosen Value of y Given x
Transformations to Achieve Linearity
Box–Cox Transformations
The R Zone
R References
Exercises

Lessons 10:
Multiple Regression and Model Building

An Example of Multiple Regression
The Population Multiple Regression Equation
Inference in Multiple Regression
Regression With Categorical Predictors, Using Indicator Variables
Adjusting R²: Penalizing Models For Including Predictors That Are Not Useful
Sequential Sums of Squares
Multicollinearity
Variable Selection Methods
Gas Mileage Data Set
An Application of Variable Selection Methods
Using the Principal Components as Predictors in Multiple Regression
The R Zone
R References
Exercises

Lessons 11:
k-Nearest Neighbor Algorithm

Classification Task
k-Nearest Neighbor Algorithm
Distance Function
Combination Function
Quantifying Attribute Relevance: Stretching the Axes
Database Considerations
k-Nearest Neighbor Algorithm for Estimation and Prediction
Choosing k
Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler
The R Zone
R References
Exercises

Lessons 12:
Decision Trees

What is a Decision Tree?
Requirements for Using Decision Trees
Classification and Regression Trees
C4.5 Algorithm
Decision Rules
Comparison of the C5.0 and CART Algorithms Applied to Real Data
The R Zone
R References
Exercises

Lessons 13:
Neural Networks

Input and Output Encoding
Neural Networks for Estimation and Prediction
Simple Example of a Neural Network
Sigmoid Activation Function
Back-Propagation
Gradient-Descent Method
Back-Propagation Rules
Example of Back-Propagation
Termination Criteria
Learning Rate
Momentum Term
Sensitivity Analysis
Application of Neural Network Modeling
The R Zone
R References
Exercises

Lessons 14:
Logistic Regression

Simple Example of Logistic Regression
Maximum Likelihood Estimation
Interpreting Logistic Regression Output
Inference: Are the Predictors Significant?
Odds Ratio and Relative Risk
Interpreting Logistic Regression for a Dichotomous Predictor
Interpreting Logistic Regression for a Polychotomous Predictor
Interpreting Logistic Regression for a Continuous Predictor
Assumption of Linearity
Zero-Cell Problem
Multiple Logistic Regression
Introducing Higher Order Terms to Handle Nonlinearity
Validating the Logistic Regression Model
WEKA: Hands-On Analysis Using Logistic Regression
The R Zone
R References
Exercises

Lessons 15:
Naïve Bayes and Bayesian Networks

Bayesian Approach
Maximum A Posteriori (MAP) Classification
Posterior Odds Ratio
Balancing The Data
Naïve Bayes Classification
Interpreting The Log Posterior Odds Ratio
Zero-Cell Problem
Numeric Predictors for Naïve Bayes Classification
WEKA: Hands-on Analysis Using Naïve Bayes
Bayesian Belief Networks
Clothing Purchase Example
Using The Bayesian Network to Find Probabilities
The R Zone
R References
Exercises

Lessons 16:
Model Evaluation Techniques

Model Evaluation Techniques for the Description Task
Model Evaluation Techniques for the Estimation and Prediction Tasks
Model Evaluation Measures for the Classification Task
Accuracy and Overall Error Rate
Sensitivity and Specificity
False-Positive Rate and False-Negative Rate
Proportions of True Positives, True Negatives, False Positives, and False Negatives
Misclassification Cost Adjustment to Reflect Real-World Concerns
Decision Cost/Benefit Analysis
Lift Charts and Gains Charts
Interweaving Model Evaluation with Model Building
Confluence of Results: Applying a Suite of Models
