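Content Overview
This book is a classic text on data analysis that focuses on the practical application of predictive modeling, including data pre-processing, model tuning, measuring predictor importance, and variable selection. Readers will learn many modeling methods and deepen their understanding of widely used, effective modern models, such as linear regression, nonlinear regression, and classification models, including tree-based methods and support vector machines. The book also covers the whole process from data pre-processing through modeling to model evaluation and selection, along with the underlying statistical ideas, across a range of regression and classification techniques.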
About the Authors
Table of Contents
1 Introduction
1.1 Prediction Versus Interpretation
1.2 Key Ingredients of Predictive Models
1.3 Terminology
1.4 Example Data Sets and Typical Data Scenarios
1.5 Overview
1.6 Notation
Part Ⅰ General Strategies
2 A Short Tour of the Predictive Modeling Process
2.1 Case Study: Predicting Fuel Economy
2.2 Themes
2.3 Summary
3 Data Pre-processing
3.1 Case Study: Cell Segmentation in High-Content Screening
3.2 Data Transformations for Individual Predictors
3.3 Data Transformations for Multiple Predictors
3.4 Dealing with Missing Values
3.5 Removing Predictors
3.6 Adding Predictors
3.7 Binning Predictors
3.8 Computing
Exercises
4 Over-Fitting and Model Tuning
4.1 The Problem of Over-Fitting
4.2 Model Tuning
4.3 Data Splitting
4.4 Resampling Techniques
4.5 Case Study: Credit Scoring
4.6 Choosing Final Tuning Parameters
4.7 Data Splitting Recommendations
4.8 Choosing Between Models
4.9 Computing
Exercises
Part Ⅱ Regression Models
5 Measuring Performance in Regression Models
5.1 Quantitative Measures of Performance
5.2 The Variance-Bias Trade-off
5.3 Computing
6 Linear Regression and Its Cousins
6.1 Case Study: Quantitative Structure-Activity Relationship Modeling
6.2 Linear Regression
6.3 Partial Least Squares
6.4 Penalized Models
6.5 Computing
Exercises
7 Nonlinear Regression Models
7.1 Neural Networks
7.2 Multivariate Adaptive Regression Splines
7.3 Support Vector Machines
7.4 K-Nearest Neighbors
7.5 Computing
Exercises
8 Regression Trees and Rule-Based Models
8.1 Basic Regression Trees
8.2 Regression Model Trees
8.3 Rule-Based Models
8.4 Bagged Trees
8.5 Random Forests
8.6 Boosting
8.7 Cubist
8.8 Computing
Exercises
9 A Summary of Solubility Models
10 Case Study: Compressive Strength of Concrete Mixtures
10.1 Model Building Strategy
10.2 Model Performance
10.3 Optimizing Compressive Strength
10.4 Computing
Part Ⅲ Classification Models
11 Measuring Performance in Classification Models
11.1 Class Predictions
11.2 Evaluating Predicted Classes
11.3 Evaluating Class Probabilities
11.4 Computing
12 Discriminant Analysis and Other Linear Classification Models
12.1 Case Study: Predicting Successful Grant Applications
12.2 Logistic Regression
12.3 Linear Discriminant Analysis
12.4 Partial Least Squares Discriminant Analysis
12.5 Penalized Models
12.6 Nearest Shrunken Centroids
12.7 Computing
Exercises
13 Nonlinear Classification Models
13.1 Nonlinear Discriminant Analysis
13.2 Neural Networks
13.3 Flexible Discriminant Analysis
13.4 Support Vector Machines
13.5 K-Nearest Neighbors
13.6 Naive Bayes
13.7 Computing
Exercises
14 Classification Trees and Rule-Based Models
14.1 Basic Classification Trees
14.2 Rule-Based Models
14.3 Bagged Trees
14.4 Random Forests
14.5 Boosting
14.6 C5.0
14.7 Comparing Two Encodings of Categorical Predictors
14.8 Computing
Exercises
15 A Summary of Grant Application Models
16 Remedies for Severe Class Imbalance
16.1 Case Study: Predicting Caravan Policy Ownership
16.2 The Effect of Class Imbalance
16.3 Model Tuning
16.4 Alternate Cutoffs
16.5 Adjusting Prior Probabilities
16.6 Unequal Case Weights
16.7 Sampling Methods
16.8 Cost-Sensitive Training
16.9 Computing
Exercises
17 Case Study: Job Scheduling
17.1 Data Splitting and Model Strategy
17.2 Results
17.3 Computing
Part Ⅳ Other Considerations
18 Measuring Predictor Importance
18.1 Numeric Outcomes
18.2 Categorical Outcomes
18.3 Other Approaches
18.4 Computing
Exercises
19 An Introduction to Feature Selection
19.1 Consequences of Using Non-informative Predictors
19.2 Approaches for Reducing the Number of Predictors
19.3 Wrapper Methods
19.4 Filter Methods
19.5 Selection Bias
19.6 Case Study: Predicting Cognitive Impairment
19.7 Computing
Exercises
20 Factors That Can Affect Model Performance
20.1 Type Ⅲ Errors
20.2 Measurement Error in the Outcome
20.3 Measurement Error in the Predictors
20.4 Discretizing Continuous Outcomes
20.5 When Should You Trust Your Model's Prediction?
20.6 The Impact of a Large Sample
20.7 Computing
Exercises
Appendix
A A Summary of Various Models
B An Introduction to R
B.1 Start-Up and Getting Help
B.2 Packages
B.3 Creating Objects
B.4 Data Types and Basic Structures
B.5 Working with Rectangular Data Sets
B.6 Objects and Classes
B.7 R Functions
B.8 The Three Faces of =
B.9 The AppliedPredictiveModeling Package
B.10 The caret Package
B.11 Software Used in this Text
C Interesting Web Sites
References
Indices
Computing
General