Synopsis
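Machine learning systems are both complex and unique. They are complex because they consist of many components and involve many different stakeholders. They are unique because they depend on data, and data varies widely from one use case to the next. In this book, you will learn a holistic approach to designing machine learning systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.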
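Author Chip Huyen, co-founder of Claypot AI, considers each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it helps the system as a whole achieve its objectives. The iterative framework in this book uses real case studies backed by ample references.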
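This book will help you tackle scenarios such as:
Engineering data and choosing the right metrics to solve a business problem
Automating the process for continually developing, evaluating, deploying, and updating models
Developing a monitoring system to quickly detect and address issues your models might encounter in production
Architecting an ML platform that serves across use cases
Developing reliable ML systems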
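About the Author
Chip Huyen is co-founder of Claypot AI, a real-time machine learning platform. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world's largest organizations develop and deploy machine learning systems.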
Table of Contents
Preface
1. Overview of Machine Learning Systems
When to Use Machine Learning
Machine Learning Use Cases
Understanding Machine Learning Systems
Machine Learning in Research Versus in Production
Machine Learning Systems Versus Traditional Software
Summary
2. Introduction to Machine Learning Systems Design
Business and ML Objectives
Requirements for ML Systems
Reliability
Scalability
Maintainability
Adaptability
Iterative Process
Framing ML Problems
Types of ML Tasks
Objective Functions
Mind Versus Data
Summary
3. Data Engineering Fundamentals
Data Sources
Data Formats
JSON
Row-Major Versus Column-Major Format
Text Versus Binary Format
Data Models
Relational Model
NoSQL
Structured Versus Unstructured Data
Data Storage Engines and Processing
Transactional and Analytical Processing
ETL: Extract, Transform, and Load
Modes of Dataflow
Data Passing Through Databases
Data Passing Through Services
Data Passing Through Real-Time Transport
Batch Processing Versus Stream Processing
Summary
4. Training Data
Sampling
Nonprobability Sampling
Simple Random Sampling
Stratified Sampling
Weighted Sampling
Reservoir Sampling
Importance Sampling
Labeling
Hand Labels
Natural Labels
Handling the Lack of Labels
Class Imbalance
Challenges of Class Imbalance
Handling Class Imbalance
Data Augmentation
Simple Label-Preserving Transformations
Perturbation
Data Synthesis
Summary
5. Feature Engineering
Learned Features Versus Engineered Features
Common Feature Engineering Operations
Handling Missing Values
Scaling
Discretization
Encoding Categorical Features
Feature Crossing
Discrete and Continuous Positional Embeddings
Data Leakage
Common Causes for Data Leakage
Detecting Data Leakage
Engineering Good Features
Feature Importance
Feature Generalization
Summary
6. Model Development and Offline Evaluation
Model Development and Training
Evaluating ML Models
Ensembles
Experiment Tracking and Versioning
Distributed Training
AutoML
Model Offline Evaluation
Baselines
Evaluation Methods
Summary
7. Model Deployment and Prediction Service
Machine Learning Deployment Myths
Myth 1: You Only Deploy One or Two ML Models at a Time
Myth 2: If We Don't Do Anything, Model Performance Remains the Same
Myth 3: You Won't Need to Update Your Models as Much
Myth 4: Most ML Engineers Don't Need to Worry About Scale
Batch Prediction Versus Online Prediction
From Batch Prediction to Online Prediction
Unifying Batch Pipeline and Streaming Pipeline
Model Compression
Low-Rank Factorization
Knowledge Distillation
Pruning
Quantization
ML on the Cloud and on the Edge
Compiling and Optimizing Models for Edge Devices
ML in Browsers
Summary
8. Data Distribution Shifts and Monitoring
Causes of ML System Failures
Software System Failures
ML-Specific Failures
Data Distribution Shifts
Types of Data Distribution Shifts
General Data Distribution Shifts
Detecting Data Distribution Shifts
Addressing Data Distribution Shifts
Monitoring and Observability
ML-Specific Metrics
Monitoring Toolbox
Observability
Summary
9. Continual Learning and Test in Production
Continual Learning
Stateless Retraining Versus Stateful Training
Why Continual Learning?
Continual Learning Challenges
Four Stages of Continual Learning
How Often to Update Your Models
Test in Production
Shadow Deployment
A/B Testing
Canary Release
Interleaving Experiments
Bandits
Summary
10. Infrastructure and Tooling for MLOps
Storage and Compute
Public Cloud Versus Private Data Centers
Development Environment
Dev Environment Setup
Standardizing Dev Environments
From Dev to Prod: Containers
Resource Management
Cron, Schedulers, and Orchestrators
Data Science Workflow Management
ML Platform
Model Deployment
Model Store
Feature Store
Build Versus Buy
Summary
11. The Human Side of Machine Learning
User Experience
Ensuring User Experience Consistency
Combatting "Mostly Correct" Predictions
Smooth Failing
Team Structure
Cross-functional Teams Collaboration
End-to-End Data Scientists
Responsible AI
Irresponsible AI: Case Studies
A Framework for Responsible AI
Summary
Epilogue
Index