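- Designing Machine Learning Systems (Reprint Edition) (English Edition)
- - Author: Chip Huyen (Vietnam) | Editor: Zhang Ye
  - Publisher: Southeast University Press
  - ISBN: 9787576602241
  - Publication date: 2022/10/01
  - Pages: 367
- Price: 55.2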

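Synopsis
    Machine learning systems are both complex and unique. They are complex because they consist of many components and involve many different stakeholders; they are unique because they depend on data, and data varies widely from one use case to the next. In this book, you will learn a holistic approach to designing machine learning systems that are reliable, scalable, maintainable, and adaptable to changing environments and business requirements.
    Author Chip Huyen, co-founder of Claypot AI, considers each design decision in the context of how it helps the system as a whole achieve its objectives: how to process and create training data, which features to use, how often to retrain models, and what to monitor. The iterative framework in the book draws on real case studies and is backed by ample references.
    This book will help you tackle the following scenarios:
    Engineering data and choosing the right metrics to solve a business problem
    Automating the process of continually developing, evaluating, deploying, and updating models
    Developing a monitoring system to quickly detect and address issues your models might encounter in production
    Building a machine learning platform that serves across use cases
    Developing reliable machine learning systems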
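About the Author
Chip Huyen is a co-founder of Claypot AI, a real-time machine learning platform. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world's largest organizations develop and deploy machine learning systems.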
Table of Contents
Preface
1. Overview of Machine Learning Systems
  When to Use Machine Learning
    Machine Learning Use Cases
  Understanding Machine Learning Systems
    Machine Learning in Research Versus in Production
    Machine Learning Systems Versus Traditional Software
  Summary
2. Introduction to Machine Learning Systems Design
  Business and ML Objectives
  Requirements for ML Systems
    Reliability
    Scalability
    Maintainability
    Adaptability
  Iterative Process
  Framing ML Problems
    Types of ML Tasks
    Objective Functions
  Mind Versus Data
  Summary
3. Data Engineering Fundamentals
  Data Sources
  Data Formats
    JSON
    Row-Major Versus Column-Major Format
    Text Versus Binary Format
  Data Models
    Relational Model
    NoSQL
    Structured Versus Unstructured Data
  Data Storage Engines and Processing
    Transactional and Analytical Processing
    ETL: Extract, Transform, and Load
  Modes of Dataflow
    Data Passing Through Databases
    Data Passing Through Services
    Data Passing Through Real-Time Transport
  Batch Processing Versus Stream Processing
  Summary
4. Training Data
  Sampling
    Nonprobability Sampling
    Simple Random Sampling
    Stratified Sampling
    Weighted Sampling
    Reservoir Sampling
    Importance Sampling
  Labeling
    Hand Labels
    Natural Labels
    Handling the Lack of Labels
  Class Imbalance
    Challenges of Class Imbalance
    Handling Class Imbalance
  Data Augmentation
    Simple Label-Preserving Transformations
    Perturbation
    Data Synthesis
  Summary
5. Feature Engineering
  Learned Features Versus Engineered Features
  Common Feature Engineering Operations
    Handling Missing Values
    Scaling
    Discretization
    Encoding Categorical Features
    Feature Crossing
    Discrete and Continuous Positional Embeddings
  Data Leakage
    Common Causes for Data Leakage
    Detecting Data Leakage
  Engineering Good Features
    Feature Importance
    Feature Generalization
  Summary
6. Model Development and Offline Evaluation
  Model Development and Training
    Evaluating ML Models
    Ensembles
    Experiment Tracking and Versioning
    Distributed Training
    AutoML
  Model Offline Evaluation
    Baselines
    Evaluation Methods
  Summary
7. Model Deployment and Prediction Service
  Machine Learning Deployment Myths
    Myth 1: You Only Deploy One or Two ML Models at a Time
    Myth 2: If We Don't Do Anything, Model Performance Remains the Same
    Myth 3: You Won't Need to Update Your Models as Much
    Myth 4: Most ML Engineers Don't Need to Worry About Scale
  Batch Prediction Versus Online Prediction
    From Batch Prediction to Online Prediction
    Unifying Batch Pipeline and Streaming Pipeline
  Model Compression
    Low-Rank Factorization
    Knowledge Distillation
    Pruning
    Quantization
  ML on the Cloud and on the Edge
    Compiling and Optimizing Models for Edge Devices
    ML in Browsers
  Summary
8. Data Distribution Shifts and Monitoring
  Causes of ML System Failures
    Software System Failures
    ML-Specific Failures
  Data Distribution Shifts
    Types of Data Distribution Shifts
    General Data Distribution Shifts
    Detecting Data Distribution Shifts
    Addressing Data Distribution Shifts
  Monitoring and Observability
    ML-Specific Metrics
    Monitoring Toolbox
    Observability
  Summary
9. Continual Learning and Test in Production
  Continual Learning
    Stateless Retraining Versus Stateful Training
    Why Continual Learning?
    Continual Learning Challenges
    Four Stages of Continual Learning
    How Often to Update Your Models
  Test in Production
    Shadow Deployment
    A/B Testing
    Canary Release
    Interleaving Experiments
    Bandits
  Summary
10. Infrastructure and Tooling for MLOps
  Storage and Compute
    Public Cloud Versus Private Data Centers
  Development Environment
    Dev Environment Setup
    Standardizing Dev Environments
    From Dev to Prod: Containers
  Resource Management
    Cron, Schedulers, and Orchestrators
    Data Science Workflow Management
  ML Platform
    Model Deployment
    Model Store
    Feature Store
  Build Versus Buy
  Summary
11. The Human Side of Machine Learning
  User Experience
    Ensuring User Experience Consistency
    Combatting "Mostly Correct" Predictions
    Smooth Failing
  Team Structure
    Cross-functional Teams Collaboration
    End-to-End Data Scientists
  Responsible AI
    Irresponsible AI: Case Studies
    A Framework for Responsible AI
  Summary
Epilogue
Index
