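    • Designing Machine Learning Systems (Reprint Edition) (English Edition)
      • Author: Chip Huyen (Vietnam) | Editor: Zhang Ye
      • Publisher: Southeast University
      • ISBN: 9787576602241
      • Publication date: 2022/10/01
      • Pages: 367
    • Price: 55.2
  • Synopsis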

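        Machine learning systems are both complex and unique. They are complex because they consist of many components and involve many different stakeholders; they are unique because they depend on data, and data varies widely from one use case to the next. In this book, you will learn a holistic approach to designing machine learning systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
        Author Chip Huyen, co-founder of Claypot AI, considers each design decision in the context of how it helps the system as a whole accomplish its objectives: how to process and create training data, which features to use, how often to retrain models, and what to monitor. The iterative framework in the book uses real case studies and is backed by ample references.
        This book will help you tackle scenarios such as:
        Engineering data and choosing the right metrics to solve a business problem
        Automating the process for continually developing, evaluating, deploying, and updating models
        Developing a monitoring system to quickly detect and address issues your models might encounter in production
        Architecting a machine learning platform that serves across use cases
        Developing reliable machine learning systems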
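  • About the author

        Chip Huyen is a co-founder of Claypot AI, a real-time machine learning platform. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world's largest organizations develop and deploy machine learning systems.
  • Table of contents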

    Preface
    1. Overview of Machine Learning Systems
      When to Use Machine Learning
        Machine Learning Use Cases
      Understanding Machine Learning Systems
        Machine Learning in Research Versus in Production
        Machine Learning Systems Versus Traditional Software
      Summary
    2. Introduction to Machine Learning Systems Design
      Business and ML Objectives
      Requirements for ML Systems
        Reliability
        Scalability
        Maintainability
        Adaptability
      Iterative Process
      Framing ML Problems
        Types of ML Tasks
        Objective Functions
      Mind Versus Data
      Summary
    3. Data Engineering Fundamentals
      Data Sources
      Data Formats
        JSON
        Row-Major Versus Column-Major Format
        Text Versus Binary Format
      Data Models
        Relational Model
        NoSQL
        Structured Versus Unstructured Data
      Data Storage Engines and Processing
        Transactional and Analytical Processing
        ETL: Extract, Transform, and Load
      Modes of Dataflow
        Data Passing Through Databases
        Data Passing Through Services
        Data Passing Through Real-Time Transport
      Batch Processing Versus Stream Processing
      Summary
    4. Training Data
      Sampling
        Nonprobability Sampling
        Simple Random Sampling
        Stratified Sampling
        Weighted Sampling
        Reservoir Sampling
        Importance Sampling
      Labeling
        Hand Labels
        Natural Labels
        Handling the Lack of Labels
      Class Imbalance
        Challenges of Class Imbalance
        Handling Class Imbalance
      Data Augmentation
        Simple Label-Preserving Transformations
        Perturbation
        Data Synthesis
      Summary
    5. Feature Engineering
      Learned Features Versus Engineered Features
      Common Feature Engineering Operations
        Handling Missing Values
        Scaling
        Discretization
        Encoding Categorical Features
        Feature Crossing
        Discrete and Continuous Positional Embeddings
      Data Leakage
        Common Causes for Data Leakage
        Detecting Data Leakage
      Engineering Good Features
        Feature Importance
        Feature Generalization
      Summary
    6. Model Development and Offline Evaluation
      Model Development and Training
        Evaluating ML Models
        Ensembles
        Experiment Tracking and Versioning
        Distributed Training
        AutoML
      Model Offline Evaluation
        Baselines
        Evaluation Methods
      Summary
    7. Model Deployment and Prediction Service
      Machine Learning Deployment Myths
        Myth 1: You Only Deploy One or Two ML Models at a Time
        Myth 2: If We Don't Do Anything, Model Performance Remains the Same
        Myth 3: You Won't Need to Update Your Models as Much
        Myth 4: Most ML Engineers Don't Need to Worry About Scale
      Batch Prediction Versus Online Prediction
        From Batch Prediction to Online Prediction
        Unifying Batch Pipeline and Streaming Pipeline
      Model Compression
        Low-Rank Factorization
        Knowledge Distillation
        Pruning
        Quantization
      ML on the Cloud and on the Edge
        Compiling and Optimizing Models for Edge Devices
        ML in Browsers
      Summary
    8. Data Distribution Shifts and Monitoring
      Causes of ML System Failures
        Software System Failures
        ML-Specific Failures
      Data Distribution Shifts
        Types of Data Distribution Shifts
        General Data Distribution Shifts
        Detecting Data Distribution Shifts
        Addressing Data Distribution Shifts
      Monitoring and Observability
        ML-Specific Metrics
        Monitoring Toolbox
        Observability
      Summary
    9. Continual Learning and Test in Production
      Continual Learning
        Stateless Retraining Versus Stateful Training
        Why Continual Learning?
        Continual Learning Challenges
        Four Stages of Continual Learning
        How Often to Update Your Models
      Test in Production
        Shadow Deployment
        A/B Testing
        Canary Release
        Interleaving Experiments
        Bandits
      Summary
    10. Infrastructure and Tooling for MLOps
      Storage and Compute
        Public Cloud Versus Private Data Centers
      Development Environment
        Dev Environment Setup
        Standardizing Dev Environments
        From Dev to Prod: Containers
      Resource Management
        Cron, Schedulers, and Orchestrators
        Data Science Workflow Management
      ML Platform
        Model Deployment
        Model Store
        Feature Store
      Build Versus Buy
      Summary
    11. The Human Side of Machine Learning
      User Experience
        Ensuring User Experience Consistency
        Combatting "Mostly Correct" Predictions
        Smooth Failing
      Team Structure
        Cross-functional Teams Collaboration
        End-to-End Data Scientists
      Responsible AI
        Irresponsible AI: Case Studies
        A Framework for Responsible AI
      Summary
    Epilogue
    Index