Welcome to the Australian Xinhua Bookstore website

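    • Software Engineering for Data Scientists (Reprint Edition) (English Edition)
      • Author: Catherine Nelson (UK) | Editor: Zhang Ye
      • Publisher: Southeast University Press
      • ISBN: 9787576617672
      • Publication date: 2025/02/01
      • Pages: 238
    • Price: 39.6
  • Synopsis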

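        Data science depends on code. The ability to write reproducible, robust, and scalable code is key to the success of a data science project, and it is essential for anyone who works with production code. This practical book bridges the gap between data science and software engineering and clearly explains how to apply software engineering best practices to data science.
        The book's examples are written in Python and draw on popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers important topics that are often missing from introductory data science and coding courses, including how to:
        Understand data structures and object-oriented programming
        Document your code clearly and proficiently
        Package and share your code
        Integrate data science code into a larger codebase
        Write APIs
        Create secure code
        Apply best practices to common tasks such as testing, error handling, and logging
        Work more effectively with software engineers
        Write more efficient, maintainable, and robust Python code
        Put your data science project into production
        And more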
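  • About the Author

        Catherine Nelson is a freelance data scientist and writer. She was previously a principal data scientist at SAP Concur, where she developed production machine learning applications and built new business travel features. She is also a coauthor of Building Machine Learning Pipelines, published by O’Reilly.
  • Table of Contents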

    Preface
    1. What Is Good Code?
      Why Good Code Matters
      Adapting to Changing Requirements
      Simplicity
        Don't Repeat Yourself (DRY)
        Avoid Verbose Code
      Modularity
      Readability
        Standards and Conventions
        Names
        Cleaning up
        Documentation
      Performance
      Robustness
        Errors and Logging
        Testing
      Key Takeaways
    2. Analyzing Code Performance
      Methods to Improve Performance
      Timing Your Code
      Profiling Your Code
        cProfile
        line_profiler
        Memory Profiling with Memray
      Time Complexity
        How to Estimate Time Complexity
        Big O Notation
      Key Takeaways
    3. Using Data Structures Effectively
      Native Python Data Structures
        Lists
        Tuples
        Dictionaries
        Sets
      NumPy Arrays
        NumPy Array Functionality
        NumPy Array Performance Considerations
        Array Operations Using Dask
        Arrays in Machine Learning
      pandas DataFrames
        DataFrame Functionality
        DataFrame Performance Considerations
      Key Takeaways
    4. Object-Oriented Programming and Functional Programming
      Object-Oriented Programming
        Classes, Methods, and Attributes
        Defining Your Own Classes
        OOP Principles
      Functional Programming
        Lambda Functions and map()
        Applying Functions to DataFrames
      Which Paradigm Should I Use?
      Key Takeaways
    5. Errors, Logging, and Debugging
      Errors in Python
        Reading Python Error Messages
        Handling Errors
        Raising Errors
      Logging
        What to Log
        Logging Configuration
        How to Log
      Debugging
        Strategies for Debugging
        Tools for Debugging
      Key Takeaways
    6. Code Formatting, Linting, and Type Checking
      Code Formatting and Style Guides
        PEP8
        Import Formatting
        Automatic Code Formatting with Black
      Linting
        Linting Tools
        Linting in Your IDE
      Type Checking
        Type Annotations
        Type Checking with mypy
      Key Takeaways
    7. Testing Your Code
      Why You Should Write Tests
      When to Test
      How to Write and Run Tests
        A Basic Test
        Testing Unexpected Inputs
        Running Automated Tests with Pytest
      Types of Tests
        Unit Tests
        Integration Tests
      Data Validation
        Data Validation Examples
        Using Pandera for Data Validation
        Data Validation with Pydantic
      Testing for Machine Learning
        Testing Model Training
        Testing Model Inference
      Key Takeaways
    8. Design and Refactoring
      Project Design and Structure
        Project Design Considerations
        An Example Machine Learning Project
      Code Design
        Modular Code
        A Code Design Framework
        Interfaces and Contracts
        Coupling
      From Notebooks to Scalable Scripts
        Why Use Scripts Instead of Notebooks?
        Creating Scripts from Notebooks
      Refactoring
        Strategies for Refactoring
        An Example Refactoring Workflow
      Key Takeaways
    9. Documentation
      Documentation Within the Codebase
        Names
        Comments
        Docstrings
        Readmes, Tutorials, and Other Longer Documents
      Documentation in Jupyter Notebooks
      Documenting Machine Learning Experiments
      Key Takeaways
    10. Sharing Your Code: Version Control, Dependencies, and Packaging
      Version Control Using Git
        How Does Git Work?
        Tracking Changes and Committing
        Remote and Local
        Branches and Pull Requests
      Dependencies and Virtual Environments
        Virtual Environments
        Managing Dependencies with pip
        Managing Dependencies with Poetry
      Python Packaging
        Packaging Basics
        pyproject.toml
        Building and Uploading Packages
      Key Takeaways
    11. APIs
      Calling an API
        HTTP Methods and Status Codes
        Getting Data from the SDG API
      Creating Your Own API Using FastAPI
        Setting Up the API
        Adding Functionality to Your API
        Making Requests to Your API
      Key Takeaways
    12. Automation and Deployment
      Deploying Code
      Automation Examples
        Pre-Commit Hooks
        GitHub Actions
      Cloud Deployments
        Containers and Docker
        Building a Docker Container
        Deploying an API on Google Cloud
        Deploying an API on Other Cloud Providers
      Key Takeaways
    13. Security
      What Is Security?
      Security Risks
        Credentials, Physical Security, and Social Engineering
        Third-Party Packages
        The Python Pickle Module
        Version Control Risks
        API Security Risks
      Security Practices
        Security Reviews and Policies
        Secure Coding Tools
        Simple Code Scanning
      Security for Machine Learning
        Attacks on ML Systems
        Security Practices for ML Systems
      Key Takeaways
    14. Working in Software
      Development Principles and Practices
        The Software Development Lifecycle
        Waterfall Software Development
        Agile Software Development
        Agile Data Science
      Roles in the Software Industry
        Software Engineer
        QA or Test Engineer
        Data Engineer
        Data Analyst
        Product Manager
        UX Researcher
        Designer
      Community
        Open Source
        Speaking at Events
        The Python Community
      Key Takeaways
    15. Next Steps
      The Future of Code
      Your Future in Code
      Thank You
    Index