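Overview
Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success, and it is essential for anyone working with production code. This practical book bridges the gap between data science and software engineering, clearly explaining how to apply software engineering best practices to data science.
The examples in this book are written in Python and draw on popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers important topics that are often missing from introductory data science or coding courses, including how to:
Understand data structures and object-oriented programming
Document your code clearly and skillfully
Package and share your code
Integrate data science code into a larger codebase
Learn to write APIs
Create secure code
Apply best practices to common tasks such as testing, error handling, and logging
Work more effectively with software engineers
Write more efficient, maintainable, and robust Python code
Put your data science projects into production
And more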
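About the Author
Catherine Nelson is a freelance data scientist and writer. Previously, she was a Principal Data Scientist at SAP Concur, where she developed production machine learning applications and created new business travel features. She is also a coauthor of the O’Reilly book Building Machine Learning Pipelines.
Table of Contents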
Preface
1. What Is Good Code?
Why Good Code Matters
Adapting to Changing Requirements
Simplicity
Don't Repeat Yourself (DRY)
Avoid Verbose Code
Modularity
Readability
Standards and Conventions
Names
Cleaning Up
Documentation
Performance
Robustness
Errors and Logging
Testing
Key Takeaways
2. Analyzing Code Performance
Methods to Improve Performance
Timing Your Code
Profiling Your Code
cProfile
line_profiler
Memory Profiling with Memray
Time Complexity
How to Estimate Time Complexity
Big O Notation
Key Takeaways
3. Using Data Structures Effectively
Native Python Data Structures
Lists
Tuples
Dictionaries
Sets
NumPy Arrays
NumPy Array Functionality
NumPy Array Performance Considerations
Array Operations Using Dask
Arrays in Machine Learning
pandas DataFrames
DataFrame Functionality
DataFrame Performance Considerations
Key Takeaways
4. Object-Oriented Programming and Functional Programming
Object-Oriented Programming
Classes, Methods, and Attributes
Defining Your Own Classes
OOP Principles
Functional Programming
Lambda Functions and map()
Applying Functions to DataFrames
Which Paradigm Should I Use?
Key Takeaways
5. Errors, Logging, and Debugging
Errors in Python
Reading Python Error Messages
Handling Errors
Raising Errors
Logging
What to Log
Logging Configuration
How to Log
Debugging
Strategies for Debugging
Tools for Debugging
Key Takeaways
6. Code Formatting, Linting, and Type Checking
Code Formatting and Style Guides
PEP8
Import Formatting
Automatic Code Formatting with Black
Linting
Linting Tools
Linting in Your IDE
Type Checking
Type Annotations
Type Checking with mypy
Key Takeaways
7. Testing Your Code
Why You Should Write Tests
When to Test
How to Write and Run Tests
A Basic Test
Testing Unexpected Inputs
Running Automated Tests with Pytest
Types of Tests
Unit Tests
Integration Tests
Data Validation
Data Validation Examples
Using Pandera for Data Validation
Data Validation with Pydantic
Testing for Machine Learning
Testing Model Training
Testing Model Inference
Key Takeaways
8. Design and Refactoring
Project Design and Structure
Project Design Considerations
An Example Machine Learning Project
Code Design
Modular Code
A Code Design Framework
Interfaces and Contracts
Coupling
From Notebooks to Scalable Scripts
Why Use Scripts Instead of Notebooks?
Creating Scripts from Notebooks
Refactoring
Strategies for Refactoring
An Example Refactoring Workflow
Key Takeaways
9. Documentation
Documentation Within the Codebase
Names
Comments
Docstrings
Readmes, Tutorials, and Other Longer Documents
Documentation in Jupyter Notebooks
Documenting Machine Learning Experiments
Key Takeaways
10. Sharing Your Code: Version Control, Dependencies, and Packaging
Version Control Using Git
How Does Git Work?
Tracking Changes and Committing
Remote and Local
Branches and Pull Requests
Dependencies and Virtual Environments
Virtual Environments
Managing Dependencies with pip
Managing Dependencies with Poetry
Python Packaging
Packaging Basics
pyproject.toml
Building and Uploading Packages
Key Takeaways
11. APIs
Calling an API
HTTP Methods and Status Codes
Getting Data from the SDG API
Creating Your Own API Using FastAPI
Setting Up the API
Adding Functionality to Your API
Making Requests to Your API
Key Takeaways
12. Automation and Deployment
Deploying Code
Automation Examples
Pre-Commit Hooks
GitHub Actions
Cloud Deployments
Containers and Docker
Building a Docker Container
Deploying an API on Google Cloud
Deploying an API on Other Cloud Providers
Key Takeaways
13. Security
What Is Security?
Security Risks
Credentials, Physical Security, and Social Engineering
Third-Party Packages
The Python Pickle Module
Version Control Risks
API Security Risks
Security Practices
Security Reviews and Policies
Secure Coding Tools
Simple Code Scanning
Security for Machine Learning
Attacks on ML Systems
Security Practices for ML Systems
Key Takeaways
14. Working in Software
Development Principles and Practices
The Software Development Lifecycle
Waterfall Software Development
Agile Software Development
Agile Data Science
Roles in the Software Industry
Software Engineer
QA or Test Engineer
Data Engineer
Data Analyst
Product Manager
UX Researcher
Designer
Community
Open Source
Speaking at Events
The Python Community
Key Takeaways
15. Next Steps
The Future of Code
Your Future in Code
Thank You
Index