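- Hands-On Generative AI with Transformers and Diffusion Models (reprint edition, in English)
  - Authors: (USA) Omar Sanseviero, Pedro Cuenca, Apolinario Passos, Jonathan Whitaker | Editor: Zhang Ye
  - Publisher: Southeast University Press
  - ISBN: 9787576620061
  - Publication date: 2025/04/01
  - Pages: 396
- Price: 73.6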

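Synopsis
    With this practical, hands-on guide, you will learn how to use generative AI techniques to create novel text, images, audio, and even music. You will learn how state-of-the-art generative models work, how to fine-tune and adapt them to your needs, and how to combine existing building blocks to create new models and creative applications in different domains.
    This introductory book starts with theoretical concepts and then guides you through practical applications, with numerous code samples and easy-to-understand illustrations. You will learn how to use open source libraries to explore transformers and diffusion models in code, and study several existing projects to help guide your own work.
    Build and customize models that can generate text and images.
    Explore the trade-offs between using a pretrained model and fine-tuning a custom model.
    Create and use models that can generate, edit, and modify images in any style.
    Customize transformers and diffusion models for multiple creative purposes.
    Train models that can reflect your own unique style.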
Table of Contents
Preface
Part I. Leveraging Open Models
  1. An Introduction to Generative Media
    Generating Images
    Generating Text
    Generating Sound Clips
    Ethical and Societal Implications
    Where We've Been and Where Things Stand
    How Are Generative AI Models Created?
    Summary
  2. Transformers
    A Language Model in Action
    Tokenizing Text
    Predicting Probabilities
    Generating Text
    Zero-Shot Generalization
    Few-Shot Generalization
    A Transformer Block
    Transformer Model Genealogy
    Sequence-to-Sequence Tasks
    Encoder-Only Models
    The Power of Pretraining
    Transformers Recap
    Limitations
    Beyond Text
    Project Time: Using LMs to Generate Text
    Summary
    Exercises
    Challenges
    References
  3. Compressing and Representing Information
    AutoEncoders
      Preparing the Data
      Modeling the Encoder
      Decoder
      Training
      Exploring the Latent Space
      Visualizing the Latent Space
      Variational AutoEncoders
      VAE Encoders and Decoders
      Sampling from the Encoder Distribution
      Training the VAE
      VAEs for Generative Modeling
    CLIP
      Contrastive Loss
      Using CLIP, Step-by-Step
      Zero-Shot Image Classification with CLIP
      Zero-Shot Image Classification Pipeline
      CLIP Use Cases
      Alternatives to CLIP
    Project Time: Semantic Image Search
    Summary
    Exercises
    Challenges
    References
  4. Diffusion Models
    The Key Insight: Iterative Refinement
    Training a Diffusion Model
      The Data
      Adding Noise
      The UNet
      Training
      Sampling
      Evaluation
    In Depth: Noise Schedules
      Why Add Noise?
      Starting Simple
      The Math
      Effect of Input Resolution and Scaling
    In Depth: UNets and Alternatives
      A Simple UNet
      Improving the UNet
      Alternative Architectures
    In Depth: Diffusion Objectives
    Project Time: Train Your Diffusion Model
    Summary
    Exercises
    Challenges
    References
  5. Stable Diffusion and Conditional Generation
    Adding Control: Conditional Diffusion Models
    Preparing the Data
    Creating a Class-Conditioned Model
    Training the Model
    Sampling
    Improving Efficiency: Latent Diffusion
    Stable Diffusion: Components in Depth
      The Text Encoder
      The Variational AutoEncoder
      The UNet
      Stable Diffusion XL
      FLUX, SD3, and Video
      Classifier-Free Guidance
    Putting It All Together: Annotated Sampling Loop
    Open Data, Open Models
    Challenges and the Sunset of LAION-5B
    Alternatives
    Fair and Commercial Use
    Project Time: Build an Interactive ML Demo with Gradio
    Summary
    Exercises
    Challenge
    References
Part II. Transfer Learning for Generative Models
  6. Fine-Tuning Language Models
    Classifying Text
    Identify a Dataset
    Define Which Model Type to Use
    Select a Good Base Model
    Preprocess the Dataset
    Define Evaluation Metrics
    Train the Model
    Still Relevant?
    Generating Text
    Picking the Right Generative Model
    Training a Generative Model
    Instructions
    A Quick Introduction to Adapters
    A Light Introduction to Quantization
    Putting It All Together
    A Deeper Dive into Evaluation
    Project Time: Retrieval-Augmented Generation
    Summary
    Exercises
    Challenge
    References
  7. Fine-Tuning Stable Diffusion
    Full Stable Diffusion Fine-Tuning
      Preparing the Dataset
      Fine-Tuning the Model
      Inference
    DreamBooth
      Preparing the Dataset
      Prior Preservation
      DreamBoothing the Model
      Inference
    Training LoRAs
    Giving Stable Diffusion New Capabilities
      Inpainting
      Additional Inputs for Special Conditionings
    Project Time: Train an SDXL DreamBooth LoRA by Yourself
    Summary
    Exercises
    Challenge
    References
Part III. Going Further
  8. Creative Applications of Text-to-Image Models
    Image-to-Image
    Inpainting
    Prompt Weighting and Image Editing
      Prompt Weighting and Merging
      Editing Diffusion Images with Semantic Guidance
    Real Image Editing via Inversion
      Editing with LEDITS++
      Real Image Editing via Instruction Fine-Tuning
    ControlNet
    Image Prompting and Image Variations
      Image Variations
      Image Prompting
    Project Time: Your Creative Canvas
    Summary
    Exercises
    References
  9. Generating Audio
    Audio Data
      Waveforms
      Spectrograms
    Speech-to-Text with Transformer-Based Architectures
      Encoder-Based Techniques
      Encoder-Decoder Techniques
      From Model to Pipeline
      Evaluation
    From Text-to-Speech to Generative Audio
    Generating Audio with Sequence-to-Sequence Models
    Going Beyond Speech with Bark
    AudioLM and MusicLM
    AudioGen and MusicGen
    Audio Diffusion and Riffusion
    More on Diffusion Models for Generative Audio
    Evaluating Audio Generation Systems
    What's Next?
    Project Time: End-to-End Conversational System
    Summary
    Exercises
    Challenges
    References
  10. Rapidly Advancing Areas in Generative AI
    Preference Optimization
    Long Contexts
    Mixture of Experts
    Optimizations and Quantizations
    Data
    One Model to Rule Them All
    Computer Vision
    3D Computer Vision
    Video Generation
    Multimodality
    Community
  A. Open Source Tools
  B. LLM Memory Requirements
  C. End-to-End Retrieval-Augmented Generation
  Index
