Welcome to the Australia Xinhua Bookstore website

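    • Hands-On Generative AI with Transformers and Diffusion Models (reprint edition, in English)
      • Authors (US): Omar Sanseviero, Pedro Cuenca, Apolinario Passos, Jonathan Whitaker | Editor in charge: Zhang Ye
      • Publisher: Southeast University Press
      • ISBN: 9787576620061
      • Publication date: 2025/04/01
      • Pages: 396
    • Price: 73.6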
  • Synopsis

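        With this practical, hands-on guide, you'll learn how to use generative AI techniques to create novel text, images, audio, and even music. You'll learn how state-of-the-art generative models work, how to fine-tune and adapt them to your needs, and how to combine existing building blocks to create new models and creative applications across different domains.
        This introductory book starts with theoretical concepts, then guides you through practical applications with plenty of code examples and easy-to-follow illustrations. You'll learn how to use open source libraries to work with transformer and diffusion models in code, and you'll study several existing projects that can help guide your own work.
        • Build and customize models that can generate text and images.
        • Explore the trade-offs between using a pretrained model and fine-tuning your own model.
        • Create and use models that can generate, edit, and modify images in any style.
        • Customize transformers and diffusion models for a variety of creative purposes.
        • Train models that reflect your own unique style.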
  • About the Authors

  • Table of Contents

    Preface
    Part I. Leveraging Open Models
      1. An Introduction to Generative Media
        Generating Images
        Generating Text
        Generating Sound Clips
        Ethical and Societal Implications
        Where We've Been and Where Things Stand
        How Are Generative AI Models Created?
        Summary
      2. Transformers
        A Language Model in Action
        Tokenizing Text
        Predicting Probabilities
        Generating Text
        Zero-Shot Generalization
        Few-Shot Generalization
        A Transformer Block
        Transformer Model Genealogy
        Sequence-to-Sequence Tasks
        Encoder-Only Models
        The Power of Pretraining
        Transformers Recap
        Limitations
        Beyond Text
        Project Time: Using LMs to Generate Text
        Summary
        Exercises
        Challenges
        References
      3. Compressing and Representing Information
        AutoEncoders
          Preparing the Data
          Modeling the Encoder
          Decoder
          Training
          Exploring the Latent Space
          Visualizing the Latent Space
        Variational AutoEncoders
          VAE Encoders and Decoders
          Sampling from the Encoder Distribution
          Training the VAE
          VAEs for Generative Modeling
        CLIP
          Contrastive Loss
          Using CLIP, Step by Step
          Zero-Shot Image Classification with CLIP
          Zero-Shot Image Classification Pipeline
          CLIP Use Cases
        Alternatives to CLIP
        Project Time: Semantic Image Search
        Summary
        Exercises
        Challenges
        References
      4. Diffusion Models
        The Key Insight: Iterative Refinement
        Training a Diffusion Model
          The Data
          Adding Noise
          The UNet
          Training
          Sampling
          Evaluation
        In Depth: Noise Schedules
          Why Add Noise?
          Starting Simple
          The Math
          Effect of Input Resolution and Scaling
        In Depth: UNets and Alternatives
          A Simple UNet
          Improving the UNet
          Alternative Architectures
        In Depth: Diffusion Objectives
        Project Time: Train Your Diffusion Model
        Summary
        Exercises
        Challenges
        References
      5. Stable Diffusion and Conditional Generation
        Adding Control: Conditional Diffusion Models
        Preparing the Data
        Creating a Class-Conditioned Model
        Training the Model
        Sampling
        Improving Efficiency: Latent Diffusion
        Stable Diffusion: Components in Depth
          The Text Encoder
          The Variational AutoEncoder
          The UNet
          Stable Diffusion XL
          FLUX, SD3, and Video
          Classifier-Free Guidance
        Putting It All Together: Annotated Sampling Loop
        Open Data, Open Models
        Challenges and the Sunset of LAION-5B
        Alternatives
        Fair and Commercial Use
        Project Time: Build an Interactive ML Demo with Gradio
        Summary
        Exercises
        Challenge
        References
    Part II. Transfer Learning for Generative Models
      6. Fine-Tuning Language Models
        Classifying Text
        Identify a Dataset
        Define Which Model Type to Use
        Select a Good Base Model
        Preprocess the Dataset
        Define Evaluation Metrics
        Train the Model
        Still Relevant?
        Generating Text
        Picking the Right Generative Model
        Training a Generative Model
        Instructions
        A Quick Introduction to Adapters
        A Light Introduction to Quantization
        Putting It All Together
        A Deeper Dive into Evaluation
        Project Time: Retrieval-Augmented Generation
        Summary
        Exercises
        Challenge
        References
      7. Fine-Tuning Stable Diffusion
        Full Stable Diffusion Fine-Tuning
          Preparing the Dataset
          Fine-Tuning the Model
          Inference
        DreamBooth
          Preparing the Dataset
          Prior Preservation
          DreamBoothing the Model
          Inference
        Training LoRAs
        Giving Stable Diffusion New Capabilities
          Inpainting
          Additional Inputs for Special Conditionings
        Project Time: Train an SDXL DreamBooth LoRA by Yourself
        Summary
        Exercises
        Challenge
        References
    Part III. Going Further
      8. Creative Applications of Text-to-Image Models
        Image-to-Image
        Inpainting
        Prompt Weighting and Image Editing
          Prompt Weighting and Merging
          Editing Diffusion Images with Semantic Guidance
        Real Image Editing via Inversion
          Editing with LEDITS++
          Real Image Editing via Instruction Fine-Tuning
        ControlNet
        Image Prompting and Image Variations
          Image Variations
          Image Prompting
        Project Time: Your Creative Canvas
        Summary
        Exercises
        References
      9. Generating Audio
        Audio Data
          Waveforms
          Spectrograms
        Speech-to-Text with Transformer-Based Architectures
          Encoder-Based Techniques
          Encoder-Decoder Techniques
          From Model to Pipeline
          Evaluation
        From Text to Speech to Generative Audio
        Generating Audio with Sequence-to-Sequence Models
        Going Beyond Speech with Bark
        AudioLM and MusicLM
        AudioGen and MusicGen
        Audio Diffusion and Riffusion
        More on Diffusion Models for Generative Audio
        Evaluating Audio Generation Systems
        What's Next?
        Project Time: End-to-End Conversational System
        Summary
        Exercises
        Challenges
        References
      10. Rapidly Advancing Areas in Generative AI
        Preference Optimization
        Long Contexts
        Mixture of Experts
        Optimizations and Quantizations
        Data
        One Model to Rule Them All
        Computer Vision
        3D Computer Vision
        Video Generation
        Multimodality
        Community
      A. Open Source Tools
      B. LLM Memory Requirements
      C. End-to-End Retrieval-Augmented Generation
      Index