Table of Contents

1 Introduction  17
1.1 Overview of Deep Learning  18
1.1.1 The Success of Deep Learning  18
1.1.2 The Two Pillars Supporting These Breakthroughs  21
1.1.3 Why Now?  22
1.2 Why Keras  23
1.3 The Structure of This Book  25
1.3.1 Text Boxes  27
1.4 How to Use This Book  28
1.4.1 Why Keras Installation Comes Later  29
1.4.2 Suggested Reading Strategy  30
2 Introduction to the Core of Machine Learning  33
2.1 What Is Machine Learning?  35
2.1.1 Helping the Machine Recognize Digits  35
2.1.2 The Outdated Method: Rule-Based Learning  36
2.1.3 Case Study in Machine Learning Code  38
2.1.4 Structuring the Learning Process  41
2.1.5 Analyzing the Machine’s Learning Process  46
2.1.6 Lessons Learned from the Code Case Study  48
2.2 Types of Machine Learning  49
2.2.1 Supervised Learning  50
2.2.2 Unsupervised Learning  59
2.3 The Magic Sauce: Reinforcement Learning  65
2.3.1 The Building Blocks of Reinforcement Learning  65
2.3.2 Key Applications for Reinforcement Learning  67
2.3.3 Challenges in Implementing Reinforcement Learning  68
2.4 Basics of Neural Networks  69
2.4.1 Core Components of Neural Networks  70
2.4.2 The Unintuitive Process of Learning  72
2.4.3 Clearing Up Some Misconceptions  73
2.5 Setting Up Your Environment  73
2.6 Summary  78
3 Fundamentals of Gradient Descent  79
3.1 Understanding Gradient Descent  80
3.1.1 The Basic Setup  80
3.1.2 Formalizing the Process: Training Phase  82
3.1.3 From Training to Deployment: Making It Work in the Real World  84
3.1.4 The Process of Learning  85
3.1.5 Finding the Best Parameters: Minimizing the Loss  89
3.1.6 Lifting the Assumption of a Single Input Feature  94
3.1.7 Higher Dimensions in Gradient Descent  99
3.2 Types of Gradient Descent: Batch, Stochastic, Mini-Batch  101
3.2.1 Contour Plots for Visualizing Gradient Descent  102
3.2.2 Improving the Efficiency of Batch Gradient Descent  103
3.2.3 A Paradox in the Use of Stochastic Gradient Descent  105
3.3 Learning Rate and Optimization  107
3.4 Implementing Gradient Descent in Code  110
3.4.1 Gradient Descent from Scratch  110
3.4.2 Gradient Descent Using Keras  113
3.5 Summary  116
4 Classification Through Gradient Descent  117
4.1 Classification Basics  118
4.1.1 Classification Problem Setup  119
4.1.2 First Attempt Using Gradient Descent  121
4.1.3 Second Attempt: Fixing the Issues in the First Attempt  122
4.1.4 Third Attempt: Fixing the Loss Function  128
4.1.5 Squishing Functions and Decision Boundaries  131
4.1.6 Learning Process Summary  134
4.2 Nonlinear Relationships and Neural Networks  136
4.2.1 Feature Transformations  137
4.2.2 The Kernel Trick, Logistic Regression, and All of Machine Learning  140
4.3 Binary vs. Multi-Class Classification  147
4.3.1 The One-vs-All Approach  147
4.3.2 The Softmax Classifier  152
4.4 Loss Functions: Cross-Entropy  155
4.4.1 Categorical Cross-Entropy  156
4.4.2 Pitfall to Avoid When Using Cross-Entropies  157
4.4.3 Sparse Categorical Cross-Entropy  158
4.5 Building a Classifier with Gradient Descent  161
4.5.1 Layers in Code  161
4.5.2 Choice of Loss Function  163
4.5.3 Parameter Counts  164
4.5.4 The Density of Keras Code  165
4.6 Summary  166
5 Deep Dive into Keras  167
5.1 Introduction to the Keras Framework  168
5.1.1 The Philosophy Behind Keras: Making AI Human-Friendly  169
5.1.2 Evolution Through Adaptability  169
5.1.3 Keras 3.0: The Multi-Engine Framework  171
5.1.4 Key Strengths of Keras  172
5.1.5 Keras in the Real World  173
5.2 Setting Up Keras  174
5.2.1 Setting Up Python  175
5.2.2 TensorFlow Installation and Points to Keep in Mind  177
5.2.3 Setting Up CUDA for GPU Acceleration  180
5.2.4 Installing Keras  185
5.2.5 Using a GPU in Google Colaboratory  186
5.3 Building Your First Model  188
5.3.1 Why NumPy Matters for Machine Learning  188
5.3.2 Symbolic Computation: The Magic Behind Neural Networks  200
5.4 Implementing Core Concepts in Keras: Gradient Descent and Classification  205
5.4.1 The Building Blocks of a Cat-Dog Classifier  205
5.4.2 Getting and Fixing the Data  206
5.4.3 Performance Optimization and Model Specification  210
5.4.4 Evaluation Metrics  212
5.4.5 Guiding the Training Through Callbacks and Checkpoints  216
5.4.6 Evaluating the Model  218
5.5 Summary  222
6 Regularization Techniques  223
6.1 An Overview of Overfitting and Underfitting: Do You Need More Data?  224
6.1.1 From Lines to Curves: Adding Polynomial Features  225
6.1.2 Using Increasingly Complex Models  228
6.1.3 The Balance of Complexity  229
6.1.4 Regularization Term  234
6.1.5 Adjusting the Complexity Knob  237
6.1.6 Do I Need More Data?  240
6.1.7 Reporting the Final Results: Validation Set  241
6.2 Dropout: Concept and Implementation  243
6.2.1 The Problem: Co-Adaptation and Overfitting  243
6.2.2 The Road to Memorization  244
6.2.3 The Ensemble Intuition Behind Dropout  245
6.2.4 Dropout Mechanics  246
6.2.5 Finding the Sweet Spot: Dropout Rates in Practice  247
6.2.6 Implementing Dropout in Pure Python  248
6.2.7 Common Pitfalls and Debugging Tips  250
6.3 Other Regularization Methods: L1 and L2 Regularization  251
6.3.1 L1 Regularization (Lasso)  252
6.3.2 Elastic Net: Combining L1 and L2  252
6.3.3 When Not to Use Regularization  253
6.3.4 Practical Considerations: Dropout vs. L1/L2 in Neural Networks  254
6.4 Applying Regularization in Keras  254
6.4.1 Implementing L2 Regularization in Keras  255
6.4.2 Dropout in Keras  255
6.4.3 Beyond Basic Dropout: Specialized Variants  257
6.4.4 Finding the Perfect Dropout Rate: A Systematic Approach  259
6.5 Summary  264
7 Convolutional Neural Networks  265
7.1 Introduction to Convolutional Neural Networks  266
7.1.1 The Limitation of Fully Connected Networks  266
7.1.2 The Learning Standstill  269
7.1.3 Solving the Vanishing Gradient Problem  270
7.1.4 Dense vs. Sparse Connections  272
7.1.5 From Convolution to Neural Networks: The Conv2D Layer  279
7.2 Convolutional Layers, Pooling Layers, and Fully Connected Layers  287
7.2.1 Core Implementation of a Convolutional Layer  288
7.2.2 The Hidden Superpowers of Convolutional Layers  291
7.2.3 Pooling Layers: The Image Simplifiers  293
7.2.4 Global Pooling  295
7.2.5 Bringing It All Together: Fully Connected Layers in CNNs  299
7.3 Implementing CNNs with Keras  301
7.3.1 Conv2D vs. Conv1D  301
7.3.2 The Opposite of Convolution: Deconvolution Layers  301
7.4 The “Shapes” Problem  303
7.5 Case Study: Image Classification  307
7.6 Summary  313
8 Exploring the Keras Functional API  315
8.1 Overview of the Keras Functional API  316
8.1.1 The Information Bottleneck  317
8.1.2 Networks as Directed Acyclic Graphs  318
8.1.3 The Functional Programming Heritage  319
8.1.4 Key Advantages of the Functional API  320
8.2 Building Complex Models with the Functional API  323
8.2.1 Overview of the Functional API Syntax  323
8.2.2 Creating Models with the Functional API  324
8.2.3 Best Practices for Complex Models  327
8.2.4 Handling Multiple Inputs and Outputs  328
8.2.5 Example: Building an Image Captioning Model  331
8.2.6 Residual Connections  332
8.2.7 Branching Architectures  336
8.3 Use Cases and Examples  340
8.3.1 Image Classification with ResNet  341
8.3.2 Siamese Networks for Similarity Learning  346
8.3.3 U-Net for Image Segmentation  354
8.4 Using Transfer Learning to Customize Models for Your Organization  364
8.4.1 The Why of Transfer Learning  365
8.4.2 Leveraging Pretrained Models from Keras  366
8.4.3 The Process of Transfer Learning  368
8.4.4 Reloading an Existing Model  368
8.5 Summary  373
9 Understanding Transformers  375
9.1 The Theory Behind Transformers  376
9.1.1 A Simple Time Series Example  377
9.1.2 From Numbers to Words: The Challenge of Text Data  382
9.1.3 GloVe: Learning the Language of Vectors  383
9.1.4 A Gentle Introduction to Attention  388
9.1.5 Why Transformers Revolutionized Natural Language Processing and Beyond  391
9.2 Components: Attention Mechanism, Encoder, Decoder  393
9.2.1 The Conversation Between Words  394
9.2.2 Why Position Information Matters  398
9.2.3 Encoder Structure: The Information Processing Powerhouse  401
9.2.4 Decoder Structure: Creating New Sequences from Understanding  403
9.3 Implementing Transformers in Keras  406
9.3.1 The Encoder Block  407
9.3.2 The Decoder Block  413
9.3.3 The Transformer: Putting the Encoder and Decoder Together  415
9.4 Case Study: Large Language Model Chatbot  418
9.4.1 Structure of Modern Keras Transformer Models  418
9.4.2 Working with Pretrained Models from Kaggle Hub  422
9.5 Summary  427
10 Reinforcement Learning: The Secret Sauce  429
10.1 Introduction to Reinforcement Learning  430
10.1.1 The Problem of Learning by Doing  430
10.1.2 Brief History and Major Breakthroughs  433
10.1.3 Real-World Applications  434
10.1.4 Challenges Unique to Reinforcement Learning  436
10.2 Key Concepts: Agents, Environments, Rewards  438
10.2.1 Structure of the Reinforcement Learning Framework  438
10.2.2 Environment Design and State Representation  440
10.2.3 Understanding Agents and Policy Functions  442
10.2.4 Reward Engineering and Signal Design  443
10.2.5 The Exploration vs. Exploitation Dilemma  445
10.3 Popular Algorithms: Q-Learning, Policy Gradients, and Deep Q-Networks  447
10.3.1 Markov Decision Processes  447
10.3.2 Value Functions and Q-Tables  449
10.3.3 Building the Q-Table  451
10.3.4 Q-Learning Algorithm: A Worked Example  453
10.3.5 Q-Learning and Associated Issues  458
10.3.6 The Limits of Tabular Q-Learning  460
10.4 Implementing Reinforcement Learning Models in Keras  464
10.4.1 Our First Reinforcement Learning Environment  465
10.4.2 Implementing the Deep Q-Network Algorithm with Keras  473
10.4.3 Experience Replay and Target Networks: The Foundations of Stable Deep Reinforcement Learning  482
10.5 Reinforcement Learning in Large Language Models  486
10.5.1 The Fundamental Challenge: Moving Beyond Prediction  487
10.5.2 Challenges and Limitations  489
10.5.3 Future Directions and Emerging Approaches  491
10.6 Summary  493
11 Autoencoders and Generative AI  495
11.1 Introduction to Autoencoders  496
11.1.1 What Are Autoencoders?  497
11.1.2 Autoencoder Architecture Deep Dive  500
11.1.3 Building Your First Autoencoder in Keras  505
11.1.4 Types of Autoencoders  514
11.2 Variational Autoencoders  519
11.2.1 Navigating the Space with Uncertainty  520
11.2.2 Mathematical Framework of Variational Autoencoders  521
11.2.3 Variational Autoencoder Implementation in Keras  525
11.3 Generative Adversarial Networks  535
11.3.1 The Adversarial Game  535
11.3.2 Generative Adversarial Network Architecture  537
11.3.3 Other Variations  541
11.3.4 Generative Adversarial Network Implementation in Keras  543
11.3.5 Implementation Challenges  551
11.4 Summary  552
12 Advanced Generative AI: Stable Diffusion  553
12.1 Theory Behind Stable Diffusion  554
12.1.1 From Previous Generative Models to Diffusion  555
12.1.2 Diffusion Process Fundamentals  557
12.1.3 Reverse Diffusion: Learning to Denoise  558
12.1.4 Connections to Physical Processes  559
12.1.5 Denoising Diffusion Probabilistic Models  559
12.1.6 Latent Diffusion and Stable Diffusion Architecture  562
12.1.7 Cross-Attention: The Bridge Between Text and Images  564
12.2 How Stable Diffusion Uses Core Concepts  565
12.2.1 Efficient Diffusion Through Learned Representations  566
12.2.2 Advanced Attention Mechanisms  567
12.2.3 Training Strategies and Optimization  570
12.3 Implementing Stable Diffusion Models  572
12.3.1 Environment and Data Prep  573
12.3.2 Setting Up an Evaluation Measure  574
12.3.3 Model Description and Time-Step Encodings  578
12.3.4 Diffusion and Reverse Diffusion  582
12.3.5 The Generation Engine  584
12.3.6 Following Progress in the Training Process  589
12.4 Case Study: Image Generation  593
12.4.1 Loading Pretrained Models from Keras Hub  594
12.4.2 Loading Models Through Keras Hub  595
12.4.3 Using Stable Diffusion Models  596
12.4.4 Beyond Image Generation to More Complex Workflows  599
12.5 Summary  603
13 Recap of Key Concepts  605
13.1 Future Trends in Deep Learning  606
13.1.1 Advanced Architecture Trajectories  607
13.1.2 Reinforcement Learning Frontiers  608
13.1.3 Generative AI Revolution  609
13.2 Tips for Staying Updated with Advancements  611
13.2.1 Technical Skills Maintenance  611
13.2.2 Following Tutorials and the Keras Codebase  612
13.2.3 Research Consumption Strategy  613
13.2.4 Community Engagement  614
13.3 Following the Latest Research  615
13.3.1 Technical Deep Dives  616
13.3.2 Practical Research Integration  617
13.3.3 Parting Words  618
The Author  619
Index  621