Table of Contents

1 Introduction  17
1.1 Overview of Deep Learning  18
1.1.1 The Success of Deep Learning  18
1.1.2 The Two Pillars Supporting These Breakthroughs  21
1.1.3 Why Now?  22
1.2 Why Keras  23
1.3 The Structure of This Book  25
1.3.1 Text Boxes  27
1.4 How to Use This Book  28
1.4.1 Why Keras Installation Comes Later  29
1.4.2 Suggested Reading Strategy  30
2 Introduction to the Core of Machine Learning  33
2.1 What Is Machine Learning?  35
2.1.1 Helping the Machine Recognize Digits  35
2.1.2 The Outdated Method: Rule-Based Learning  36
2.1.3 Case Study in Machine Learning Code  38
2.1.4 Structuring the Learning Process  41
2.1.5 Analyzing the Machine’s Learning Process  46
2.1.6 Lessons Learned from the Code Case Study  48
2.2 Types of Machine Learning  49
2.2.1 Supervised Learning  50
2.2.2 Unsupervised Learning  59
2.3 The Magic Sauce: Reinforcement Learning  65
2.3.1 The Building Blocks of Reinforcement Learning  65
2.3.2 Key Applications for Reinforcement Learning  67
2.3.3 Challenges in Implementing Reinforcement Learning  68
2.4 Basics of Neural Networks  69
2.4.1 Core Components of Neural Networks  70
2.4.2 The Unintuitive Process of Learning  72
2.4.3 Clearing Up Some Misconceptions  73
2.5 Setting Up Your Environment  73
2.6 Summary  78
3 Fundamentals of Gradient Descent  79
3.1 Understanding Gradient Descent  80
3.1.1 The Basic Setup  80
3.1.2 Formalizing the Process: Training Phase  82
3.1.3 From Training to Deployment: Making It Work in the Real World  84
3.1.4 The Process of Learning  85
3.1.5 Finding the Best Parameters: Minimizing the Loss  89
3.1.6 Lifting the Assumption of a Single Input Feature  94
3.1.7 Higher Dimensions in Gradient Descent  99
3.2 Types of Gradient Descent: Batch, Stochastic, Mini-Batch  101
3.2.1 Contour Plots for Visualizing Gradient Descent  102
3.2.2 Improving the Efficiency of Batch Gradient Descent  103
3.2.3 A Paradox in the Use of Stochastic Gradient Descent  105
3.3 Learning Rate and Optimization  107
3.4 Implementing Gradient Descent in Code  110
3.4.1 Gradient Descent from Scratch  110
3.4.2 Gradient Descent Using Keras  113
3.5 Summary  116
4 Classification Through Gradient Descent  117
4.1 Classification Basics  118
4.1.1 Classification Problem Setup  119
4.1.2 First Attempt Using Gradient Descent  121
4.1.3 Second Attempt: Fixing the Issues in the First Attempt  122
4.1.4 Third Attempt: Fixing the Loss Function  128
4.1.5 Squishing Functions and Decision Boundaries  131
4.1.6 Learning Process Summary  134
4.2 Nonlinear Relationships and Neural Networks  136
4.2.1 Feature Transformations  137
4.2.2 The Kernel Trick, Logistic Regression, and All of Machine Learning  140
4.3 Binary vs. Multi-Class Classification  147
4.3.1 The One-vs-All Approach  147
4.3.2 The Softmax Classifier  152
4.4 Loss Functions: Cross-Entropy  155
4.4.1 Categorical Cross-Entropy  156
4.4.2 Pitfall to Avoid When Using Cross-Entropies  157
4.4.3 Sparse Categorical Cross-Entropy  158
4.5 Building a Classifier with Gradient Descent  161
4.5.1 Layers in Code  161
4.5.2 Choice of Loss Function  163
4.5.3 Parameter Counts  164
4.5.4 The Density of Keras Code  165
4.6 Summary  166
5 Deep Dive into Keras  167
5.1 Introduction to the Keras Framework  168
5.1.1 The Philosophy Behind Keras: Making AI Human-Friendly  169
5.1.2 Evolution Through Adaptability  169
5.1.3 Keras 3.0: The Multi-Engine Framework  171
5.1.4 Key Strengths of Keras  172
5.1.5 Keras in the Real World  173
5.2 Setting Up Keras  174
5.2.1 Setting Up Python  175
5.2.2 TensorFlow Installation and Points to Keep in Mind  177
5.2.3 Setting Up CUDA for GPU Acceleration  180
5.2.4 Installing Keras  185
5.2.5 Using a GPU in Google Colaboratory  186
5.3 Building Your First Model  188
5.3.1 Why NumPy Matters for Machine Learning  188
5.3.2 Symbolic Computation: The Magic Behind Neural Networks  200
5.4 Implementing Core Concepts in Keras: Gradient Descent and Classification  205
5.4.1 The Building Blocks of a Cat-Dog Classifier  205
5.4.2 Getting and Fixing the Data  206
5.4.3 Performance Optimization and Model Specification  210
5.4.4 Evaluation Metrics  212
5.4.5 Guiding the Training Through Callbacks and Checkpoints  216
5.4.6 Evaluating the Model  218
5.5 Summary  222
6 Regularization Techniques  223
6.1 An Overview of Overfitting and Underfitting: Do You Need More Data?  224
6.1.1 From Lines to Curves: Adding Polynomial Features  225
6.1.2 Using Increasingly Complex Models  228
6.1.3 The Balance of Complexity  229
6.1.4 Regularization Term  234
6.1.5 Adjusting the Complexity Knob  237
6.1.6 Do I Need More Data?  240
6.1.7 Reporting the Final Results: Validation Set  241
6.2 Dropout: Concept and Implementation  243
6.2.1 The Problem: Co-Adaptation and Overfitting  243
6.2.2 The Road to Memorization  244
6.2.3 The Ensemble Intuition Behind Dropout  245
6.2.4 Dropout Mechanics  246
6.2.5 Finding the Sweet Spot: Dropout Rates in Practice  247
6.2.6 Implementing Dropout in Pure Python  248
6.2.7 Common Pitfalls and Debugging Tips  250
6.3 Other Regularization Methods: L1 and L2 Regularization  251
6.3.1 L1 Regularization (Lasso)  252
6.3.2 Elastic Net: Combining L1 and L2  252
6.3.3 When Not to Use Regularization  253
6.3.4 Practical Considerations: Dropout vs. L1/L2 in Neural Networks  254
6.4 Applying Regularization in Keras  254
6.4.1 Implementing L2 Regularization in Keras  255
6.4.2 Dropout in Keras  255
6.4.3 Beyond Basic Dropout: Specialized Variants  257
6.4.4 Finding the Perfect Dropout Rate: A Systematic Approach  259
6.5 Summary  264
7 Convolutional Neural Networks  265
7.1 Introduction to Convolutional Neural Networks  266
7.1.1 The Limitation of Fully Connected Networks  266
7.1.2 The Learning Standstill  269
7.1.3 Solving the Vanishing Gradient Problem  270
7.1.4 Dense vs. Sparse Connections  272
7.1.5 From Convolution to Neural Networks: The Conv2D Layer  279
7.2 Convolutional Layers, Pooling Layers, and Fully Connected Layers  287
7.2.1 Core Implementation of a Convolutional Layer  288
7.2.2 The Hidden Superpowers of Convolutional Layers  291
7.2.3 Pooling Layers: The Image Simplifiers  293
7.2.4 Global Pooling  295
7.2.5 Bringing It All Together: Fully Connected Layers in CNNs  299
7.3 Implementing CNNs with Keras  301
7.3.1 Conv2D vs. Conv1D  301
7.3.2 The Opposite of Convolution: Deconvolution Layers  301
7.4 The “Shapes” Problem  303
7.5 Case Study: Image Classification  307
7.6 Summary  313
8 Exploring the Keras Functional API  315
8.1 Overview of the Keras Functional API  316
8.1.1 The Information Bottleneck  317
8.1.2 Networks as Directed Acyclic Graphs  318
8.1.3 The Functional Programming Heritage  319
8.1.4 Key Advantages of the Functional API  320
8.2 Building Complex Models with the Functional API  323
8.2.1 Overview of the Functional API Syntax  323
8.2.2 Creating Models with the Functional API  324
8.2.3 Best Practices for Complex Models  327
8.2.4 Handling Multiple Inputs and Outputs  328
8.2.5 Example: Building an Image Captioning Model  331
8.2.6 Residual Connections  332
8.2.7 Branching Architectures  336
8.3 Use Cases and Examples  340
8.3.1 Image Classification with ResNet  341
8.3.2 Siamese Networks for Similarity Learning  346
8.3.3 U-Net for Image Segmentation  354
8.4 Using Transfer Learning to Customize Models for Your Organization  364
8.4.1 The Why of Transfer Learning  365
8.4.2 Leveraging Pretrained Models from Keras  366
8.4.3 The Process of Transfer Learning  368
8.4.4 Reloading an Existing Model  368
8.5 Summary  373
9 Understanding Transformers  375
9.1 The Theory Behind Transformers  376
9.1.1 A Simple Time Series Example  377
9.1.2 From Numbers to Words: The Challenge of Text Data  382
9.1.3 GloVe: Learning the Language of Vectors  383
9.1.4 A Gentle Introduction to Attention  388
9.1.5 Why Transformers Revolutionized Natural Language Processing and Beyond  391
9.2 Components: Attention Mechanism, Encoder, Decoder  393
9.2.1 The Conversation Between Words  394
9.2.2 Why Position Information Matters  398
9.2.3 Encoder Structure: The Information Processing Powerhouse  401
9.2.4 Decoder Structure: Creating New Sequences from Understanding  403
9.3 Implementing Transformers in Keras  406
9.3.1 The Encoder Block  407
9.3.2 The Decoder Block  413
9.3.3 The Transformer: Putting the Encoder and Decoder Together  415
9.4 Case Study: Large Language Model Chatbot  418
9.4.1 Structure of Modern Keras Transformer Models  418
9.4.2 Working with Pretrained Models from Kaggle Hub  422
9.5 Summary  427
10 Reinforcement Learning: The Secret Sauce  429
10.1 Introduction to Reinforcement Learning  430
10.1.1 The Problem of Learning by Doing  430
10.1.2 Brief History and Major Breakthroughs  433
10.1.3 Real-World Applications  434
10.1.4 Challenges Unique to Reinforcement Learning  436
10.2 Key Concepts: Agents, Environments, Rewards  438
10.2.1 Structure of the Reinforcement Learning Framework  438
10.2.2 Environment Design and State Representation  440
10.2.3 Understanding Agents and Policy Functions  442
10.2.4 Reward Engineering and Signal Design  443
10.2.5 The Exploration vs. Exploitation Dilemma  445
10.3 Popular Algorithms: Q-Learning, Policy Gradients, and Deep Q-Networks  447
10.3.1 Markov Decision Processes  447
10.3.2 Value Functions and Q-Tables  449
10.3.3 Building the Q-Table  451
10.3.4 Q-Learning Algorithm: A Worked Example  453
10.3.5 Q-Learning and Associated Issues  458
10.3.6 The Limits of Tabular Q-Learning  460
10.4 Implementing Reinforcement Learning Models in Keras  464
10.4.1 Our First Reinforcement Learning Environment  465
10.4.2 Implementing the Deep Q-Network Algorithm with Keras  473
10.4.3 Experience Replay and Target Networks: The Foundations of Stable Deep Reinforcement Learning  482
10.5 Reinforcement Learning in Large Language Models  486
10.5.1 The Fundamental Challenge: Moving Beyond Prediction  487
10.5.2 Challenges and Limitations  489
10.5.3 Future Directions and Emerging Approaches  491
10.6 Summary  493
11 Autoencoders and Generative AI  495
11.1 Introduction to Autoencoders  496
11.1.1 What Are Autoencoders?  497
11.1.2 Autoencoder Architecture Deep Dive  500
11.1.3 Building Your First Autoencoder in Keras  505
11.1.4 Types of Autoencoders  514
11.2 Variational Autoencoders  519
11.2.1 Navigating the Space with Uncertainty  520
11.2.2 Mathematical Framework of Variational Autoencoders  521
11.2.3 Variational Autoencoder Implementation in Keras  525
11.3 Generative Adversarial Networks  535
11.3.1 The Adversarial Game  535
11.3.2 Generative Adversarial Network Architecture  537
11.3.3 Other Variations  541
11.3.4 Generative Adversarial Network Implementation in Keras  543
11.3.5 Implementation Challenges  551
11.4 Summary  552
12 Advanced Generative AI: Stable Diffusion  553
12.1 Theory Behind Stable Diffusion  554
12.1.1 From Previous Generative Models to Diffusion  555
12.1.2 Diffusion Process Fundamentals  557
12.1.3 Reverse Diffusion: Learning to Denoise  558
12.1.4 Connections to Physical Processes  559
12.1.5 Denoising Diffusion Probabilistic Models  559
12.1.6 Latent Diffusion and Stable Diffusion Architecture  562
12.1.7 Cross-Attention: The Bridge Between Text and Images  564
12.2 How Stable Diffusion Uses Core Concepts  565
12.2.1 Efficient Diffusion Through Learned Representations  566
12.2.2 Advanced Attention Mechanisms  567
12.2.3 Training Strategies and Optimization  570
12.3 Implementing Stable Diffusion Models  572
12.3.1 Environment and Data Prep  573
12.3.2 Setting Up an Evaluation Measure  574
12.3.3 Model Description and Time-Step Encodings  578
12.3.4 Diffusion and Reverse Diffusion  582
12.3.5 The Generation Engine  584
12.3.6 Following Progress in the Training Process  589
12.4 Case Study: Image Generation  593
12.4.1 Loading Pretrained Models from Keras Hub  594
12.4.2 Loading Models Through Keras Hub  595
12.4.3 Using Stable Diffusion Models  596
12.4.4 Beyond Image Generation to More Complex Workflows  599
12.5 Summary  603
13 Recap of Key Concepts  605
13.1 Future Trends in Deep Learning  606
13.1.1 Advanced Architecture Trajectories  607
13.1.2 Reinforcement Learning Frontiers  608
13.1.3 Generative AI Revolution  609
13.2 Tips for Staying Updated with Advancements  611
13.2.1 Technical Skills Maintenance  611
13.2.2 Following Tutorials and the Keras Codebase  612
13.2.3 Research Consumption Strategy  613
13.2.4 Community Engagement  614
13.3 Following the Latest Research  615
13.3.1 Technical Deep Dives  616
13.3.2 Practical Research Integration  617
13.3.3 Parting Words  618
The Author  619
Index  621