Table of Contents

Preface
13
Who Is This Book For?
13
Analyst with Some Coding Experience
13
Undergraduate Student Looking for Applied Experience
14
Nontechnical Leader of an Analytics Team
15
The Structure of This Book
16
Conclusion
18
1 Introduction
19
1.1 Aligning on Nomenclature
19
1.2 Learning to Google (or Prompt)
21
1.2.1 What Can You Find with Google?
21
1.2.2 Prompting
25
1.3 Predictions for Generative AI’s Impact on Machine Learning
26
1.4 Summary
26
2 Getting Started
27
2.1 GitHub
27
2.1.1 Creating an Account
28
2.1.2 GitHub in This Book
30
2.2 Anaconda
30
2.2.1 Creating an Account
31
2.2.2 Creating Projects and Uploading Data
34
2.2.3 Anaconda in This Book
37
2.3 Summary
38
3 Introduction to Our Use Cases
39
3.1 Importance of Understanding the Business Problem
39
3.1.1 Business Reviews
40
3.1.2 Definition of Success
40
3.2 Use Case 1: The Retail Tyrant
41
3.2.1 Details of the Request
41
3.2.2 History of the Request
42
3.2.3 Relationship with the Stakeholder
42
3.2.4 Use Case Questions
43
3.2.5 Use Case Answers
44
3.3 Use Case 2: Customer Retention
47
3.3.1 Details of the Request
47
3.3.2 History of the Request
47
3.3.3 Relationship with the Stakeholder
48
3.3.4 Use Case Questions
48
3.3.5 Use Case Answers
49
3.4 Use Case 3: Crime Predictions
50
3.4.1 Details of the Request
51
3.4.2 History of the Request
51
3.4.3 Relationship with the Stakeholder
51
3.4.4 Use Case Questions
52
3.4.5 Use Case Answers
52
3.5 Summary
53
4 Starting with the Data
55
4.1 Types of Data Sources
55
4.1.1 Manual
56
4.1.2 Automated
59
4.1.3 Data Sources for Our Use Cases
60
4.2 Data Exploration
66
4.2.1 Data Types
66
4.2.2 Data Visualization
77
4.2.3 Descriptive Statistics
105
4.2.4 Correlation Analysis
114
4.3 Data Cleaning (For Now)
120
4.3.1 Why Isn’t Data Already Clean?
121
4.3.2 Overview of Cleaning for Regression Models
122
4.3.3 Inaccurate Data
123
4.3.4 Missing Data
123
4.3.5 Dummy Coding
131
4.3.6 Dimensionality Reduction
161
4.4 Summary
178
5 Picking Your Model
181
5.1 The Simpler the Model, the Better
181
5.2 Model Decision Framework
183
5.2.1 How Important Is Interpretability?
184
5.2.2 How Many Rows and Columns?
184
5.2.3 What Is Being Predicted?
185
5.3 Train-Test Split
187
5.4 Regression Models
189
5.4.1 What Are Regression Models?
189
5.4.2 Multicollinearity
192
5.4.3 Linear Regression
192
5.4.4 Logistic Regression
211
5.5 Machine Learning Models
221
5.5.1 Decision Tree
222
5.5.2 Random Forest
252
5.5.3 Gradient Boosting Machine
271
5.6 Clustering
291
5.6.1 What Is Clustering?
292
5.6.2 Picking the Number of Clusters
294
5.6.3 Behind the Scenes of Clustering
296
5.7 Summary
297
6 Evaluating the Model and Iterating
299
6.1 Importance of Picking Validation Metrics
299
6.2 Validation Metrics
301
6.2.1 Accuracy
302
6.2.2 Confusion Matrix
302
6.2.3 Precision
305
6.2.4 Recall
305
6.2.5 F1 Score
305
6.2.6 Area Under the Curve
306
6.2.7 R-Squared
307
6.2.8 Mean Squared Error
309
6.2.9 Mean Absolute Error
309
6.2.10 Metric Summary
309
6.3 K-Fold Cross-Validation
311
6.4 Business Validations
311
6.4.1 Legal Considerations
312
6.4.2 Ethical Considerations
313
6.5 Machine Learning Interpretability
314
6.5.1 Regression Models
314
6.5.2 Tree-Based Models
316
6.6 Iterating on the Model
321
6.6.1 Feature Engineering
322
6.6.2 Remove Variables
324
6.6.3 Add New Data
325
6.7 Application to Use Cases
328
6.7.1 Use Case 1
328
6.7.2 Use Case 2
348
6.7.3 Use Case 3
362
6.8 Summary
374
7 Implementing, Monitoring, and Measuring the Model
375
7.1 Implementing Your Model for Predictions
375
7.1.1 Don’t Train the Model Each Time
376
7.1.2 Predictions for Our Use Cases
376
7.1.3 Saving Your Predictions
392
7.1.4 Practical Approaches to Consider
393
7.2 Model Monitoring
394
7.2.1 Importance of Model Monitoring
394
7.2.2 What to Monitor
395
7.2.3 Considerations for Model Monitoring
399
7.2.4 Retraining the Model
400
7.3 Measuring the Impact of Your Model
401
7.3.1 Business Sniff Test
401
7.3.2 Experiments
403
7.4 Summary
426
8 Closing Thoughts
427
8.1 Learning How to Learn with Generative AI
427
8.2 Learning How to Learn with Use Cases
428
8.3 Explore and Visualize Your Data
428
8.4 Cleaning Your Data and Dummy Coding
429
8.5 Machine Learning Models
430
8.6 Hyperparameters and Grid Search
430
8.7 Variable Lagging
431
8.8 The End
431
8.9 Acknowledgments
431
The Author
433
Index
435