1. |
Record No. |
UNINA9910484275503321 |
|
|
Author |
Sanghi, Nimish |
|
|
Title |
Deep reinforcement learning with Python : with PyTorch, TensorFlow and OpenAI Gym / Nimish Sanghi |
|
|
|
|
|
|
|
Publication/distribution/printing |
|
|
[Place of publication not identified] : Apress, [2021] |
|
©2021 |
|
|
|
|
|
|
|
|
|
ISBN |
|
|
|
|
|
|
Physical description |
|
1 online resource (xix, 382 pages) : illustrations |
|
|
|
|
|
|
Classification |
|
|
|
|
|
|
Subjects |
|
Reinforcement learning |
Python (Computer program language) |
|
|
|
|
|
|
|
|
Language of publication |
|
|
|
|
|
|
Format |
Printed material |
|
|
|
|
|
Bibliographic level |
Monograph |
|
|
|
|
|
General notes |
|
|
|
|
|
|
Contents note |
|
Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction -- Chapter 1: Introduction to Reinforcement Learning -- Reinforcement Learning -- Machine Learning Branches -- Supervised Learning -- Unsupervised Learning -- Reinforcement Learning -- Core Elements -- Deep Learning with Reinforcement Learning -- Examples and Case Studies -- Autonomous Vehicles -- Robots -- Recommendation Systems -- Finance and Trading -- Healthcare -- Game Playing -- Libraries and Environment Setup -- Alternate Way to Install Local Environment -- Summary -- Chapter 2: Markov Decision Processes -- Definition of Reinforcement Learning -- Agent and Environment -- Rewards -- Markov Processes -- Markov Chains -- Markov Reward Processes -- Markov Decision Processes -- Policies and Value Functions -- Bellman Equations -- Optimality Bellman Equations -- Types of Solution Approaches with a Mind-Map -- Summary -- Chapter 3: Model-Based Algorithms -- OpenAI Gym -- Dynamic Programming -- Policy Evaluation/Prediction -- Policy Improvement and Iterations -- Value Iteration -- Generalized Policy Iteration -- Asynchronous Backups -- Summary -- Chapter 4: Model-Free Approaches -- Estimation/Prediction with Monte Carlo -- Bias and Variance of MC Prediction Methods -- Control with Monte Carlo -- Off-Policy MC Control -- Temporal Difference Learning Methods -- Temporal Difference Control -- On-Policy SARSA -- Q-Learning: An Off-Policy TD Control -- Maximization Bias and Double Learning -- Expected SARSA Control -- Replay Buffer and Off-Policy Learning -- Q-Learning for Continuous State Spaces -- n-Step Returns -- Eligibility Traces and TD(λ) -- Relationships Between DP, MC, and TD -- Summary -- Chapter 5: Function Approximation -- Introduction -- Theory of Approximation -- Coarse Coding -- Tile Encoding -- Challenges in Approximation -- Incremental Prediction: MC, TD, TD(λ) -- Incremental Control -- Semi-gradient N-step SARSA Control -- Semi-gradient SARSA(λ) Control -- Convergence in Functional Approximation -- Gradient Temporal Difference Learning -- Batch Methods (DQN) -- Linear Least Squares Method -- Deep Learning Libraries -- Summary -- Chapter 6: Deep Q-Learning -- Deep Q Networks -- Atari Game-Playing Agent Using DQN -- Prioritized Replay -- Double Q-Learning -- Dueling DQN -- NoisyNets DQN -- Categorical 51-Atom DQN (C51) -- Quantile Regression DQN -- Hindsight Experience Replay -- Summary -- Chapter 7: Policy Gradient Algorithms -- Introduction -- Pros and Cons of Policy-Based Methods -- Policy Representation -- Discrete Case -- Continuous Case -- Policy Gradient Derivation -- Objective Function -- Derivative Update Rule -- Intuition Behind the Update Rule -- REINFORCE Algorithm -- Variance Reduction with Reward to Go -- Further Variance Reduction with Baselines -- Actor-Critic Methods -- Defining Advantage -- Advantage Actor Critic -- Implementation of the A2C Algorithm -- Asynchronous Advantage Actor Critic -- Trust Region Policy Optimization Algorithm -- Proximal Policy Optimization Algorithm -- Summary -- Chapter 8: Combining Policy Gradient and Q-Learning -- Trade-Offs in Policy Gradient and Q-Learning -- General Framework to Combine Policy Gradient with Q-Learning -- Deep Deterministic Policy Gradient -- Q-Learning in DDPG (Critic) -- Policy Learning in DDPG (Actor) -- Pseudocode and Implementation -- Gym Environments Used in Code -- Code Listing -- Policy Network Actor (PyTorch) -- Policy Network Actor (TensorFlow) -- Q-Network Critic Implementation -- PyTorch -- TensorFlow -- Combined Model-Actor Critic Implementation -- Experience Replay -- Q-Loss Implementation -- PyTorch -- TensorFlow -- Policy Loss Implementation -- One Step Update Implementation -- DDPG: Main Loop -- Twin Delayed DDPG -- Target-Policy Smoothing -- Q-Loss (Critic) -- Policy Loss (Actor) -- Delayed Update -- Pseudocode and Implementation -- Code Implementation -- Combined Model-Actor Critic Implementation -- Q-Loss Implementation -- Policy-Loss Implementation -- One-Step Update Implementation -- TD3 Main Loop -- Reparameterization Trick -- Score/Reinforce Way -- Reparameterization Trick and Pathwise Derivatives -- Experiment -- Entropy Explained -- Soft Actor Critic -- SAC vs. TD3 -- Q-Loss with Entropy-Regularization -- Policy Loss with Reparameterization Trick -- Pseudocode and Implementation -- Code Implementation -- Policy Network-Actor Implementation -- Q-Network, Combined Model, and Experience Replay -- Q-Loss and Policy-Loss Implementation -- One-Step Update and SAC Main Loop -- Summary -- Chapter 9: Integrated Planning and Learning -- Model-Based Reinforcement Learning -- Planning with a Learned Model -- Integrating Learning and Planning (Dyna) -- Dyna Q and Changing Environments -- Dyna Q+ -- Expected vs. Sample Updates -- Exploration vs. Exploitation -- Multi-arm Bandit -- Regret: Measure of Quality of Exploration -- Epsilon Greedy Exploration -- Upper Confidence Bound Exploration -- Thompson Sampling Exploration -- Comparing Different Exploration Strategies -- Planning at Decision Time and Monte Carlo Tree Search -- AlphaGo Walk-Through -- Summary -- Chapter 10: Further Exploration and Next Steps -- Model-Based RL: Additional Approaches -- World Models -- Imagination-Augmented Agents (I2A) -- Model-Based RL with Model-Free Fine-Tuning (MBMF) -- Model-Based Value Expansion (MBVE) -- Imitation Learning and Inverse Reinforcement Learning -- Derivative-Free Methods -- Transfer Learning and Multitask Learning -- Meta-Learning -- Popular RL Libraries -- How to Continue Studying -- Summary -- Index. |
|
|
|
|
|
| |