Record ID: 9910484275503321
ISBN: 1-4842-6809-1
System control numbers: (CKB)4100000011867208; (MiAaPQ)EBC6533449; (Au-PeEL)EBL6533449; (OCoLC)1244805565; (CaSebORM)9781484268094; (PPN)255294808; (EXLCZ)994100000011867208
Language: eng
Title: Deep reinforcement learning with Python : with PyTorch, TensorFlow and OpenAI Gym / Nimish Sanghi
Publication: [Place of publication not identified] : APress, [2021]
Copyright: ©2021
Physical description: 1 online resource (xix, 382 pages) : illustrations
Note: Includes index.
Other ISBN: 1-4842-6808-3
Contents:
Intro -- Table of Contents -- About the Author -- About the Technical Reviewer -- Acknowledgments -- Introduction
Chapter 1: Introduction to Reinforcement Learning -- Reinforcement Learning -- Machine Learning Branches -- Supervised Learning -- Unsupervised Learning -- Reinforcement Learning -- Core Elements -- Deep Learning with Reinforcement Learning -- Examples and Case Studies -- Autonomous Vehicles -- Robots -- Recommendation Systems -- Finance and Trading -- Healthcare -- Game Playing -- Libraries and Environment Setup -- Alternate Way to Install Local Environment -- Summary
Chapter 2: Markov Decision Processes -- Definition of Reinforcement Learning -- Agent and Environment -- Rewards -- Markov Processes -- Markov Chains -- Markov Reward Processes -- Markov Decision Processes -- Policies and Value Functions -- Bellman Equations -- Optimality Bellman Equations -- Types of Solution Approaches with a Mind-Map -- Summary
Chapter 3: Model-Based Algorithms -- OpenAI Gym -- Dynamic Programming -- Policy Evaluation/Prediction -- Policy Improvement and Iterations -- Value Iteration -- Generalized Policy Iteration -- Asynchronous Backups -- Summary
Chapter 4: Model-Free Approaches -- Estimation/Prediction with Monte Carlo -- Bias and Variance of MC Prediction Methods -- Control with Monte Carlo -- Off-Policy MC Control -- Temporal Difference Learning Methods -- Temporal Difference Control -- On-Policy SARSA -- Q-Learning: An Off-Policy TD Control -- Maximization
Bias and Double Learning -- Expected SARSA Control -- Replay Buffer and Off-Policy Learning -- Q-Learning for Continuous State Spaces -- n-Step Returns -- Eligibility Traces and TD(λ) -- Relationships Between DP, MC, and TD -- Summary
Chapter 5: Function Approximation -- Introduction -- Theory of Approximation -- Coarse Coding -- Tile Encoding -- Challenges in Approximation -- Incremental Prediction: MC, TD, TD(λ) -- Incremental Control -- Semi-gradient N-step SARSA Control -- Semi-gradient SARSA(λ) Control -- Convergence in Functional Approximation -- Gradient Temporal Difference Learning -- Batch Methods (DQN) -- Linear Least Squares Method -- Deep Learning Libraries -- Summary
Chapter 6: Deep Q-Learning -- Deep Q Networks -- Atari Game-Playing Agent Using DQN -- Prioritized Replay -- Double Q-Learning -- Dueling DQN -- NoisyNets DQN -- Categorical 51-Atom DQN (C51) -- Quantile Regression DQN -- Hindsight Experience Replay -- Summary
Chapter 7: Policy Gradient Algorithms -- Introduction -- Pros and Cons of Policy-Based Methods -- Policy Representation -- Discrete Case -- Continuous Case -- Policy Gradient Derivation -- Objective Function -- Derivative Update Rule -- Intuition Behind the Update Rule -- REINFORCE Algorithm -- Variance Reduction with Reward to Go -- Further Variance Reduction with Baselines -- Actor-Critic Methods -- Defining Advantage -- Advantage Actor Critic -- Implementation of the A2C Algorithm -- Asynchronous Advantage Actor Critic -- Trust Region Policy Optimization Algorithm -- Proximal Policy Optimization Algorithm -- Summary
Chapter 8: Combining Policy Gradient and Q-Learning -- Trade-Offs in Policy Gradient and Q-Learning -- General Framework to Combine Policy Gradient with Q-Learning -- Deep Deterministic Policy Gradient -- Q-Learning in DDPG (Critic) -- Policy Learning in DDPG (Actor) -- Pseudocode and Implementation -- Gym Environments Used in Code -- Code Listing -- Policy Network Actor (PyTorch) -- Policy Network Actor
(TensorFlow) -- Q-Network Critic Implementation -- PyTorch -- TensorFlow -- Combined Model-Actor Critic Implementation -- Experience Replay -- Q-Loss Implementation -- PyTorch -- TensorFlow -- Policy Loss Implementation -- One Step Update Implementation -- DDPG: Main Loop -- Twin Delayed DDPG -- Target-Policy Smoothing -- Q-Loss (Critic) -- Policy Loss (Actor) -- Delayed Update -- Pseudocode and Implementation -- Code Implementation -- Combined Model-Actor Critic Implementation -- Q-Loss Implementation -- Policy-Loss Implementation -- One-Step Update Implementation -- TD3 Main Loop -- Reparameterization Trick -- Score/Reinforce Way -- Reparameterization Trick and Pathwise Derivatives -- Experiment -- Entropy Explained -- Soft Actor Critic -- SAC vs. TD3 -- Q-Loss with Entropy-Regularization -- Policy Loss with Reparameterization Trick -- Pseudocode and Implementation -- Code Implementation -- Policy Network-Actor Implementation -- Q-Network, Combined Model, and Experience Replay -- Q-Loss and Policy-Loss Implementation -- One-Step Update and SAC Main Loop -- Summary
Chapter 9: Integrated Planning and Learning -- Model-Based Reinforcement Learning -- Planning with a Learned Model -- Integrating Learning and Planning (Dyna) -- Dyna Q and Changing Environments -- Dyna Q+ -- Expected vs. Sample Updates -- Exploration vs.
Exploitation -- Multi-arm Bandit -- Regret: Measure of Quality of Exploration -- Epsilon Greedy Exploration -- Upper Confidence Bound Exploration -- Thompson Sampling Exploration -- Comparing Different Exploration Strategies -- Planning at Decision Time and Monte Carlo Tree Search -- AlphaGo Walk-Through -- Summary
Chapter 10: Further Exploration and Next Steps -- Model-Based RL: Additional Approaches -- World Models -- Imagination-Augmented Agents (I2A) -- Model-Based RL with Model-Free Fine-Tuning (MBMF) -- Model-Based Value Expansion (MBVE) -- Imitation Learning and Inverse Reinforcement Learning -- Derivative-Free Methods -- Transfer Learning and Multitask Learning -- Meta-Learning -- Popular RL Libraries -- How to Continue Studying -- Summary
Index.
Subjects: Reinforcement learning; Python (Computer program language)
Dewey classification: 006.31
Author: Sanghi, Nimish
Cataloging source: MiAaPQ
Material type: BOOK
Library: UNINA