LEADER 04483nam 2200505 450
001 9910624394103321
005 20230421130137.0
010 $a9783031090301$b(electronic bk.)
010 $z9783031090295
035 $a(MiAaPQ)EBC7127529
035 $a(Au-PeEL)EBL7127529
035 $a(CKB)25208138500041
035 $a(PPN)26586173X
035 $a(EXLCZ)9925208138500041
100 $a20230315d2022 uy 0
101 0 $aeng
135 $aurcnu||||||||
181 $ctxt$2rdacontent
182 $cc$2rdamedia
183 $acr$2rdacarrier
200 10$aReinforcement learning from scratch$eunderstanding current approaches - with examples in Java and Greenfoot /$fUwe Lorenz
210 1$aCham, Switzerland :$cSpringer,$d[2022]
210 4$d©2022
215 $a1 online resource (195 pages)
311 08$aPrint version: Lorenz, Uwe. Reinforcement Learning from Scratch. Cham : Springer International Publishing AG, c2022. 9783031090295
320 $aIncludes bibliographical references.
327 $aIntro -- Preface -- Introduction -- Contents -- 1: Reinforcement Learning as a Subfield of Machine Learning -- 1.1 Machine Learning as Automated Processing of Feedback from the Environment -- 1.2 Machine Learning -- 1.3 Reinforcement Learning with Java -- Bibliography -- 2: Basic Concepts of Reinforcement Learning -- 2.1 Agents -- 2.2 The Policy of the Agent -- 2.3 Evaluation of States and Actions (Q-Function, Bellman Equation) -- Bibliography -- 3: Optimal Decision-Making in a Known Environment -- 3.1 Value Iteration -- 3.1.1 Target-Oriented Condition Assessment ("Backward Induction") -- 3.1.2 Policy-Based State Valuation (Reward Prediction) -- 3.2 Iterative Policy Search -- 3.2.1 Direct Policy Improvement -- 3.2.2 Mutual Improvement of Policy and Value Function -- 3.3 Optimal Policy in a Board Game Scenario -- 3.4 Summary -- Bibliography -- 4: Decision-Making and Learning in an Unknown Environment -- 4.1 Exploration vs. Exploitation -- 4.2 Retroactive Processing of Experience ("Model-Free Reinforcement Learning") -- 4.2.1 Goal-Oriented Learning ("Value-Based") -- Subsequent evaluation of complete episodes ("Monte Carlo" Method) -- Immediate Valuation Using the Temporal Difference (Q- and SARSA Algorithm) -- Consideration of the Action History (Eligibility Traces) -- 4.2.2 Policy Search -- Monte Carlo Tactics Search -- Evolutionary Strategies -- Monte Carlo Policy Gradient (REINFORCE) -- 4.2.3 Combined Methods (Actor-Critic) -- "Actor-Critic" Policy Gradients -- Technical Improvements to the Actor-Critic Architecture -- Feature Vectors and Partially Observable Environments -- 4.3 Exploration with Predictive Simulations ("Model-Based Reinforcement Learning") -- 4.3.1 Dyna-Q -- 4.3.2 Monte Carlo Rollout -- 4.3.3 Artificial Curiosity -- 4.3.4 Monte Carlo Tree Search (MCTS) -- 4.3.5 Remarks on the Concept of Intelligence.
327 $a4.4 Systematics of the Learning Methods -- Bibliography -- 5: Artificial Neural Networks as Estimators for State Values and the Action Selection -- 5.1 Artificial Neural Networks -- 5.1.1 Pattern Recognition with the Perceptron -- 5.1.2 The Adaptability of Artificial Neural Networks -- 5.1.3 Backpropagation Learning -- 5.1.4 Regression with Multilayer Perceptrons -- 5.2 State Evaluation with Generalizing Approximations -- 5.3 Neural Estimators for Action Selection -- 5.3.1 Policy Gradient with Neural Networks -- 5.3.2 Proximal Policy Optimization -- 5.3.3 Evolutionary Strategy with a Neural Policy -- Bibliography -- 6: Guiding Ideas in Artificial Intelligence over Time -- 6.1 Changing Guiding Ideas -- 6.2 On the Relationship Between Humans and Artificial Intelligence -- Bibliography.
606 $aJava (Computer program language)
606 $aReinforcement learning
606 $aJava (Llenguatge de programació)$2thub
606 $aAprenentatge per reforç (Intel·ligència artificial)$2thub
608 $aLlibres electrònics$2thub
615 0$aJava (Computer program language)
615 0$aReinforcement learning.
615 7$aJava (Llenguatge de programació)
615 7$aAprenentatge per reforç (Intel·ligència artificial)
676 $a005.133
700 $aLorenz$b Uwe$01264100
801 0$bMiAaPQ
801 1$bMiAaPQ
801 2$bMiAaPQ
912 $a9910624394103321
996 $aReinforcement Learning from Scratch$92963411
997 $aUNINA