Generic multi-agent reinforcement learning approach for flexible job-shop scheduling / Schirin Bär
Author Bär, Schirin
Publication Wiesbaden : Springer Vieweg, [2022]
Physical description 1 online resource (163 pages)
Discipline 670.285
Topical subject Flexible manufacturing systems
Reinforcement learning
Aprenentatge per reforç (Intel·ligència artificial)
Sistemes multiagent
Sistemes de producció flexibles
Genre/form subject Llibres electrònics
ISBN 9783658391799
9783658391782
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Danksagung -- Abstract -- Zusammenfassung -- Contents -- Abbreviations -- List of Figures -- List of Tables -- 1 Introduction -- 1.1 Research Goals -- 1.2 Methodology -- 1.3 Structure of the Thesis -- 2 Requirements for Production Scheduling in Flexible Manufacturing -- 2.1 Foundations of Flexible Job-Shop Scheduling Problems -- 2.2 Requirement Analysis of Flexible Scheduling Solutions -- 2.2.1 Influences on Warehouse Control Systems -- 2.2.2 Influences on Manufacturing Control Systems -- 2.2.3 Derived and Ranked Requirements -- 2.3 State of the Art: Approaches to Solve Job-Shop Scheduling Problems -- 2.3.1 Conventional Scheduling Solutions -- 2.3.2 Reinforcement Learning Scheduling Solutions -- 2.4 Identification of the Research Gap -- 2.5 Contribution of this Work: Extended Flexible Job-Shop Scheduling Problem -- 3 Reinforcement Learning as an Approach for Flexible Scheduling -- 3.1 Understanding the Foundations: Formalization as a Markov Decision Process -- 3.1.1 Agent-Environment Interaction -- 3.1.2 Policies and Value Functions -- 3.1.3 Challenges Arising in Reinforcement Learning -- 3.2 Deep Q-Learning -- 3.2.1 Temporal Difference Learning and Q-Learning -- 3.2.2 Deep Q-Network -- 3.2.3 Loss Optimization -- 3.3 State of the Art: Cooperating Agents to Solve Complex Problems -- 3.3.1 Multi-Agent Learning Methods -- 3.3.2 Learning in Cooperative Multi-Agent RL Setups -- 3.4 Summary -- 4 Concept for Multi-Resources Flexible Job-Shop Scheduling -- 4.1 Concept for Agent-based Scheduling in FMS -- 4.1.1 Overall Concept -- 4.1.2 Job Specification -- 4.1.3 Petri Net Simulation -- 4.2 Formalization as a Markov Decision Process -- 4.2.1 Action Designs -- 4.2.2 State Designs -- 4.2.3 Reward Design -- 4.3 Considered Flexible Manufacturing System -- 4.4 Evaluation of the Technical Functionalities -- 4.5 Summary.
5 Multi-Agent Approach for Reactive Scheduling in Flexible Manufacturing -- 5.1 Training Set-up -- 5.2 Specification of the Reward Design -- 5.3 Evaluation of Suitable Training Strategies -- 5.3.1 Evaluation of MARL Algorithms -- 5.3.2 Selection of MARL Learning Methods -- 5.3.3 Evaluation of Parameter Sharing and Centralized Learning -- 5.4 Training Approach to Present Situations -- 5.5 Summary -- 6 Empirical Evaluation of the Requirements -- 6.1 Generalization to Various Products and Machines -- 6.2 Achieving the Global Objective -- 6.2.1 Comparison of Dense and Sparse Global Rewards -- 6.2.2 Cooperative Behavior -- 6.3 Benchmarking against Heuristic Search Algorithms -- 6.3.1 Evaluation for Unknown and Known Situations -- 6.3.2 Evaluation of Real-time Decision-Making -- 6.4 Consolidated Requirements Evaluation -- 6.5 Summary -- 7 Integration into a Flexible Manufacturing System -- 7.1 Acceptance Criteria for the Integration Concept -- 7.2 Integration Concept of MARL Scheduling Solution -- 7.2.1 Integration in the MES -- 7.2.2 Information Exchange -- 7.3 Design Cycle -- 7.3.1 Functioning Scheduling -- 7.3.2 Efficient Production Flow -- 7.3.3 Handling of Unforeseen Events -- 7.3.4 Handling of New Machine Skills -- 7.3.5 Handling of New Machines -- 7.4 Summary -- 8 Critical Discussion and Outlook -- 9 Summary -- 1 Bibliography.
Record no. UNINA-9910616202703321
Available at: Univ. Federico II
OPAC: Check availability here
Generic multi-agent reinforcement learning approach for flexible job-shop scheduling / Schirin Bär
Author Bär, Schirin
Publication Wiesbaden : Springer Vieweg, [2022]
Physical description 1 online resource (163 pages)
Discipline 670.285
Topical subject Flexible manufacturing systems
Reinforcement learning
Aprenentatge per reforç (Intel·ligència artificial)
Sistemes multiagent
Sistemes de producció flexibles
Genre/form subject Llibres electrònics
ISBN 9783658391799
9783658391782
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Danksagung -- Abstract -- Zusammenfassung -- Contents -- Abbreviations -- List of Figures -- List of Tables -- 1 Introduction -- 1.1 Research Goals -- 1.2 Methodology -- 1.3 Structure of the Thesis -- 2 Requirements for Production Scheduling in Flexible Manufacturing -- 2.1 Foundations of Flexible Job-Shop Scheduling Problems -- 2.2 Requirement Analysis of Flexible Scheduling Solutions -- 2.2.1 Influences on Warehouse Control Systems -- 2.2.2 Influences on Manufacturing Control Systems -- 2.2.3 Derived and Ranked Requirements -- 2.3 State of the Art: Approaches to Solve Job-Shop Scheduling Problems -- 2.3.1 Conventional Scheduling Solutions -- 2.3.2 Reinforcement Learning Scheduling Solutions -- 2.4 Identification of the Research Gap -- 2.5 Contribution of this Work: Extended Flexible Job-Shop Scheduling Problem -- 3 Reinforcement Learning as an Approach for Flexible Scheduling -- 3.1 Understanding the Foundations: Formalization as a Markov Decision Process -- 3.1.1 Agent-Environment Interaction -- 3.1.2 Policies and Value Functions -- 3.1.3 Challenges Arising in Reinforcement Learning -- 3.2 Deep Q-Learning -- 3.2.1 Temporal Difference Learning and Q-Learning -- 3.2.2 Deep Q-Network -- 3.2.3 Loss Optimization -- 3.3 State of the Art: Cooperating Agents to Solve Complex Problems -- 3.3.1 Multi-Agent Learning Methods -- 3.3.2 Learning in Cooperative Multi-Agent RL Setups -- 3.4 Summary -- 4 Concept for Multi-Resources Flexible Job-Shop Scheduling -- 4.1 Concept for Agent-based Scheduling in FMS -- 4.1.1 Overall Concept -- 4.1.2 Job Specification -- 4.1.3 Petri Net Simulation -- 4.2 Formalization as a Markov Decision Process -- 4.2.1 Action Designs -- 4.2.2 State Designs -- 4.2.3 Reward Design -- 4.3 Considered Flexible Manufacturing System -- 4.4 Evaluation of the Technical Functionalities -- 4.5 Summary.
5 Multi-Agent Approach for Reactive Scheduling in Flexible Manufacturing -- 5.1 Training Set-up -- 5.2 Specification of the Reward Design -- 5.3 Evaluation of Suitable Training Strategies -- 5.3.1 Evaluation of MARL Algorithms -- 5.3.2 Selection of MARL Learning Methods -- 5.3.3 Evaluation of Parameter Sharing and Centralized Learning -- 5.4 Training Approach to Present Situations -- 5.5 Summary -- 6 Empirical Evaluation of the Requirements -- 6.1 Generalization to Various Products and Machines -- 6.2 Achieving the Global Objective -- 6.2.1 Comparison of Dense and Sparse Global Rewards -- 6.2.2 Cooperative Behavior -- 6.3 Benchmarking against Heuristic Search Algorithms -- 6.3.1 Evaluation for Unknown and Known Situations -- 6.3.2 Evaluation of Real-time Decision-Making -- 6.4 Consolidated Requirements Evaluation -- 6.5 Summary -- 7 Integration into a Flexible Manufacturing System -- 7.1 Acceptance Criteria for the Integration Concept -- 7.2 Integration Concept of MARL Scheduling Solution -- 7.2.1 Integration in the MES -- 7.2.2 Information Exchange -- 7.3 Design Cycle -- 7.3.1 Functioning Scheduling -- 7.3.2 Efficient Production Flow -- 7.3.3 Handling of Unforeseen Events -- 7.3.4 Handling of New Machine Skills -- 7.3.5 Handling of New Machines -- 7.4 Summary -- 8 Critical Discussion and Outlook -- 9 Summary -- 1 Bibliography.
Record no. UNISA-996495171403316
Available at: Univ. di Salerno
OPAC: Check availability here
Handbook of reinforcement learning and control / Kyriakos G. Vamvoudakis [and three others], editors
Publication Cham, Switzerland : Springer, [2021]
Physical description 1 online resource (839 pages)
Discipline 006.31
Series Studies in Systems, Decision and Control
Topical subject Reinforcement learning
Automatic control - Sensitivity
Aprenentatge per reforç (Intel·ligència artificial)
Control automàtic
Genre/form subject Llibres electrònics
ISBN 3-030-60990-1
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Contents -- Part I Theory of Reinforcement Learning for Model-Free and Model-Based Control and Games -- 1 What May Lie Ahead in Reinforcement Learning -- References -- 2 Reinforcement Learning for Distributed Control and Multi-player Games -- 2.1 Introduction -- 2.2 Optimal Control of Continuous-Time Systems -- 2.2.1 IRL with Experience Replay Learning Technique -- 2.2.2 H∞ Control of CT Systems -- 2.3 Nash Games -- 2.4 Graphical Games -- 2.4.1 Off-Policy RL for Graphical Games -- 2.5 Output Synchronization of Multi-agent Systems -- 2.6 Conclusion and Open Research Directions -- References -- 3 From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions -- 3.1 Introduction -- 3.2 The Communities of Sequential Decisions -- 3.3 Stochastic Optimal Control Versus Reinforcement Learning -- 3.3.1 Stochastic Control -- 3.3.2 Reinforcement Learning -- 3.3.3 A Critique of the MDP Modeling Framework -- 3.3.4 Bridging Optimal Control and Reinforcement Learning -- 3.4 The Universal Modeling Framework -- 3.4.1 Dimensions of a Sequential Decision Model -- 3.4.2 State Variables -- 3.4.3 Objective Functions -- 3.4.4 Notes -- 3.5 Energy Storage Illustration -- 3.5.1 A Basic Energy Storage Problem -- 3.5.2 With a Time-Series Price Model -- 3.5.3 With Passive Learning -- 3.5.4 With Active Learning -- 3.5.5 With Rolling Forecasts -- 3.5.6 Remarks -- 3.6 Designing Policies -- 3.6.1 Policy Search -- 3.6.2 Lookahead Approximations -- 3.6.3 Hybrid Policies -- 3.6.4 Remarks -- 3.6.5 Stochastic Control, Reinforcement Learning, and the Four Classes of Policies -- 3.7 Policies for Energy Storage -- 3.8 Extension to Multi-agent Systems -- 3.9 Observations -- References -- 4 Fundamental Design Principles for Reinforcement Learning Algorithms -- 4.1 Introduction.
4.1.1 Stochastic Approximation and Reinforcement Learning -- 4.1.2 Sample Complexity Bounds -- 4.1.3 What Will You Find in This Chapter? -- 4.1.4 Literature Survey -- 4.2 Stochastic Approximation: New and Old Tricks -- 4.2.1 What is Stochastic Approximation? -- 4.2.2 Stochastic Approximation and Learning -- 4.2.3 Stability and Convergence -- 4.2.4 Zap-Stochastic Approximation -- 4.2.5 Rates of Convergence -- 4.2.6 Optimal Convergence Rate -- 4.2.7 TD and LSTD Algorithms -- 4.3 Zap Q-Learning: Fastest Convergent Q-Learning -- 4.3.1 Markov Decision Processes -- 4.3.2 Value Functions and the Bellman Equation -- 4.3.3 Q-Learning -- 4.3.4 Tabular Q-Learning -- 4.3.5 Convergence and Rate of Convergence -- 4.3.6 Zap Q-Learning -- 4.4 Numerical Results -- 4.4.1 Finite State-Action MDP -- 4.4.2 Optimal Stopping in Finance -- 4.5 Zap-Q with Nonlinear Function Approximation -- 4.5.1 Choosing the Eligibility Vectors -- 4.5.2 Theory and Challenges -- 4.5.3 Regularized Zap-Q -- 4.6 Conclusions and Future Work -- References -- 5 Mixed Density Methods for Approximate Dynamic Programming -- 5.1 Introduction -- 5.2 Unconstrained Affine-Quadratic Regulator -- 5.3 Regional Model-Based Reinforcement Learning -- 5.3.1 Preliminaries -- 5.3.2 Regional Value Function Approximation -- 5.3.3 Bellman Error -- 5.3.4 Actor and Critic Update Laws -- 5.3.5 Stability Analysis -- 5.3.6 Summary -- 5.4 Local (State-Following) Model-Based Reinforcement Learning -- 5.4.1 StaF Kernel Functions -- 5.4.2 Local Value Function Approximation -- 5.4.3 Actor and Critic Update Laws -- 5.4.4 Analysis -- 5.4.5 Stability Analysis -- 5.4.6 Summary -- 5.5 Combining Regional and Local State-Following Approximations -- 5.6 Reinforcement Learning with Sparse Bellman Error Extrapolation -- 5.7 Conclusion -- References -- 6 Model-Free Linear Quadratic Regulator.
6.1 Introduction to a Model-Free LQR Problem -- 6.2 A Gradient-Based Random Search Method -- 6.3 Main Results -- 6.4 Proof Sketch -- 6.4.1 Controlling the Bias -- 6.4.2 Correlation of f̂(K) and f(K) -- 6.5 An Example -- 6.6 Thoughts and Outlook -- References -- Part II Constraint-Driven and Verified RL -- 7 Adaptive Dynamic Programming in the Hamiltonian-Driven Framework -- 7.1 Introduction -- 7.1.1 Literature Review -- 7.1.2 Motivation -- 7.1.3 Structure -- 7.2 Problem Statement -- 7.3 Hamiltonian-Driven Framework -- 7.3.1 Policy Evaluation -- 7.3.2 Policy Comparison -- 7.3.3 Policy Improvement -- 7.4 Discussions on the Hamiltonian-Driven ADP -- 7.4.1 Implementation with Critic-Only Structure -- 7.4.2 Connection to Temporal Difference Learning -- 7.4.3 Connection to Value Gradient Learning -- 7.5 Simulation Study -- 7.6 Conclusion -- References -- 8 Reinforcement Learning for Optimal Adaptive Control of Time Delay Systems -- 8.1 Introduction -- 8.2 Problem Description -- 8.3 Extended State Augmentation -- 8.4 State Feedback Q-Learning Control of Time Delay Systems -- 8.5 Output Feedback Q-Learning Control of Time Delay Systems -- 8.6 Simulation Results -- 8.7 Conclusions -- References -- 9 Optimal Adaptive Control of Partially Uncertain Linear Continuous-Time Systems with State Delay -- 9.1 Introduction -- 9.2 Problem Statement -- 9.3 Linear Quadratic Regulator Design -- 9.3.1 Periodic Sampled Feedback -- 9.3.2 Event Sampled Feedback -- 9.4 Optimal Adaptive Control -- 9.4.1 Periodic Sampled Feedback -- 9.4.2 Event Sampled Feedback -- 9.4.3 Hybrid Reinforcement Learning Scheme -- 9.5 Perspectives on Controller Design with Image Feedback -- 9.6 Simulation Results -- 9.6.1 Linear Quadratic Regulator with Known Internal Dynamics -- 9.6.2 Optimal Adaptive Control with Unknown Drift Dynamics -- 9.7 Conclusion -- References.
10 Dissipativity-Based Verification for Autonomous Systems in Adversarial Environments -- 10.1 Introduction -- 10.1.1 Related Work -- 10.1.2 Contributions -- 10.1.3 Structure -- 10.1.4 Notation -- 10.2 Problem Formulation -- 10.2.1 (Q,S,R)-Dissipative and L2-Gain Stable Systems -- 10.3 Learning-Based Distributed Cascade Interconnection -- 10.4 Learning-Based L2-Gain Composition -- 10.4.1 Q-Learning for L2-Gain Verification -- 10.4.2 L2-Gain Model-Free Composition -- 10.5 Learning-Based Lossless Composition -- 10.6 Discussion -- 10.7 Conclusion and Future Work -- References -- 11 Reinforcement Learning-Based Model Reduction for Partial Differential Equations: Application to the Burgers Equation -- 11.1 Introduction -- 11.2 Basic Notation and Definitions -- 11.3 RL-Based Model Reduction of PDEs -- 11.3.1 Reduced-Order PDE Approximation -- 11.3.2 Proper Orthogonal Decomposition for ROMs -- 11.3.3 Closure Models for ROM Stabilization -- 11.3.4 Main Result: RL-Based Closure Model -- 11.4 Extremum Seeking Based Closure Model Auto-Tuning -- 11.5 The Case of the Burgers Equation -- 11.6 Conclusion -- References -- Part III Multi-agent Systems and RL -- 12 Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms -- 12.1 Introduction -- 12.2 Background -- 12.2.1 Single-Agent RL -- 12.2.2 Multi-Agent RL Framework -- 12.3 Challenges in MARL Theory -- 12.3.1 Non-unique Learning Goals -- 12.3.2 Non-stationarity -- 12.3.3 Scalability Issue -- 12.3.4 Various Information Structures -- 12.4 MARL Algorithms with Theory -- 12.4.1 Cooperative Setting -- 12.4.2 Competitive Setting -- 12.4.3 Mixed Setting -- 12.5 Application Highlights -- 12.5.1 Cooperative Setting -- 12.5.2 Competitive Setting -- 12.5.3 Mixed Settings -- 12.6 Conclusions and Future Directions -- References.
13 Computational Intelligence in Uncertainty Quantification for Learning Control and Differential Games -- 13.1 Introduction -- 13.2 Problem Formulation of Optimal Control for Uncertain Systems -- 13.2.1 Optimal Control for Systems with Parameters Modulated by Multi-dimensional Uncertainties -- 13.2.2 Optimal Control for Random Switching Systems -- 13.3 Effective Uncertainty Evaluation Methods -- 13.3.1 Problem Formulation -- 13.3.2 The MPCM -- 13.3.3 The MPCM-OFFD -- 13.4 Optimal Control Solutions for Systems with Parameter Modulated by Multi-dimensional Uncertainties -- 13.4.1 Reinforcement Learning-Based Stochastic Optimal Control -- 13.4.2 Q-Learning-Based Stochastic Optimal Control -- 13.5 Optimal Control Solutions for Random Switching Systems -- 13.5.1 Optimal Controller for Random Switching Systems -- 13.5.2 Effective Estimator for Random Switching Systems -- 13.6 Differential Games for Systems with Parameters Modulated by Multi-dimensional Uncertainties -- 13.6.1 Stochastic Two-Player Zero-Sum Game -- 13.6.2 Multi-player Nonzero-Sum Game -- 13.7 Applications -- 13.7.1 Traffic Flow Management Under Uncertain Weather -- 13.7.2 Learning Control for Aerial Communication Using Directional Antennas (ACDA) Systems -- 13.8 Summary -- References -- 14 A Top-Down Approach to Attain Decentralized Multi-agents -- 14.1 Introduction -- 14.2 Background -- 14.2.1 Reinforcement Learning -- 14.2.2 Multi-agent Reinforcement Learning -- 14.3 Centralized Learning, But Decentralized Execution -- 14.3.1 A Bottom-Up Approach -- 14.3.2 A Top-Down Approach -- 14.4 Centralized Expert Supervises Multi-agents -- 14.4.1 Imitation Learning -- 14.4.2 CESMA -- 14.5 Experiments -- 14.5.1 Decentralization Can Achieve Centralized Optimality -- 14.5.2 Expert Trajectories Versus Multi-agent Trajectories -- 14.6 Conclusion -- References.
15 Modeling and Mitigating Link-Flooding Distributed Denial-of-Service Attacks via Learning in Stackelberg Games.
Record no. UNINA-9910488722303321
Available at: Univ. Federico II
OPAC: Check availability here
Reinforcement learning from scratch : understanding current approaches - with examples in Java and Greenfoot / Uwe Lorenz
Author Lorenz, Uwe
Publication Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (195 pages)
Discipline 005.133
Topical subject Java (Computer program language)
Reinforcement learning
Java (Llenguatge de programació)
Aprenentatge per reforç (Intel·ligència artificial)
Genre/form subject Llibres electrònics
ISBN 9783031090301
9783031090295
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Introduction -- Contents -- 1: Reinforcement Learning as a Subfield of Machine Learning -- 1.1 Machine Learning as Automated Processing of Feedback from the Environment -- 1.2 Machine Learning -- 1.3 Reinforcement Learning with Java -- Bibliography -- 2: Basic Concepts of Reinforcement Learning -- 2.1 Agents -- 2.2 The Policy of the Agent -- 2.3 Evaluation of States and Actions (Q-Function, Bellman Equation) -- Bibliography -- 3: Optimal Decision-Making in a Known Environment -- 3.1 Value Iteration -- 3.1.1 Target-Oriented Condition Assessment ("Backward Induction") -- 3.1.2 Policy-Based State Valuation (Reward Prediction) -- 3.2 Iterative Policy Search -- 3.2.1 Direct Policy Improvement -- 3.2.2 Mutual Improvement of Policy and Value Function -- 3.3 Optimal Policy in a Board Game Scenario -- 3.4 Summary -- Bibliography -- 4: Decision-Making and Learning in an Unknown Environment -- 4.1 Exploration vs. Exploitation -- 4.2 Retroactive Processing of Experience ("Model-Free Reinforcement Learning") -- 4.2.1 Goal-Oriented Learning ("Value-Based") -- Subsequent evaluation of complete episodes ("Monte Carlo" Method) -- Immediate Valuation Using the Temporal Difference (Q- and SARSA Algorithm) -- Consideration of the Action History (Eligibility Traces) -- 4.2.2 Policy Search -- Monte Carlo Tactics Search -- Evolutionary Strategies -- Monte Carlo Policy Gradient (REINFORCE) -- 4.2.3 Combined Methods (Actor-Critic) -- "Actor-Critic" Policy Gradients -- Technical Improvements to the Actor-Critic Architecture -- Feature Vectors and Partially Observable Environments -- 4.3 Exploration with Predictive Simulations ("Model-Based Reinforcement Learning") -- 4.3.1 Dyna-Q -- 4.3.2 Monte Carlo Rollout -- 4.3.3 Artificial Curiosity -- 4.3.4 Monte Carlo Tree Search (MCTS) -- 4.3.5 Remarks on the Concept of Intelligence.
4.4 Systematics of the Learning Methods -- Bibliography -- 5: Artificial Neural Networks as Estimators for State Values and the Action Selection -- 5.1 Artificial Neural Networks -- 5.1.1 Pattern Recognition with the Perceptron -- 5.1.2 The Adaptability of Artificial Neural Networks -- 5.1.3 Backpropagation Learning -- 5.1.4 Regression with Multilayer Perceptrons -- 5.2 State Evaluation with Generalizing Approximations -- 5.3 Neural Estimators for Action Selection -- 5.3.1 Policy Gradient with Neural Networks -- 5.3.2 Proximal Policy Optimization -- 5.3.3 Evolutionary Strategy with a Neural Policy -- Bibliography -- 6: Guiding Ideas in Artificial Intelligence over Time -- 6.1 Changing Guiding Ideas -- 6.2 On the Relationship Between Humans and Artificial Intelligence -- Bibliography.
Record no. UNISA-996495167903316
Available at: Univ. di Salerno
OPAC: Check availability here
Reinforcement learning from scratch : understanding current approaches - with examples in Java and Greenfoot / Uwe Lorenz
Author Lorenz, Uwe
Publication Cham, Switzerland : Springer, [2022]
Physical description 1 online resource (195 pages)
Discipline 005.133
Topical subject Java (Computer program language)
Reinforcement learning
Java (Llenguatge de programació)
Aprenentatge per reforç (Intel·ligència artificial)
Genre/form subject Llibres electrònics
ISBN 9783031090301
9783031090295
Format Printed material
Bibliographic level Monograph
Language of publication eng
Contents note Intro -- Preface -- Introduction -- Contents -- 1: Reinforcement Learning as a Subfield of Machine Learning -- 1.1 Machine Learning as Automated Processing of Feedback from the Environment -- 1.2 Machine Learning -- 1.3 Reinforcement Learning with Java -- Bibliography -- 2: Basic Concepts of Reinforcement Learning -- 2.1 Agents -- 2.2 The Policy of the Agent -- 2.3 Evaluation of States and Actions (Q-Function, Bellman Equation) -- Bibliography -- 3: Optimal Decision-Making in a Known Environment -- 3.1 Value Iteration -- 3.1.1 Target-Oriented Condition Assessment ("Backward Induction") -- 3.1.2 Policy-Based State Valuation (Reward Prediction) -- 3.2 Iterative Policy Search -- 3.2.1 Direct Policy Improvement -- 3.2.2 Mutual Improvement of Policy and Value Function -- 3.3 Optimal Policy in a Board Game Scenario -- 3.4 Summary -- Bibliography -- 4: Decision-Making and Learning in an Unknown Environment -- 4.1 Exploration vs. Exploitation -- 4.2 Retroactive Processing of Experience ("Model-Free Reinforcement Learning") -- 4.2.1 Goal-Oriented Learning ("Value-Based") -- Subsequent evaluation of complete episodes ("Monte Carlo" Method) -- Immediate Valuation Using the Temporal Difference (Q- and SARSA Algorithm) -- Consideration of the Action History (Eligibility Traces) -- 4.2.2 Policy Search -- Monte Carlo Tactics Search -- Evolutionary Strategies -- Monte Carlo Policy Gradient (REINFORCE) -- 4.2.3 Combined Methods (Actor-Critic) -- "Actor-Critic" Policy Gradients -- Technical Improvements to the Actor-Critic Architecture -- Feature Vectors and Partially Observable Environments -- 4.3 Exploration with Predictive Simulations ("Model-Based Reinforcement Learning") -- 4.3.1 Dyna-Q -- 4.3.2 Monte Carlo Rollout -- 4.3.3 Artificial Curiosity -- 4.3.4 Monte Carlo Tree Search (MCTS) -- 4.3.5 Remarks on the Concept of Intelligence.
4.4 Systematics of the Learning Methods -- Bibliography -- 5: Artificial Neural Networks as Estimators for State Values and the Action Selection -- 5.1 Artificial Neural Networks -- 5.1.1 Pattern Recognition with the Perceptron -- 5.1.2 The Adaptability of Artificial Neural Networks -- 5.1.3 Backpropagation Learning -- 5.1.4 Regression with Multilayer Perceptrons -- 5.2 State Evaluation with Generalizing Approximations -- 5.3 Neural Estimators for Action Selection -- 5.3.1 Policy Gradient with Neural Networks -- 5.3.2 Proximal Policy Optimization -- 5.3.3 Evolutionary Strategy with a Neural Policy -- Bibliography -- 6: Guiding Ideas in Artificial Intelligence over Time -- 6.1 Changing Guiding Ideas -- 6.2 On the Relationship Between Humans and Artificial Intelligence -- Bibliography.
Record no. UNINA-9910624394103321
Available at: Univ. Federico II
OPAC: Check availability here
Reinforcement learning with hybrid quantum approximation in the NISQ context / Leonhard Kunczik
Author Kunczik, Leonhard
Publication Wiesbaden, Germany : Springer Vieweg, [2022]
Physical description 1 online resource (145 pages)
Discipline 006.31
Topical subject Quantum computing
Aprenentatge per reforç (Intel·ligència artificial)
Genre/form subject Llibres electrònics
ISBN 9783658376161
9783658376154
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNISA-996479368903316
Available at: Univ. di Salerno
OPAC: Check availability here
Reinforcement learning with hybrid quantum approximation in the NISQ context / Leonhard Kunczik
Author Kunczik, Leonhard
Publication Wiesbaden, Germany : Springer Vieweg, [2022]
Physical description 1 online resource (145 pages)
Discipline 006.31
Topical subject Quantum computing
Aprenentatge per reforç (Intel·ligència artificial)
Genre/form subject Llibres electrònics
ISBN 9783658376161
9783658376154
Format Printed material
Bibliographic level Monograph
Language of publication eng
Record no. UNINA-9910574090203321
Available at: Univ. Federico II
OPAC: Check availability here