2830-2021 : IEEE Standard for Technical Framework and Requirements of Trusted Execution Environment based Shared Machine Learning / Institute of Electrical and Electronics Engineers
Publication/distribution | New York, NY, USA : IEEE, 2021
Physical description | 1 online resource (23 pages)
Discipline | 006.31
Topical subject | Machine learning ; Deep learning (Machine learning) ; Reinforcement learning ; Computational learning theory
ISBN | 1-5044-7724-3
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Record No. | UNINA-9910503501803321
Available at: Univ. Federico II
2830-2021 : IEEE Standard for Technical Framework and Requirements of Trusted Execution Environment based Shared Machine Learning / Institute of Electrical and Electronics Engineers
Publication/distribution | New York, NY, USA : IEEE, 2021
Physical description | 1 online resource (23 pages)
Discipline | 006.31
Topical subject | Machine learning ; Deep learning (Machine learning) ; Reinforcement learning ; Computational learning theory
ISBN | 1-5044-7724-3
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Record No. | UNISA-996574913703316
Available at: Univ. di Salerno
3rd ACM International Conference on AI in Finance / Daniele Magazzeni
Author | Magazzeni Daniele
Publication/distribution | New York : Association for Computing Machinery, 2022
Physical description | 1 online resource (527 pages)
Discipline | 006.31
Topical subject | Reinforcement learning
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Record No. | UNINA-9910623981203321
Available at: Univ. Federico II
The Art of Reinforcement Learning : Fundamentals, Mathematics, and Implementations with Python / by Michael Hu
Author | Hu Michael
Edition | [1st ed. 2023.]
Publication/distribution | Berkeley, CA : Apress : Imprint: Apress, 2023
Physical description | 1 online resource (290 pages)
Discipline | 006.31
Topical subject | Reinforcement learning ; Feedback control systems ; Python (Computer program language)
ISBN | 9781484296066 ; 1484296060
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Part I: Foundation -- Chapter 1: Introduction to Reinforcement Learning -- Chapter 2: Markov Decision Processes -- Chapter 3: Dynamic Programming -- Chapter 4: Monte Carlo Methods -- Chapter 5: Temporal Difference Learning -- Part II: Value Function Approximation -- Chapter 6: Linear Value Function Approximation -- Chapter 7: Nonlinear Value Function Approximation -- Chapter 8: Improvement to DQN -- Part III: Policy Approximation -- Chapter 9: Policy Gradient Methods -- Chapter 10: Problems with Continuous Action Space -- Chapter 11: Advanced Policy Gradient Methods -- Part IV: Advanced Topics -- Chapter 12: Distributed Reinforcement Learning -- Chapter 13: Curiosity-Driven Exploration -- Chapter 14: Planning with a Model – AlphaZero.
Record No. | UNINA-9910770270703321
Available at: Univ. Federico II
Decentralised reinforcement learning in Markov games / Peter Vrancx ; supervisors, Ann Nowe, Katja Verbeeck
Author | Vrancx Peter
Publication/distribution | Brussel, Belgium : VUBPress, 2010
Physical description | 1 online resource (217 p.)
Discipline | 006.31
Other authors (Persons) | Nowe Ann ; Verbeeck Katja
Topical subject | Reinforcement learning ; Markov processes ; Game theory
Genre/form subject | Electronic books.
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Front; Contents; Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7; Chapter 8; Chapter 9; Appendix A; Author Index
Record No. | UNINA-9910464642003321
Available at: Univ. Federico II
Decentralised reinforcement learning in Markov games / Peter Vrancx ; supervisors, Ann Nowe, Katja Verbeeck
Author | Vrancx Peter
Publication/distribution | Brussel, Belgium : VUBPress, 2010
Physical description | 1 online resource (217 p.)
Discipline | 006.31
Other authors (Persons) | Nowe Ann ; Verbeeck Katja
Topical subject | Reinforcement learning ; Markov processes ; Game theory
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Front; Contents; Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7; Chapter 8; Chapter 9; Appendix A; Author Index
Record No. | UNINA-9910788940903321
Available at: Univ. Federico II
Decentralised reinforcement learning in Markov games / Peter Vrancx ; supervisors, Ann Nowe, Katja Verbeeck
Author | Vrancx Peter
Publication/distribution | Brussel, Belgium : VUBPress, 2010
Physical description | 1 online resource (217 p.)
Discipline | 006.31
Other authors (Persons) | Nowe Ann ; Verbeeck Katja
Topical subject | Reinforcement learning ; Markov processes ; Game theory
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Front; Contents; Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7; Chapter 8; Chapter 9; Appendix A; Author Index
Record No. | UNINA-9910827910303321
Available at: Univ. Federico II
Decision making under uncertainty and reinforcement learning : theory and algorithms / Christos Dimitrakakis, Ronald Ortner
Author | Dimitrakakis Christos
Publication/distribution | Cham, Switzerland : Springer, [2022]
Physical description | 1 online resource (251 pages)
Discipline | 658.403
Series | Intelligent systems reference library
Topical subject | Decision making - Mathematical models ; Reinforcement learning ; Uncertainty
ISBN | 3-031-07614-1
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Intro -- Preface -- Acknowledgements -- Reference -- Contents -- 1 Introduction -- 1.1 Uncertainty and Probability -- 1.2 The Exploration-Exploitation Trade-Off -- 1.3 Decision Theory and Reinforcement Learning -- References -- 2 Subjective Probability and Utility -- 2.1 Subjective Probability -- 2.1.1 Relative Likelihood -- 2.1.2 Subjective Probability Assumptions -- 2.1.3 Assigning Unique Probabilities* -- 2.1.4 Conditional Likelihoods -- 2.1.5 Probability Elicitation -- 2.2 Updating Beliefs: Bayes' Theorem -- 2.3 Utility Theory -- 2.3.1 Rewards and Preferences -- 2.3.2 Preferences Among Distributions -- 2.3.3 Utility -- 2.3.4 Measuring Utility* -- 2.3.5 Convex and Concave Utility Functions -- 2.4 Exercises -- Reference -- 3 Decision Problems -- 3.1 Introduction -- 3.2 Rewards that Depend on the Outcome of an Experiment -- 3.2.1 Formalisation of the Problem Setting -- 3.2.2 Decision Diagrams -- 3.2.3 Statistical Estimation* -- 3.3 Bayes Decisions -- 3.3.1 Convexity of the Bayes-Optimal Utility* -- 3.4 Statistical and Strategic Decision Making -- 3.4.1 Alternative Notions of Optimality -- 3.4.2 Solving Minimax Problems* -- 3.4.3 Two-Player Games -- 3.5 Decision Problems with Observations -- 3.5.1 Maximizing Utility When Making Observations -- 3.5.2 Bayes Decision Rules -- 3.5.3 Decision Problems in Classification -- 3.5.4 Calculating Posteriors -- 3.6 Summary -- 3.7 Exercises -- 3.7.1 Problems with No Observations -- 3.7.2 Problems with Observations -- 3.7.3 An Insurance Problem -- 3.7.4 Medical Diagnosis -- References -- 4 Estimation -- 4.1 Introduction -- 4.2 Sufficient Statistics -- 4.2.1 Sufficient Statistics -- 4.2.2 Exponential Families -- 4.3 Conjugate Priors -- 4.3.1 Bernoulli-Beta Conjugate Pair -- 4.3.2 Conjugates for the Normal Distribution -- 4.3.3 Conjugates for Multivariate Distributions -- 4.4 Credible Intervals.
4.5 Concentration Inequalities -- 4.5.1 Chernoff-Hoeffding Bounds -- 4.6 Approximate Bayesian Approaches -- 4.6.1 Monte Carlo Inference -- 4.6.2 Approximate Bayesian Computation -- 4.6.3 Analytic Approximations of the Posterior -- 4.6.4 Maximum Likelihood and Empirical Bayes Methods -- References -- 5 Sequential Sampling -- 5.1 Gains From Sequential Sampling -- 5.1.1 An Example: Sampling with Costs -- 5.2 Optimal Sequential Sampling Procedures -- 5.2.1 Multi-stage Problems -- 5.2.2 Backwards Induction for Bounded Procedures -- 5.2.3 Unbounded Sequential Decision Procedures -- 5.2.4 The Sequential Probability Ratio Test -- 5.2.5 Wald's Theorem -- 5.3 Martingales -- 5.4 Markov Processes -- 5.5 Exercises -- 6 Experiment Design and Markov Decision Processes -- 6.1 Introduction -- 6.2 Bandit Problems -- 6.2.1 An Example: Bernoulli Bandits -- 6.2.2 Decision-Theoretic Bandit Process -- 6.3 Markov Decision Processes and Reinforcement Learning -- 6.3.1 Value Functions -- 6.4 Finite Horizon, Undiscounted Problems -- 6.4.1 Direct Policy Evaluation -- 6.4.2 Backwards Induction Policy Evaluation -- 6.4.3 Backwards Induction Policy Optimization -- 6.5 Infinite-Horizon -- 6.5.1 Examples -- 6.5.2 Markov Chain Theory for Discounted Problems -- 6.5.3 Optimality Equations -- 6.5.4 MDP Algorithms for Infinite Horizon and Discounted Rewards -- 6.6 Optimality Criteria -- 6.7 Summary -- 6.8 Further Reading -- 6.9 Exercises -- 6.9.1 MDP Theory -- 6.9.2 Automatic Algorithm Selection -- 6.9.3 Scheduling -- 6.9.4 General Questions -- References -- 7 Simulation-Based Algorithms -- 7.1 Introduction -- 7.1.1 The Robbins-Monro Approximation -- 7.1.2 The Theory of the Approximation -- 7.2 Dynamic Problems -- 7.2.1 Monte Carlo Policy Evaluation and Iteration -- 7.2.2 Monte Carlo Updates -- 7.2.3 Temporal Difference Methods -- 7.2.4 Stochastic Value Iteration Methods.
7.3 Discussion -- 7.4 Exercises -- References -- 8 Approximate Representations -- 8.1 Introduction -- 8.1.1 Fitting a Value Function -- 8.1.2 Fitting a Policy -- 8.1.3 Features -- 8.1.4 Estimation Building Blocks -- 8.1.5 The Value Estimation Step -- 8.1.6 Policy Estimation -- 8.2 Approximate Policy Iteration (API) -- 8.2.1 Error Bounds for Approximate Value Functions -- 8.2.2 Rollout-Based Policy Iteration Methods -- 8.2.3 Least Squares Methods -- 8.3 Approximate Value Iteration -- 8.3.1 Approximate Backwards Induction -- 8.3.2 State Aggregation -- 8.3.3 Representative State Approximation -- 8.3.4 Bellman Error Methods -- 8.4 Policy Gradient -- 8.4.1 Stochastic Policy Gradient -- 8.4.2 Practical Considerations -- 8.5 Examples -- 8.6 Further Reading -- 8.7 Exercises -- References -- 9 Bayesian Reinforcement Learning -- 9.1 Introduction -- 9.1.1 Acting in Unknown MDPs -- 9.1.2 Updating the Belief -- 9.2 Finding Bayes-Optimal Policies -- 9.2.1 The Expected MDP Heuristic -- 9.2.2 The Maximum MDP Heuristic -- 9.2.3 Bayesian Policy Gradient -- 9.2.4 The Belief-Augmented MDP -- 9.2.5 Branch and Bound -- 9.2.6 Bounds on the Expected Utility -- 9.2.7 Estimating Lower Bounds on the Value Function with Backwards Induction -- 9.2.8 Further Reading -- 9.3 Bayesian Methods in Continuous Spaces -- 9.3.1 Linear-Gaussian Transition Models -- 9.3.2 Approximate Dynamic Programming -- 9.4 Partially Observable Markov Decision Processes -- 9.4.1 Solving Known POMDPs -- 9.4.2 Solving Unknown POMDPs -- 9.5 Relations Between Different Settings -- 9.6 Exercises -- References -- 10 Distribution-Free Reinforcement Learning -- 10.1 Introduction -- 10.2 Finite Stochastic Bandit Problems -- 10.2.1 The UCB1 Algorithm -- 10.2.2 Non i.i.d. Rewards -- 10.3 Reinforcement Learning in MDPs -- 10.3.1 An Upper-Confidence Bound Algorithm -- 10.3.2 Bibliographical Remarks -- References. 11 Conclusion -- Appendix Symbols -- Appendix Index -- Index.
Record No. | UNINA-9910633910303321
Available at: Univ. Federico II
Deep reinforcement learning for wireless communications and networking : theory, applications and implementation / Dinh Thai Hoang [and four others]
Author | Hoang Dinh Thai <1986->
Edition | [First edition.]
Publication/distribution | Hoboken, New Jersey : John Wiley & Sons, Inc., [2023]
Physical description | 1 online resource (291 pages)
Discipline | 006.31
Topical subject | Reinforcement learning ; Wireless communication systems
Uncontrolled subject | Artificial Intelligence ; Computer Networks ; Computers
ISBN | 1-119-87374-6 ; 1-119-87368-1 ; 1-119-87373-8
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Contents note | Cover -- Title Page -- Copyright -- Contents -- Notes on Contributors -- Foreword -- Preface -- Acknowledgments -- Acronyms -- Introduction -- Part I Fundamentals of Deep Reinforcement Learning -- Chapter 1 Deep Reinforcement Learning and Its Applications -- 1.1 Wireless Networks and Emerging Challenges -- 1.2 Machine Learning Techniques and Development of DRL -- 1.2.1 Machine Learning -- 1.2.2 Artificial Neural Network -- 1.2.3 Convolutional Neural Network -- 1.2.4 Recurrent Neural Network -- 1.2.5 Development of Deep Reinforcement Learning -- 1.3 Potentials and Applications of DRL -- 1.3.1 Benefits of DRL in Human Lives -- 1.3.2 Features and Advantages of DRL Techniques -- 1.3.3 Academic Research Activities -- 1.3.4 Applications of DRL Techniques -- 1.3.5 Applications of DRL Techniques in Wireless Networks -- 1.4 Structure of this Book and Target Readership -- 1.4.1 Motivations and Structure of this Book -- 1.4.2 Target Readership -- 1.5 Chapter Summary -- References -- Chapter 2 Markov Decision Process and Reinforcement Learning -- 2.1 Markov Decision Process -- 2.2 Partially Observable Markov Decision Process -- 2.3 Policy and Value Functions -- 2.4 Bellman Equations -- 2.5 Solutions of MDP Problems -- 2.5.1 Dynamic Programming -- 2.5.1.1 Policy Evaluation -- 2.5.1.2 Policy Improvement -- 2.5.1.3 Policy Iteration -- 2.5.2 Monte Carlo Sampling -- 2.6 Reinforcement Learning -- 2.7 Chapter Summary -- References -- Chapter 3 Deep Reinforcement Learning Models and Techniques -- 3.1 Value‐Based DRL Methods -- 3.1.1 Deep Q‐Network -- 3.1.2 Double DQN -- 3.1.3 Prioritized Experience Replay -- 3.1.4 Dueling Network -- 3.2 Policy‐Gradient Methods -- 3.2.1 REINFORCE Algorithm -- 3.2.1.1 Policy Gradient Estimation -- 3.2.1.2 Reducing the Variance -- 3.2.1.3 Policy Gradient Theorem -- 3.2.2 Actor‐Critic Methods -- 3.2.3 Advantage of Actor‐Critic Methods.
3.2.3.1 Advantage of Actor‐Critic (A2C) -- 3.2.3.2 Asynchronous Advantage Actor‐Critic (A3C) -- 3.2.3.3 Generalized Advantage Estimate (GAE) -- 3.3 Deterministic Policy Gradient (DPG) -- 3.3.1 Deterministic Policy Gradient Theorem -- 3.3.2 Deep Deterministic Policy Gradient (DDPG) -- 3.3.3 Distributed Distributional DDPG (D4PG) -- 3.4 Natural Gradients -- 3.4.1 Principle of Natural Gradients -- 3.4.2 Trust Region Policy Optimization (TRPO) -- 3.4.2.1 Trust Region -- 3.4.2.2 Sample‐Based Formulation -- 3.4.2.3 Practical Implementation -- 3.4.3 Proximal Policy Optimization (PPO) -- 3.5 Model‐Based RL -- 3.5.1 Vanilla Model‐Based RL -- 3.5.2 Robust Model‐Based RL: Model‐Ensemble TRPO (ME‐TRPO) -- 3.5.3 Adaptive Model‐Based RL: Model‐Based Meta‐Policy Optimization (MB‐MPO) -- 3.6 Chapter Summary -- References -- Chapter 4 A Case Study and Detailed Implementation -- 4.1 System Model and Problem Formulation -- 4.1.1 System Model and Assumptions -- 4.1.1.1 Jamming Model -- 4.1.1.2 System Operation -- 4.1.2 Problem Formulation -- 4.1.2.1 State Space -- 4.1.2.2 Action Space -- 4.1.2.3 Immediate Reward -- 4.1.2.4 Optimization Formulation -- 4.2 Implementation and Environment Settings -- 4.2.1 Install TensorFlow with Anaconda -- 4.2.2 Q‐Learning -- 4.2.2.1 Codes for the Environment -- 4.2.2.2 Codes for the Agent -- 4.2.3 Deep Q‐Learning -- 4.3 Simulation Results and Performance Analysis -- 4.4 Chapter Summary -- References -- Part II Applications of DRL in Wireless Communications and Networking -- Chapter 5 DRL at the Physical Layer -- 5.1 Beamforming, Signal Detection, and Decoding -- 5.1.1 Beamforming -- 5.1.1.1 Beamforming Optimization Problem -- 5.1.1.2 DRL‐Based Beamforming -- 5.1.2 Signal Detection and Channel Estimation -- 5.1.2.1 Signal Detection and Channel Estimation Problem -- 5.1.2.2 RL‐Based Approaches -- 5.1.3 Channel Decoding.
5.2 Power and Rate Control -- 5.2.1 Power and Rate Control Problem -- 5.2.2 DRL‐Based Power and Rate Control -- 5.3 Physical‐Layer Security -- 5.4 Chapter Summary -- References -- Chapter 6 DRL at the MAC Layer -- 6.1 Resource Management and Optimization -- 6.2 Channel Access Control -- 6.2.1 DRL in the IEEE 802.11 MAC -- 6.2.2 MAC for Massive Access in IoT -- 6.2.3 MAC for 5G and B5G Cellular Systems -- 6.3 Heterogeneous MAC Protocols -- 6.4 Chapter Summary -- References -- Chapter 7 DRL at the Network Layer -- 7.1 Traffic Routing -- 7.2 Network Slicing -- 7.2.1 Network Slicing‐Based Architecture -- 7.2.2 Applications of DRL in Network Slicing -- 7.3 Network Intrusion Detection -- 7.3.1 Host‐Based IDS -- 7.3.2 Network‐Based IDS -- 7.4 Chapter Summary -- References -- Chapter 8 DRL at the Application and Service Layer -- 8.1 Content Caching -- 8.1.1 QoS‐Aware Caching -- 8.1.2 Joint Caching and Transmission Control -- 8.1.3 Joint Caching, Networking, and Computation -- 8.2 Data and Computation Offloading -- 8.3 Data Processing and Analytics -- 8.3.1 Data Organization -- 8.3.1.1 Data Partitioning -- 8.3.1.2 Data Compression -- 8.3.2 Data Scheduling -- 8.3.3 Tuning of Data Processing Systems -- 8.3.4 Data Indexing -- 8.3.4.1 Database Index Selection -- 8.3.4.2 Index Structure Construction -- 8.3.5 Query Optimization -- 8.4 Chapter Summary -- References -- Part III Challenges, Approaches, Open Issues, and Emerging Research Topics -- Chapter 9 DRL Challenges in Wireless Networks -- 9.1 Adversarial Attacks on DRL -- 9.1.1 Attacks Perturbing the State space -- 9.1.1.1 Manipulation of Observations -- 9.1.1.2 Manipulation of Training Data -- 9.1.2 Attacks Perturbing the Reward Function -- 9.1.3 Attacks Perturbing the Action Space -- 9.2 Multiagent DRL in Dynamic Environments -- 9.2.1 Motivations -- 9.2.2 Multiagent Reinforcement Learning Models. 9.2.2.1 Markov/Stochastic Games -- 9.2.2.2 Decentralized Partially Observable Markov Decision Process (DPOMDP) -- 9.2.3 Applications of Multiagent DRL in Wireless Networks -- 9.2.4 Challenges of Using Multiagent DRL in Wireless Networks -- 9.2.4.1 Nonstationarity Issue -- 9.2.4.2 Partial Observability Issue -- 9.3 Other Challenges -- 9.3.1 Inherent Problems of Using RL in Real‐Word Systems -- 9.3.1.1 Limited Learning Samples -- 9.3.1.2 System Delays -- 9.3.1.3 High‐Dimensional State and Action Spaces -- 9.3.1.4 System and Environment Constraints -- 9.3.1.5 Partial Observability and Nonstationarity -- 9.3.1.6 Multiobjective Reward Functions -- 9.3.2 Inherent Problems of DL and Beyond -- 9.3.2.1 Inherent Problems of DL -- 9.3.2.2 Challenges of DRL Beyond Deep Learning -- 9.3.3 Implementation of DL Models in Wireless Devices -- 9.4 Chapter Summary -- References -- Chapter 10 DRL and Emerging Topics in Wireless Networks -- 10.1 DRL for Emerging Problems in Future Wireless Networks -- 10.1.1 Joint Radar and Data Communications -- 10.1.2 Ambient Backscatter Communications -- 10.1.3 Reconfigurable Intelligent Surface‐Aided Communications -- 10.1.4 Rate Splitting Communications -- 10.2 Advanced DRL Models -- 10.2.1 Deep Reinforcement Transfer Learning -- 10.2.1.1 Reward Shaping -- 10.2.1.2 Intertask Mapping -- 10.2.1.3 Learning from Demonstrations -- 10.2.1.4 Policy Transfer -- 10.2.1.5 Reusing Representations -- 10.2.2 Generative Adversarial Network (GAN) for DRL -- 10.2.3 Meta Reinforcement Learning -- 10.3 Chapter Summary -- References -- Index -- EULA.
Record No. | UNINA-9910830760503321
Available at: Univ. Federico II
Deep reinforcement learning hands-on : apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more / Maxim Lapan
Author | Lapan Maxim
Edition | [1st edition]
Publication/distribution | Birmingham, England : Packt Publishing, 2018
Physical description | 1 online resource (1 volume) : illustrations
Discipline | 006.31
Topical subject | Reinforcement learning
Genre/form subject | Electronic books.
ISBN | 1-78883-930-7
Format | Printed material
Bibliographic level | Monograph
Language of publication | eng
Record No. | UNINA-9910467009503321
Available at: Univ. Federico II