12 Policy Function Approximations and Policy Search 655
  12.1 Policy Search as a Sequential Decision Problem 657
  12.2 Classes of Policy Function Approximations 658
  12.3 Problem Characteristics 665
  12.4 Flavors of Policy Search 666
  12.5 Policy Search with Numerical Derivatives 669
  12.6 Derivative-Free Methods for Policy Search 670
  12.7 Exact Derivatives for Continuous Sequential Problems* 677
  12.8 Exact Derivatives for Discrete Dynamic Programs** 680
  12.9 Supervised Learning 686
  12.10 Why Does it Work? 687
  12.11 Bibliographic Notes 690
  Exercises 691
  Bibliography 698

13 Cost Function Approximations 701
  13.1 General Formulation for Parametric CFA 703
  13.2 Objective-Modified CFAs 704
  13.3 Constraint-Modified CFAs 714
  13.4 Bibliographic Notes 725
  Exercises 726
  Bibliography 729

Part V – Lookahead Policies 731

14 Exact Dynamic Programming 737
  14.1 Discrete Dynamic Programming 738
  14.2 The Optimality Equations 740
  14.3 Finite Horizon Problems 747
  14.4 Continuous Problems with Exact Solutions 750
  14.5 Infinite Horizon Problems* 755
  14.6 Value Iteration for Infinite Horizon Problems* 757
  14.7 Policy Iteration for Infinite Horizon Problems* 762
  14.8 Hybrid Value-Policy Iteration* 764
  14.9 Average Reward Dynamic Programming* 765
  14.10 The Linear Programming Method for Dynamic Programs** 766
  14.11 Linear Quadratic Regulation 767
  14.12 Why Does it Work?** 770
  14.13 Bibliographic Notes 783
  Exercises 783
  Bibliography 793

15 Backward Approximate Dynamic Programming 795
  15.1 Backward Approximate Dynamic Programming for Finite Horizon Problems 797
  15.2 Fitted Value Iteration for Infinite Horizon Problems 804
  15.3 Value Function Approximation Strategies 805
  15.4 Computational Observations 810
  15.5 Bibliographic Notes 816
  Exercises 816
  Bibliography 821

16 Forward ADP I: The Value of a Policy 823
  16.1 Sampling the Value of a Policy 824
  16.2 Stochastic Approximation Methods 835
  16.3 Bellman’s Equation Using a Linear Model* 837
  16.4 Analysis of TD(0), LSTD, and LSPE Using a Single State* 842
  16.5 Gradient-based Methods for Approximate Value Iteration* 845
  16.6 Value Function Approximations Based on Bayesian Learning* 852
  16.7 Learning Algorithms and Stepsizes 855
  16.8 Bibliographic Notes 860
  Exercises 862
  Bibliography 864

17 Forward ADP II: Policy Optimization 867
  17.1 Overview of Algorithmic Strategies 869
  17.2 Approximate Value Iteration and Q-Learning Using Lookup Tables 871
  17.3 Styles of Learning 881
  17.4 Approximate Value Iteration Using Linear Models 886
  17.5 On-Policy vs. Off-Policy Learning and the Exploration–Exploitation Problem 888
  17.6 Applications 894
  17.7 Approximate Policy Iteration 900
  17.8 The Actor–Critic Paradigm 907
  17.9 Statistical Bias in the Max Operator* 909
  17.10 The Linear Programming Method Using Linear Models* 912
  17.11 Finite Horizon Approximations for Steady-State Applications 915
  17.12 Bibliographic Notes 917
  Exercises 918
  Bibliography 924

18 Forward ADP III: Convex Resource Allocation Problems 927
  18.1 Resource Allocation Problems 930
  18.2 Values Versus Marginal Values 937
  18.3 Piecewise Linear Approximations for Scalar Functions 938
  18.4 Regression Methods 941
  18.5 Separable Piecewise Linear Approximations 944
  18.6 Benders Decomposition for Nonseparable Approximations** 946
  18.7 Linear Approximations for High-Dimensional Applications 956
  18.8 Resource Allocation with Exogenous Information State 958
  18.9 Closing Notes 959
  18.10 Bibliographic Notes 960
  Exercises 962
  Bibliography 967

19 Direct Lookahead Policies 971
  19.1 Optimal Policies Using Lookahead Models 974
  19.2 Creating an