LEADER 05398nam 2200661Ia 450 001 9910876618603321 005 20200520144314.0 010 $a1-283-37409-9 010 $a9786613374097 010 $a0-470-98004-4 010 $a0-470-98003-6 035 $a(CKB)3400000000000317 035 $a(EBL)661847 035 $a(SSID)ssj0000477769 035 $a(PQKBManifestationID)11296731 035 $a(PQKBTitleCode)TC0000477769 035 $a(PQKBWorkID)10513471 035 $a(PQKB)11586395 035 $a(MiAaPQ)EBC661847 035 $a(OCoLC)705354489 035 $a(EXLCZ)993400000000000317 100 $a20101103d2011 uy 0 101 0 $aeng 135 $aur|n|---||||| 181 $ctxt 182 $cc 183 $acr 200 10$aMulti-armed bandit allocation indices /$fJohn Gittins, Kevin Glazebrook, Richard Weber 205 $a2nd ed. 210 $aChichester $cWiley$d2011 215 $a1 online resource (311 p.) 300 $aDescription based upon print version of record. 311 $a0-470-67002-9 320 $aIncludes bibliographical references and index. 327 $aMulti-armed Bandit Allocation Indices; Contents; Foreword; Foreword to the first edition; Preface; Preface to the first edition; 1 Introduction or exploration; Exercises; 2 Main ideas: Gittins index; 2.1 Introduction; 2.2 Decision processes; 2.3 Simple families of alternative bandit processes; 2.4 Dynamic programming; 2.5 Gittins index theorem; 2.6 Gittins index; 2.6.1 Gittins index and the multi-armed bandit; 2.6.2 Coins problem; 2.6.3 Characterization of the optimal stopping time; 2.6.4 The restart-in-state formulation; 2.6.5 Dependence on discount factor 327 $a2.6.6 Myopic and forwards induction policies2.7 Proof of the index theorem by interchanging bandit portions; 2.8 Continuous-time bandit processes; 2.9 Proof of the index theorem by induction and interchange argument; 2.10 Calculation of Gittins indices; 2.11 Monotonicity conditions; 2.11.1 Monotone indices; 2.11.2 Monotone jobs; 2.12 History of the index theorem; 2.13 Some decision process theory; Exercises; 3 Necessary assumptions for indices; 3.1 Introduction; 3.2 Jobs; 3.3 Continuous-time jobs; 3.3.1 Definition; 3.3.2 Policies for continuous-time jobs 327 $a3.3.3 The continuous-time index theorem for a SFABP of jobs3.4 Necessary assumptions; 3.4.1 Necessity of an infinite time horizon; 3.4.2 Necessity of constant exponential discounting; 3.4.3 Necessity of a single processor; 3.5 Beyond the necessary assumptions; 3.5.1 Bandit-dependent discount factors; 3.5.2 Stochastic discounting; 3.5.3 Undiscounted rewards; 3.5.4 A discrete search problem; 3.5.5 Multiple processors; Exercises; 4 Superprocesses, precedence constraints and arrivals; 4.1 Introduction; 4.2 Bandit superprocesses; 4.3 The index theorem for superprocesses 327 $a4.4 Stoppable bandit processes4.5 Proof of the index theorem by freezing and promotion rules; 4.5.1 Freezing rules; 4.5.2 Promotion rules; 4.6 The index theorem for jobs with precedence constraints; 4.7 Precedence constraints forming an out-forest; 4.8 Bandit processes with arrivals; 4.9 Tax problems; 4.9.1 Ongoing bandits and tax problems; 4.9.2 Klimov's model; 4.9.3 Minimum EWFT for the M/G/1 queue; 4.10 Near optimality of nearly index policies; Exercises; 5 The achievable region methodology; 5.1 Introduction; 5.2 A simple example; 5.3 Proof of the index theorem by greedy algorithm 327 $a5.4 Generalized conservation laws and indexable systems5.5 Performance bounds for policies for branching bandits; 5.6 Job selection and scheduling problems; 5.7 Multi-armed bandits on parallel machines; Exercises; 6 Restless bandits and Lagrangian relaxation; 6.1 Introduction; 6.2 Restless bandits; 6.3 Whittle indices for restless bandits; 6.4 Asymptotic optimality; 6.5 Monotone policies and simple proofs of indexability; 6.6 Applications to multi-class queueing systems; 6.7 Performance bounds for the Whittle index policy; 6.8 Indices for more general resource configurations; Exercises 327 $a7 Multi-population random sampling (theory) 330 $aIn 1989 the first edition of this book set out Gittins' pioneering index solution to the multi-armed bandit problem and his subsequent investigation of a wide of sequential resource allocation and stochastic scheduling problems. Since then there has been a remarkable flowering of new insights, generalizations and applications, to which Glazebrook and Weber have made major contributions. This second edition brings the story up to date. There are new chapters on the achievable region approach to stochastic optimization problems, the construction of performance bounds for suboptimal policies, W 606 $aResource allocation$xMathematical models 606 $aMathematical optimization 606 $aProgramming (Mathematics) 615 0$aResource allocation$xMathematical models. 615 0$aMathematical optimization. 615 0$aProgramming (Mathematics) 676 $a519.5 700 $aGittins$b John C.$f1938-$01754207 701 $aGlazebrook$b Kevin D.$f1950-$01443322 701 $aWeber$b Richard$f1953-$0942460 801 0$bMiAaPQ 801 1$bMiAaPQ 801 2$bMiAaPQ 906 $aBOOK 912 $a9910876618603321 996 $aMulti-armed bandit allocation indices$94190436 997 $aUNINA