Dynamic programming and optimal control: Approximate dynamic programming. Volume 2 / Dimitri P. Bertsekas.
Material type: Text
Publisher: Belmont, MA (USA): Athena Scientific, 2012
Edition: 4th ed.
Description: XVII, 694 p.: ill., graphs; 24 cm
ISBN: 1886529442; 9781886529441
Subject(s): INDUSTRIAL PRODUCTION MANAGEMENT
DDC classification: 658.42

Item type | Current location | Collection | Call number | Vol. info | Copy number | Status | Due date | Barcode | Item holds
---|---|---|---|---|---|---|---|---|---
BOOK - GENERAL MATERIAL | Biblioteca Jorge Álvarez Lleras Fondo general | General Collection | 658.42 B551d (Browse shelf) | Vol. 2 | 1 | Available | | 024945 | |
Includes bibliography and indexes
1 Discounted Problems - Theory
1.1 Minimization of Total Cost - Introduction  3 (11)
1.1.1 The Finite-Horizon DP Algorithm  5 (1)
1.1.2 Shorthand Notation and Monotonicity  6 (4)
1.1.3 A Preview of Infinite Horizon Results  10 (1)
1.1.4 Randomized and History-Dependent Policies  11 (3)
1.2 Discounted Problems - Bounded Cost per Stage  14 (8)
1.3 Scheduling and Multiarmed Bandit Problems  22 (10)
1.4 Discounted Continuous-Time Problems  32 (13)
1.5 The Role of Contraction Mappings  45 (12)
1.5.1 Sup-Norm Contractions  47 (7)
1.5.2 Discounted Problems - Unbounded Cost per Stage  54 (3)
1.6 General Forms of Discounted Dynamic Programming  57 (14)
1.6.1 Basic Results Under Contraction and Monotonicity  63 (6)
1.6.2 Discounted Dynamic Games  69 (2)
1.7 Notes, Sources, and Exercises  71 (11)
2 Discounted Problems - Computational Methods
2.1 Markovian Decision Problems  82 (2)
2.2 Value Iteration  84 (13)
2.2.1 Monotonic Error Bounds for Value Iteration  85 (7)
2.2.2 Variants of Value Iteration  92 (3)
2.2.3 Q-Learning  95 (2)
2.3 Policy Iteration  97 (15)
2.3.1 Policy Iteration for Costs  97 (5)
2.3.2 Policy Iteration for Q-Factors  102 (1)
2.3.3 Optimistic Policy Iteration  103 (3)
2.3.4 Limited Lookahead Policies and Rollout  106 (6)
2.4 Linear Programming Methods  112 (3)
2.5 Methods for General Discounted Problems  115 (23)
2.5.1 Limited Lookahead Policies and Approximations  117 (2)
2.5.2 Generalized Value Iteration  119 (1)
2.5.3 Approximate Value Iteration  120 (3)
2.5.4 Generalized Policy Iteration  123 (3)
2.5.5 Generalized Optimistic Policy Iteration  126 (6)
2.5.6 Approximate Policy Iteration  132 (5)
2.5.7 Mathematical Programming  137 (1)
2.6 Asynchronous Algorithms  138 (18)
2.6.1 Asynchronous Value Iteration  138 (6)
2.6.2 Asynchronous Policy Iteration  144 (5)
2.6.3 Policy Iteration with a Uniform Fixed Point  149 (7)
2.7 Notes, Sources, and Exercises  156 (16)
3 Stochastic Shortest Path Problems
3.1 Problem Formulation  172 (3)
3.2 Main Results  175 (7)
3.3 Underlying Contraction Properties  182 (2)
3.4 Value Iteration  184 (5)
3.4.1 Conditions for Finite Termination  185 (3)
3.4.2 Asynchronous Value Iteration  188 (1)
3.5 Policy Iteration  189 (12)
3.5.1 Optimistic Policy Iteration  190 (1)
3.5.2 Approximate Policy Iteration  191 (2)
3.5.3 Policy Iteration with Improper Policies  193 (4)
3.5.4 Policy Iteration with a Uniform Fixed Point  197 (4)
3.6 Countable-State Problems  201 (3)
3.7 Notes, Sources, and Exercises  204 (10)
4 Undiscounted Problems
4.1 Unbounded Costs per Stage  214 (17)
4.1.1 Main Results  216 (8)
4.1.2 Value Iteration  224 (6)
4.1.3 Other Computational Methods  230 (1)
4.2 Linear Systems and Quadratic Cost  231 (2)
4.3 Inventory Control  233 (2)
4.4 Optimal Stopping  235 (6)
4.5 Optimal Gambling Strategies  241 (7)
4.6 Continuous-Time Problems - Control of Queues  248 (8)
4.7 Nonstationary and Periodic Problems  256 (5)
4.8 Notes, Sources, and Exercises  261 (13)
5 Average Cost per Stage Problems
5.1 Finite-Spaces Average Cost Models  274 (24)
5.1.1 Relation with the Discounted Cost Problem  278 (6)
5.1.2 Blackwell Optimal Policies  284 (10)
5.1.3 Optimality Equations  294 (4)
5.2 Conditions for Equal Average Cost for All Initial States  298 (6)
5.3 Value Iteration  304 (25)
5.3.1 Single-Chain Value Iteration  307 (15)
5.3.2 Multi-Chain Value Iteration  322 (7)
5.4 Policy Iteration  329 (10)
5.4.1 Single-Chain Policy Iteration  329 (6)
5.4.2 Multi-Chain Policy Iteration  335 (4)
5.5 Linear Programming  339 (6)
5.6 Infinite-Spaces Average Cost Models  345 (29)
5.6.1 A Sufficient Condition for Optimality  353 (2)
5.6.2 Finite State Space and Infinite Control Space  355 (9)
5.6.3 Countable States - Vanishing Discount Approach  364 (3)
5.6.4 Countable States - Contraction Approach  367 (5)
5.6.5 Linear Systems with Quadratic Cost  372 (2)
5.7 Notes, Sources, and Exercises  374 (17)
6 Approximate Dynamic Programming - Discounted Models
6.1 General Issues of Simulation-Based Cost Approximation  391 (27)
6.1.1 Approximation Architectures  391 (6)
6.1.2 Simulation-Based Approximate Policy Iteration  397 (6)
6.1.3 Direct and Indirect Approximation  403 (2)
6.1.4 Monte Carlo Simulation  405 (8)
6.1.5 Simplifications  413 (5)
6.2 Direct Policy Evaluation - Gradient Methods  418 (5)
6.3 Projected Equation Methods for Policy Evaluation  423 (28)
6.3.1 The Projected Bellman Equation  424 (4)
6.3.2 The Matrix Form of the Projected Equation  428 (3)
6.3.3 Simulation-Based Methods  431 (2)
6.3.4 LSTD, LSPE, and TD(0) Methods  433 (4)
6.3.5 Optimistic Versions  437 (1)
6.3.6 Multistep Simulation-Based Methods  438 (9)
6.3.7 A Synopsis  447 (4)
6.4 Policy Iteration Issues  451 (23)
6.4.1 Exploration Enhancement by Geometric Sampling  453 (11)
6.4.2 Exploration Enhancement by Off-Policy Methods  464 (3)
6.4.3 Policy Oscillations - Chattering  467 (7)
6.5 Aggregation Methods  474 (19)
6.5.1 Cost Approximation via the Aggregate Problem  482 (3)
6.5.2 Cost Approximation via the Enlarged Problem  485 (5)
6.5.3 Multistep Aggregation  490 (1)
6.5.4 Asynchronous Distributed Aggregation  491 (2)
6.6 Q-Learning  493 (18)
6.6.1 Q-Learning: A Stochastic VI Algorithm  494 (2)
6.6.2 Q-Learning and Policy Iteration  496 (3)
6.6.3 Q-Factor Approximation and Projected Equations  499 (3)
6.6.4 Q-Learning for Optimal Stopping Problems  502 (5)
6.6.5 Q-Learning and Aggregation  507 (2)
6.6.6 Finite Horizon Q-Learning  509 (2)
6.7 Notes, Sources, and Exercises  511 (21)
7 Approximate Dynamic Programming - Nondiscounted Models and Generalizations
7.1 Stochastic Shortest Path Problems  532 (5)
7.2 Average Cost Problems  537 (15)
7.2.1 Approximate Policy Evaluation  537 (9)
7.2.2 Approximate Policy Iteration  546 (2)
7.2.3 Q-Learning for Average Cost Problems  548 (4)
7.3 General Problems and Monte Carlo Linear Algebra  552 (68)
7.3.1 Projected Equations  562 (7)
7.3.2 Matrix Inversion and Iterative Methods  569 (7)
7.3.3 Multistep Methods  576 (8)
7.3.4 Extension of Q-Learning for Optimal Stopping  584 (2)
7.3.5 Equation Error Methods  586 (5)
7.3.6 Oblique Projections  591 (2)
7.3.7 Generalized Aggregation  593 (4)
7.3.8 Deterministic Methods for Singular Linear Systems  597 (11)
7.3.9 Stochastic Methods for Singular Linear Systems  608 (12)
7.4 Approximation in Policy Space  620 (9)
7.4.1 The Gradient Formula  622 (1)
7.4.2 Computing the Gradient by Simulation  623 (2)
7.4.3 Essential Features for Gradient Evaluation  625 (2)
7.4.4 Approximations in Policy and Value Space  627 (2)
7.5 Notes, Sources, and Exercises  629 (12)
Appendix A Measure-Theoretic Issues in Dynamic Programming
A.1 A Two-Stage Example  641 (5)
A.2 Resolution of the Measurability Issues  646 (11)
References 657 (34)
Index 691
As the book's focus shifted, increased emphasis was placed on new or recent research in approximate DP and simulation-based methods, as well as on asynchronous iterative methods, in view of the central role of simulation, which is by nature asynchronous. Much of this material is the result of research carried out in the six years since the previous edition. Some of the highlights, in the order in which they appear in the book, are the following:
(a) A broad spectrum of simulation-based approximate value iteration, approximate policy iteration, and Q-learning methods, based on projected equations and aggregation.
(b) New policy iteration and Q-learning algorithms for stochastic shortest path problems with improper policies.
(c) Reliable Q-learning algorithms for optimistic policy iteration.
(d) New simulation techniques for multistep methods, such as geometric and free-form sampling, based on generalized weighted Bellman equations.
(e) Computational methods for generalized/abstract discounted DP, including convergence analysis and error bounds for approximations.
(f) Monte Carlo linear algebra methods, which extend the approximate DP methodology to broadly applicable problems involving large-scale regression and systems of linear equations.
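To make the Q-learning methods of highlight (a) concrete, below is a minimal sketch of tabular Q-learning for a discounted cost problem, written in the cost-minimization convention the book uses. The toy two-state model, the uniform exploration policy, and the stepsize rule are illustrative assumptions, not taken from the book.

```python
import random

# Hypothetical two-state, two-action discounted MDP (illustrative only).
# P[state][action] = list of (next_state, probability); g[state][action] = stage cost.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
}
g = {
    0: {0: 1.0, 1: 2.0},
    1: {0: 0.5, 1: 0.0},
}
alpha = 0.9  # discount factor

def sample_next(state, action):
    # Draw a successor state from the transition distribution.
    r, acc = random.random(), 0.0
    for nxt, p in P[state][action]:
        acc += p
        if r <= acc:
            return nxt
    return P[state][action][-1][0]

Q = {s: {a: 0.0 for a in P[s]} for s in P}        # Q-factors
visits = {s: {a: 0 for a in P[s]} for s in P}     # per-pair visit counts
state = 0
for _ in range(200_000):
    action = random.choice(list(P[state]))         # exploratory (uniform) policy
    nxt = sample_next(state, action)
    visits[state][action] += 1
    step = 1.0 / visits[state][action]             # diminishing stepsize per pair
    # Stochastic value iteration update on the sampled transition:
    target = g[state][action] + alpha * min(Q[nxt].values())
    Q[state][action] += step * (target - Q[state][action])
    state = nxt

# Greedy (cost-minimizing) policy extracted from the learned Q-factors.
print({s: min(Q[s], key=Q[s].get) for s in Q})
```

The update is a stochastic form of value iteration applied to Q-factors along a single simulated trajectory, which is why the method fits naturally with the asynchronous algorithms emphasized in the book.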