Dynamic programming and optimal control: Approximate dynamic programming. Volume 2 / Dimitri P. Bertsekas.

By: Bertsekas, Dimitri P.
Material type: Text
Publisher: Belmont, MA (USA): Athena Scientific, 2012
Edition: 4th ed.
Description: XVII, 694 p.: ill., graphs; 24 cm
ISBN: 1886529442; 9781886529441
Subject(s): Industrial production management
DDC classification: 658.42
Summary: As the book's focus shifted, increased emphasis was placed on new and recent research in approximate DP and simulation-based methods, as well as on asynchronous iterative methods, in view of the central role of simulation, which is by nature asynchronous. Much of this material is the result of research conducted in the six years since the previous edition. Some of the highlights, in the order in which they appear in the book, are:
(a) A broad spectrum of simulation-based approximate value iteration, policy iteration, and Q-learning methods based on projected equations and aggregation.
(b) New policy iteration and Q-learning algorithms for stochastic shortest path problems with improper policies.
(c) Reliable Q-learning algorithms for optimistic policy iteration.
(d) New simulation techniques for multistep methods, such as geometric and free-form sampling, based on generalized weighted Bellman equations.
(e) Computational methods for generalized/abstract discounted DP, including convergence analysis and error bounds for approximations.
(f) Monte Carlo linear algebra methods, which extend the approximate DP methodology to broadly applicable problems involving large-scale regression and systems of linear equations.
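Highlight (a) above refers to the simulation-based value iteration, policy iteration, and Q-learning methods treated in Chapters 2 and 6. As a purely illustrative aid (not code from the book), the sketch below shows tabular Q-learning for a discounted cost-minimization problem, in the spirit of Section 6.6.1, which views Q-learning as a stochastic value iteration algorithm. The problem interface sample_transition and all other names are assumptions made for this example.

# Illustrative sketch only -- not code from the book. Tabular Q-learning for a
# discounted Markovian decision problem with costs to be minimized, viewed as a
# stochastic form of value iteration. The interface sample_transition and all
# parameter names are assumptions made for this example.
import numpy as np

def q_learning(sample_transition, n_states, n_controls,
               discount=0.95, n_iterations=100_000, seed=0):
    """Estimate the optimal Q-factors Q(i, u) from simulated transitions.

    sample_transition(i, u) must return (j, g): a successor state j drawn from
    the transition probabilities and the corresponding stage cost g(i, u, j).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_controls))
    visits = np.zeros((n_states, n_controls))   # counts used for diminishing stepsizes

    for _ in range(n_iterations):
        i = rng.integers(n_states)        # sample a state
        u = rng.integers(n_controls)      # sample a control (pure exploration)
        j, g = sample_transition(i, u)    # simulate one transition and its cost
        visits[i, u] += 1
        step = 1.0 / visits[i, u]         # stepsize of order 1/k for the pair (i, u)
        # Move Q(i, u) toward the sampled target g + discount * min over u' of Q(j, u').
        Q[i, u] += step * (g + discount * Q[j].min() - Q[i, u])

    return Q

A greedy (cost-minimizing) policy can then be read off as u(i) = argmin over u of Q[i, u]; the book's cost-minimization convention is used throughout the sketch.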
Holdings:
Item type: Book - General material
Current location: Biblioteca Jorge Álvarez Lleras, Fondo general
Collection: Colección General
Call number: 658.42 B551d
Volume: Vol. 2
Copy number: 1
Status: Available
Barcode: 024945
Total holds: 0

Includes bibliography and indexes.

Contents:

1 Discounted Problems - Theory
1.1 Minimization of Total Cost - Introduction
3 (11)
1.1.1 The Finite-Horizon DP Algorithm
5 (1)
1.1.2 Shorthand Notation and Monotonicity
6 (4)
1.1.3 A Preview of Infinite Horizon Results
10 (1)
1.1.4 Randomized and History-Dependent Policies
11 (3)
1.2 Discounted Problems - Bounded Cost per Stage
14 (8)
1.3 Scheduling and Multiarmed Bandit Problems
22 (10)
1.4 Discounted Continuous-Time Problems
32 (13)
1.5 The Role of Contraction Mappings
45 (12)
1.5.1 Sup-Norm Contractions
47 (7)
1.5.2 Discounted Problems - Unbounded Cost per Stage
54 (3)
1.6 General Forms of Discounted Dynamic Programming
57 (14)
1.6.1 Basic Results Under Contraction and Monotonicity
63 (6)
1.6.2 Discounted Dynamic Games
69 (2)
1.7 Notes, Sources, and Exercises
71 (11)
2 Discounted Problems - Computational Methods
2.1 Markovian Decision Problems
82 (2)
2.2 Value Iteration
84 (13)
2.2.1 Monotonic Error Bounds for Value Iteration
85 (7)
2.2.2 Variants of Value Iteration
92 (3)
2.2.3 Q-Learning
95 (2)
2.3 Policy Iteration
97 (15)
2.3.1 Policy Iteration for Costs
97 (5)
2.3.2 Policy Iteration for Q-Factors
102 (1)
2.3.3 Optimistic Policy Iteration
103 (3)
2.3.4 Limited Lookahead Policies and Rollout
106 (6)
2.4 Linear Programming Methods
112 (3)
2.5 Methods for General Discounted Problems
115 (23)
2.5.1 Limited Lookahead Policies and Approximations
117 (2)
2.5.2 Generalized Value Iteration
119 (1)
2.5.3 Approximate Value Iteration
120 (3)
2.5.4 Generalized Policy Iteration
123 (3)
2.5.5 Generalized Optimistic Policy Iteration
126 (6)
2.5.6 Approximate Policy Iteration
132 (5)
2.5.7 Mathematical Programming
137 (1)
2.6 Asynchronous Algorithms
138 (18)
2.6.1 Asynchronous Value Iteration
138 (6)
2.6.2 Asynchronous Policy Iteration
144 (5)
2.6.3 Policy Iteration with a Uniform Fixed Point
149 (7)
2.7 Notes, Sources, and Exercises
156 (16)
3 Stochastic Shortest Path Problems
3.1 Problem Formulation
172 (3)
3.2 Main Results
175 (7)
3.3 Underlying Contraction Properties
182 (2)
3.4 Value Iteration
184 (5)
3.4.1 Conditions for Finite Termination
185 (3)
3.4.2 Asynchronous Value Iteration
188 (1)
3.5 Policy Iteration
189 (12)
3.5.1 Optimistic Policy Iteration
190 (1)
3.5.2 Approximate Policy Iteration
191 (2)
3.5.3 Policy Iteration with Improper Policies
193 (4)
3.5.4 Policy Iteration with a Uniform Fixed Point
197 (4)
3.6 Countable-State Problems
201 (3)
3.7 Notes, Sources, and Exercises
204 (10)
4 Undiscounted Problems
4.1 Unbounded Costs per Stage
214 (17)
4.1.1 Main Results
216 (8)
4.1.2 Value Iteration
224 (6)
4.1.3 Other Computational Methods
230 (1)
4.2 Linear Systems and Quadratic Cost
231 (2)
4.3 Inventory Control
233 (2)
4.4 Optimal Stopping
235 (6)
4.5 Optimal Gambling Strategies
241 (7)
4.6 Continuous-Time Problems - Control of Queues
248 (8)
4.7 Nonstationary and Periodic Problems
256 (5)
4.8 Notes, Sources, and Exercises
261 (13)
5 Average Cost per Stage Problems
5.1 Finite-Spaces Average Cost Models
274 (24)
5.1.1 Relation with the Discounted Cost Problem
278 (6)
5.1.2 Blackwell Optimal Policies
284 (10)
5.1.3 Optimality Equations
294 (4)
5.2 Conditions for Equal Average Cost for all Initial States
298 (6)
5.3 Value Iteration
304 (25)
5.3.1 Single-Chain Value Iteration
307 (15)
5.3.2 Multi-Chain Value Iteration
322 (7)
5.4 Policy Iteration
329 (10)
5.4.1 Single-Chain Policy Iteration
329 (6)
5.4.2 Multi-Chain Policy Iteration
335 (4)
5.5 Linear Programming
339 (6)
5.6 Infinite-Spaces Average Cost Models
345 (29)
5.6.1 A Sufficient Condition for Optimality
353 (2)
5.6.2 Finite State Space and Infinite Control Space
355 (9)
5.6.3 Countable States - Vanishing Discount Approach
364 (3)
5.6.4 Countable States - Contraction Approach
367 (5)
5.6.5 Linear Systems with Quadratic Cost
372 (2)
5.7 Notes, Sources, and Exercises
374 (17)
6 Approximate Dynamic Programming - Discounted Models
6.1 General Issues of Simulation-Based Cost Approximation
391 (27)
6.1.1 Approximation Architectures
391 (6)
6.1.2 Simulation-Based Approximate Policy Iteration
397 (6)
6.1.3 Direct and Indirect Approximation
403 (2)
6.1.4 Monte Carlo Simulation
405 (8)
6.1.5 Simplifications
413 (5)
6.2 Direct Policy Evaluation - Gradient Methods
418 (5)
6.3 Projected Equation Methods for Policy Evaluation
423 (28)
6.3.1 The Projected Bellman Equation
424 (4)
6.3.2 The Matrix Form of the Projected Equation
428 (3)
6.3.3 Simulation-Based Methods
431 (2)
6.3.4 LSTD, LSPE, and TD(0) Methods
433 (4)
6.3.5 Optimistic Versions
437 (1)
6.3.6 Multistep Simulation-Based Methods
438 (9)
6.3.7 A Synopsis
447 (4)
6.4 Policy Iteration Issues
451 (23)
6.4.1 Exploration Enhancement by Geometric Sampling
453 (11)
6.4.2 Exploration Enhancement by Off-Policy Methods
464 (3)
6.4.3 Policy Oscillations - Chattering
467 (7)
6.5 Aggregation Methods
474 (19)
6.5.1 Cost Approximation via the Aggregate Problem
482 (3)
6.5.2 Cost Approximation via the Enlarged Problem
485 (5)
6.5.3 Multistep Aggregation
490 (1)
6.5.4 Asynchronous Distributed Aggregation
491 (2)
6.6 Q-Learning
493 (18)
6.6.1 Q-Learning: A Stochastic VI Algorithm
494 (2)
6.6.2 Q-Learning and Policy Iteration
496 (3)
6.6.3 Q-Factor Approximation and Projected Equations
499 (3)
6.6.4 Q-Learning for Optimal Stopping Problems
502 (5)
6.6.5 Q-Learning and Aggregation
507 (2)
6.6.6 Finite Horizon Q-Learning
509 (2)
6.7 Notes, Sources, and Exercises
511 (21)
7 Approximate Dynamic Programming - Nondiscounted Models and Generalizations
7.1 Stochastic Shortest Path Problems
532 (5)
7.2 Average Cost Problems
537 (15)
7.2.1 Approximate Policy Evaluation
537 (9)
7.2.2 Approximate Policy Iteration
546 (2)
7.2.3 Q-Learning for Average Cost Problems
548 (4)
7.3 General Problems and Monte Carlo Linear Algebra
552 (68)
7.3.1 Projected Equations
562 (7)
7.3.2 Matrix Inversion and Iterative Methods
569 (7)
7.3.3 Multistep Methods
576 (8)
7.3.4 Extension of Q-Learning for Optimal Stopping
584 (2)
7.3.5 Equation Error Methods
586 (5)
7.3.6 Oblique Projections
591 (2)
7.3.7 Generalized Aggregation
593 (4)
7.3.8 Deterministic Methods for Singular Linear Systems
597 (11)
7.3.9 Stochastic Methods for Singular Linear Systems
608 (12)
7.4 Approximation in Policy Space
620 (9)
7.4.1 The Gradient Formula
622 (1)
7.4.2 Computing the Gradient by Simulation
623 (2)
7.4.3 Essential Features for Gradient Evaluation
625 (2)
7.4.4 Approximations in Policy and Value Space
627 (2)
7.5 Notes, Sources, and Exercises
629 (12)
Appendix A Measure-Theoretic Issues in Dynamic Programming
A.1 A Two-Stage Example
641 (5)
A.2 Resolution of the Measurability Issues
646 (11)
References 657 (34)
Index 691
