Learning optimal policies for average reward Markov decision processes

Published on:

13 October 2023

Primary Category:

Machine Learning

Paper Authors:

Shengbo Wang,

Jose Blanchet,

Peter Glynn

Key Details

Matches the existing lower bound on sample complexity for average-reward MDPs

Uses a reduction to a discounted MDP, achieving optimal dependence on the mixing time

Combines algorithms from prior work in a new way to achieve optimality

Requires only a generative model (a simulator that can sample a transition from any state-action pair); no real experience is needed

Applies to uniformly ergodic MDPs with finite state and action spaces
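The reduction to a discounted MDP listed above rests on a standard relation between discounted and average reward, sketched here as background under stated assumptions. Rewards lie in [0, 1], α^π denotes the long-run average reward of a stationary policy π, V_γ^π(s) its γ-discounted value from state s, and t_mix a uniform bound on the mixing times of the MDP; all of this notation is introduced here for illustration. Constants are not tracked, and the paper's actual choice of discount factor and target accuracy may differ.

```latex
% Centering the normalized discounted value at the average reward:
(1-\gamma)V_\gamma^\pi(s) - \alpha^\pi
  = (1-\gamma)\sum_{t\ge 0}\gamma^t\bigl(\mathbb{E}_s^\pi[r_t] - \alpha^\pi\bigr),
\qquad
\bigl|(1-\gamma)V_\gamma^\pi(s) - \alpha^\pi\bigr| \le (1-\gamma)\,O(t_{\mathrm{mix}}).

% If \hat\pi is \delta-optimal for the discounted problem (V_\gamma^{\hat\pi} \ge V_\gamma^{*} - \delta),
% applying the bound to \hat\pi and to an average-optimal stationary policy gives
\alpha^{\hat\pi} \ge \alpha^{*} - (1-\gamma)\,\delta - (1-\gamma)\,O(t_{\mathrm{mix}}).
```

Taking 1 - γ of order ε/t_mix and δ of order ε/(1 - γ) therefore yields an O(ε)-optimal average-reward policy; what remains is how cheaply the discounted problem can be solved to accuracy δ from a generative model.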

AI-generated summary

This paper provides the first optimal sample complexity results for learning policies that maximize the long-run average reward in Markov decision processes (MDPs). Its bound matches the existing lower bound on sample complexity; the approach reduces the problem to a discounted MDP and solves that MDP with model-based dynamic programming.
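The model-based approach described above can be sketched as a plug-in (certainty-equivalence) scheme: query a generative model a fixed number of times per state-action pair, build empirical transition and reward estimates, solve the resulting discounted MDP by value iteration, and return the greedy policy. This is a simplified illustration under assumptions, not the paper's algorithm. The sampler interface `sample(s, a) -> (next_state, reward)`, every helper name, the toy two-state MDP, and the fixed discount `gamma=0.99` are hypothetical, and the refinements the paper combines from prior work to reach the optimal bound are omitted.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical generative-model interface: sample(s, a) -> (next_state, reward).
Sampler = Callable[[int, int], Tuple[int, float]]


def estimate_model(sample: Sampler, n_states: int, n_actions: int, n_samples: int):
    """Empirical transition kernel P_hat[s][a][s'] and mean reward r_hat[s][a]."""
    P_hat: List[List[List[float]]] = []
    r_hat: List[List[float]] = []
    for s in range(n_states):
        P_row, r_row = [], []
        for a in range(n_actions):
            counts = [0] * n_states
            total_reward = 0.0
            for _ in range(n_samples):  # n_samples generative-model calls per (s, a)
                s_next, reward = sample(s, a)
                counts[s_next] += 1
                total_reward += reward
            P_row.append([c / n_samples for c in counts])
            r_row.append(total_reward / n_samples)
        P_hat.append(P_row)
        r_hat.append(r_row)
    return P_hat, r_hat


def bellman_q(P, r, V, gamma):
    """One discounted Bellman backup: Q[s][a] = r[s][a] + gamma * sum_s' P[s][a][s'] * V[s']."""
    return [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(r[s]))]
            for s in range(len(r))]


def discounted_value_iteration(P, r, gamma, tol=1e-8, max_iter=100_000):
    """Solve the (estimated) gamma-discounted MDP; return values and a greedy policy."""
    V = [0.0] * len(r)
    for _ in range(max_iter):
        V_new = [max(row) for row in bellman_q(P, r, V, gamma)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    Q = bellman_q(P, r, V, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


def plug_in_policy(sample: Sampler, n_states: int, n_actions: int, n_samples: int, gamma: float):
    """Plug-in policy: estimate the model, then solve the discounted MDP on the estimate."""
    P_hat, r_hat = estimate_model(sample, n_states, n_actions, n_samples)
    return discounted_value_iteration(P_hat, r_hat, gamma)


def toy_sampler(rng: random.Random) -> Sampler:
    """Hypothetical 2-state MDP: reward 1 in state 1; action 0 moves w.p. 0.1, action 1 w.p. 0.9."""
    def sample(s: int, a: int) -> Tuple[int, float]:
        move_prob = 0.1 if a == 0 else 0.9
        s_next = 1 - s if rng.random() < move_prob else s
        return s_next, (1.0 if s == 1 else 0.0)
    return sample


if __name__ == "__main__":
    gamma = 0.99
    V, policy = plug_in_policy(toy_sampler(random.Random(0)), 2, 2, 1000, gamma)
    print("greedy policy:", policy)  # expected: [1, 0] (leave state 0, stay in state 1)
    print("normalized values:", [round((1 - gamma) * v, 3) for v in V])
```

The normalized values `(1 - gamma) * V` approximate the optimal long-run average reward, which for this toy chain is 0.9 under the policy [1, 0], since its stationary distribution puts mass 0.9 on the rewarding state. The fixed `gamma` is purely illustrative; in a reduction of this kind, 1 - gamma is typically chosen on the order of the target accuracy divided by the mixing time.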