Planning experiments to learn complex reward models

Published on: 10 January 2024

Primary Category: Machine Learning

Paper Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

Key Details

Proposes two approaches to experiment planning that work with complex, non-linear reward models

The first strategy constructs a sequence of policies guided by the eluder dimension, a measure of function-class complexity

The second shows that uniform sampling is competitive when the action space is small

Also establishes a gap between static planning and adaptive learning

AI generated summary

This paper studies how to design, in advance, an effective strategy for collecting data to learn a reward model, with the goal of finding a near-optimal policy. The authors propose two approaches that are compatible with complex, non-linear reward functions. The first constructs a sequence of policies guided by the eluder dimension, a measure of the complexity of the reward function class. The second shows that a simple uniform sampling strategy can be competitive when the action space is small. The authors also establish a fundamental gap between what such static planning can achieve and what adaptive learning algorithms can achieve.
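
To make the uniform sampling idea concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm or analysis. It assumes a small discrete context set, a tabular reward model, and a made-up reward function (`toy_reward`), and every function name is hypothetical. The point it illustrates is that the actions are fixed before any reward is observed, and a policy is extracted only afterwards.

```python
import random
from collections import defaultdict


def plan_uniform(contexts, num_actions, rng):
    """Static plan: choose an action uniformly at random for each context.

    No reward is consulted here, so the plan is fixed in advance.
    """
    return [rng.randrange(num_actions) for _ in contexts]


def toy_reward(context, action, rng, noise_std=0.1):
    """Hypothetical environment: action (context % 3) pays about 1, others about 0."""
    mean = 1.0 if action == context % 3 else 0.0
    return mean + rng.gauss(0.0, noise_std)


def fit_tabular_reward(contexts, actions, rewards):
    """Estimate the reward of each (context, action) pair by its sample mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, a, r in zip(contexts, actions, rewards):
        sums[(x, a)] += r
        counts[(x, a)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def greedy_policy(model, num_actions):
    """Return a policy that picks the action with the highest estimated reward.

    Pairs never sampled during planning default to an estimate of 0.0.
    """
    def policy(context):
        return max(range(num_actions), key=lambda a: model.get((context, a), 0.0))
    return policy


if __name__ == "__main__":
    rng = random.Random(0)
    contexts = [i % 3 for i in range(300)]       # three contexts, 100 rounds each
    actions = plan_uniform(contexts, 3, rng)     # planning phase, no feedback used
    rewards = [toy_reward(x, a, rng) for x, a in zip(contexts, actions)]
    model = fit_tabular_reward(contexts, actions, rewards)
    policy = greedy_policy(model, 3)
    print([policy(x) for x in range(3)])
```

With only three actions, uniform sampling covers every (context, action) pair many times, which is why the simple plan suffices in this toy setting. With a large action space the same budget would be spread too thinly, which is the regime the eluder-dimension-based strategy targets.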
