8 February 2024
Proposes simplified R-FOS model to enable theoretical analysis of M-FOS opponent shaping
Discretizes continuous M-FOS meta-game into tabular Markov decision process
Derives exponential sample complexity bound using adapted reinforcement learning algorithm
Empirical tests show sample requirements scale exponentially with state-action space size
Results suggest efficiency challenges when scaling shaping methods
Simplifying complex opponent shaping
This paper proposes a simplified theoretical model for analyzing the sample complexity of opponent shaping methods. Opponent shaping guides other agents' learning to improve collective outcomes. The model-free opponent shaping (M-FOS) method frames shaping as a meta-reinforcement learning problem but lacks theoretical guarantees. To enable analysis, this paper introduces a tabular M-FOS variant called R-FOS, which discretizes the continuous meta-game into a tabular Markov decision process. Using an adapted reinforcement learning algorithm, the paper derives a sample complexity bound that is exponential in the size of the inner state-action space, suggesting efficiency challenges for scaling shaping methods. Empirical tests support the theory: sample requirements grow exponentially with the state-action space dimensions.
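The discretization idea and the exponential scaling it implies can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual R-FOS construction: the continuous meta-state is stood in for by an opponent policy vector, and the binning scheme and granularity are assumptions made here for clarity.

```python
def discretize_policy(policy_probs, bins_per_dim):
    """Map a continuous policy (probability vector) to a discrete bin index.

    Illustrates the discretization step: binning a continuous meta-state
    (here, an opponent policy over actions) so the meta-game becomes a
    tabular MDP. The binning scheme is an assumption for illustration.
    """
    # Assign each probability in [0, 1] to one of `bins_per_dim` bins,
    # clamping the boundary value 1.0 into the top bin.
    return tuple(min(int(p * bins_per_dim), bins_per_dim - 1)
                 for p in policy_probs)

def tabular_state_count(num_states, num_actions, bins_per_dim):
    """Size of the discretized meta-state table.

    Each inner state-action pair contributes one binned dimension, so the
    table grows as bins ** (|S| * |A|) -- exponential in the inner
    state-action space size, mirroring the flavor of the reported bound.
    """
    return bins_per_dim ** (num_states * num_actions)

# Example: doubling the inner state count squares the table size.
small = tabular_state_count(num_states=2, num_actions=2, bins_per_dim=10)
large = tabular_state_count(num_states=4, num_actions=2, bins_per_dim=10)
print(small, large)  # 10**4 vs 10**8
```

The point of the example is only the growth rate: any tabular method over such a discretization must visit a table whose size, and hence whose sample requirements, blow up exponentially in the inner state-action dimensions.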