How to store entire observation sequences instead of individual transitions for a Deep Recurrent Q-Learning agent in Reinforcement Learning Toolbox?


I have a Deep Recurrent Q-Learning application using the Reinforcement Learning Toolbox in MATLAB R2022b. I would like to store entire sequences, instead of individual transitions, in the agent's replay buffer. For the agent's "ExperienceBuffer" property, how should the data be constructed so that I can then sample a minibatch of sequences instead of a minibatch of transitions?

Accepted Answer

MathWorks Support Team on 4 Mar 2024
Instead of appending a single struct that contains an entire sequence of data, create an array of structs in which each struct contains one experience (transition).
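As a rough sketch of what that looks like in practice (the specs, sizes, and values below are illustrative assumptions, but the field names match the experience struct that "rlReplayMemory" expects):

```matlab
% Create a replay buffer for the environment's data specs
% (obsInfo and actInfo here are placeholders for your environment's specs).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([1 2]);
buffer = rlReplayMemory(obsInfo, actInfo, 10000);

% Build an array of experience structs, one struct per transition.
episodeLength = 20;
for t = 1:episodeLength
    exp(t).Observation     = {rand(4,1)};
    exp(t).Action          = {1};
    exp(t).Reward          = rand;
    exp(t).NextObservation = {rand(4,1)};
    exp(t).IsDone          = (t == episodeLength);
end

% Append the whole episode; each array element is stored as one experience.
append(buffer, exp);
```

Because the buffer stores one transition per struct, the sample method can later cut sub-sequences of any requested length out of the stored episodes.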
Appending experiences for a Deep Recurrent Q-Network (DRQN) agent works the same way as for a Deep Q-Network (DQN) agent. You can create a minibatch of sub-sequences with the "sample" method by specifying the "SequenceLength" option. The DRQN implementation uses "Bootstrapped Random Updates", so each sub-sequence starts from a random point within a stored episode. For example, the following creates a minibatch containing 32 sub-sequences, each of length 20:
miniBatchSize = 32;
% Sample 32 sub-sequences, each 20 time steps long
mb = sample(ExperienceBuffer, miniBatchSize, SequenceLength=20);
However, if you use the built-in "train" function to train the agent, there is no need to manually append experiences. "rlDQNAgent" automatically uses the DRQN algorithm if the agent's network processes sequences.
Refer also to the following example, which trains a DQN agent with an LSTM network (DRQN) using the "train" function:
Again, "rlDQNAgent" uses the DRQN algorithm if the critic uses a sequence network (for example, an LSTM network). You must set "SequenceLength" to a value greater than 1 in the agent options.
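A minimal sketch of such a setup, assuming illustrative environment specs and layer sizes (these are assumptions, not requirements):

```matlab
% Environment data specs (placeholders for your environment's specs)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([1 2]);

% Critic network with an LSTM layer; the sequence input layer makes
% this a recurrent (DRQN-style) critic.
net = [
    sequenceInputLayer(4)
    lstmLayer(32)
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
critic = rlVectorQValueFunction(net, obsInfo, actInfo);

% SequenceLength > 1 makes the agent train on sub-sequences (DRQN).
opts = rlDQNAgentOptions(SequenceLength=20, MiniBatchSize=32);
agent = rlDQNAgent(critic, opts);
```

With this configuration, calling "train" on the agent handles experience storage and sequence sampling internally; the manual "append"/"sample" workflow above is only needed for custom training loops.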
