How to store entire observation sequences instead of individual transitions for a Deep Recurrent Q-Learning agent in Reinforcement Learning Toolbox?


I have a Deep Recurrent Q-Learning application using the Reinforcement Learning Toolbox in MATLAB R2022b. I would like to store entire sequences, instead of individual transitions, in the agent's replay buffer. For the agent's "ExperienceBuffer" property, how should the data be constructed so that I can then sample a minibatch of sequences instead of a minibatch of transitions?

Accepted Answer

MathWorks Support Team on 4 Mar 2024
Instead of appending a single struct that contains an entire sequence of data, create an array of structs in which each struct contains one experience (transition).
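As a rough sketch of what that looks like in practice (the specs, sizes, and values below are illustrative assumptions, but the field names match the experience struct that "rlReplayMemory" expects):

```matlab
% Create a replay buffer for the environment's data specs
% (obsInfo and actInfo here are placeholders for your environment's specs).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([1 2]);
buffer = rlReplayMemory(obsInfo, actInfo, 10000);

% Build an array of experience structs, one struct per transition.
episodeLength = 20;
for t = 1:episodeLength
    exp(t).Observation     = {rand(4,1)};
    exp(t).Action          = {1};
    exp(t).Reward          = rand;
    exp(t).NextObservation = {rand(4,1)};
    exp(t).IsDone          = (t == episodeLength);
end

% Append the whole episode; each array element is stored as one experience.
append(buffer, exp);
```

Because the buffer stores one transition per struct, the sample method can later cut sub-sequences of any requested length out of the stored episodes.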
Appending experiences for a Deep Recurrent Q-Network (DRQN) agent works the same way as for a Deep Q-Network (DQN) agent. You can create a minibatch of sub-sequences with the "sample" method by specifying the "SequenceLength" option. The DRQN implementation uses "Bootstrapped Random Updates", so each sub-sequence starts from a random point within a stored episode. For example, the following creates a minibatch containing 32 sub-sequences, each of length 20:
miniBatchSize = 32;
% Sample 32 sub-sequences, each 20 time steps long
mb = sample(ExperienceBuffer, miniBatchSize, SequenceLength=20);
However, if you use the built-in "train" function to train the agent, there is no need to manually append experiences. "rlDQNAgent" automatically uses the DRQN algorithm if the agent's network processes sequences.
Refer also to the following example, which trains a DQN agent with an LSTM network (DRQN) using the "train" function:
Again, "rlDQNAgent" uses the DRQN algorithm if the critic uses a sequence network (for example, an LSTM network). You must set "SequenceLength" to a value greater than 1 in the agent options.
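A minimal sketch of such a setup, assuming illustrative environment specs and layer sizes (these are assumptions, not requirements):

```matlab
% Environment data specs (placeholders for your environment's specs)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([1 2]);

% Critic network with an LSTM layer; the sequence input layer makes
% this a recurrent (DRQN-style) critic.
net = [
    sequenceInputLayer(4)
    lstmLayer(32)
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
critic = rlVectorQValueFunction(net, obsInfo, actInfo);

% SequenceLength > 1 makes the agent train on sub-sequences (DRQN).
opts = rlDQNAgentOptions(SequenceLength=20, MiniBatchSize=32);
agent = rlDQNAgent(critic, opts);
```

With this configuration, calling "train" on the agent handles experience storage and sequence sampling internally; the manual "append"/"sample" workflow above is only needed for custom training loops.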
