validateExperience

Validate experiences for replay memory

Since R2023a

Syntax

validateExperience(buffer,experience)

Description

validateExperience(buffer,experience) validates whether the experiences in experience are compatible with the replay memory buffer. If they are not compatible, validateExperience generates an error message in the MATLAB® Command Window.

Examples

Create Experience Buffer

Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.

obsInfo = rlNumericSpec([3 1],...
    LowerLimit=0,...
    UpperLimit=[1;5;10]);

Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.

actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);

Create an experience buffer with a maximum length of 20,000.

buffer = rlReplayMemory(obsInfo,actInfo,20000);

Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, reward, next observation, and termination signal (IsDone).

For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone value to 0.

exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
exp.Action = {actInfo.UpperLimit.*rand(2,1)};
exp.Reward = 10*rand(1);
exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
exp.IsDone = 0;

Before appending experience to the buffer, you can validate whether the experience is compatible with the buffer. The validateExperience function generates an error if the experience is incompatible with the buffer.

validateExperience(buffer,exp)

Append the experience to the buffer.

append(buffer,exp);
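
Because validateExperience reports incompatibility by throwing an error, one way to guard an append call is to wrap the validation in a try/catch block. The following is a minimal sketch, not part of the original example: it rebuilds a small buffer so that it is self-contained, and it assumes that a wrong observation dimension is among the conditions that validateExperience flags. The names badExp and isCompatible are illustrative.

```matlab
% Self-contained setup: one 3-by-1 observation channel, one 2-by-1 action channel.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([2 1]);
buffer = rlReplayMemory(obsInfo,actInfo,20000);

% Deliberately malformed experience: observations are 4-by-1 instead of 3-by-1.
badExp.Observation = {rand(4,1)};
badExp.Action = {rand(2,1)};
badExp.Reward = 1;
badExp.NextObservation = {rand(4,1)};
badExp.IsDone = 0;

% Validate first, and append only if no error is thrown.
try
    validateExperience(buffer,badExp)
    isCompatible = true;
    append(buffer,badExp);
catch ME
    isCompatible = false;
    disp(ME.message)   % show the reason reported by the toolbox
end
```

With this pattern, an incompatible experience is reported and skipped instead of interrupting a data-collection loop.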

You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.

for i = 1:100
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
expBatch(100).IsDone = 1;

validateExperience(buffer,expBatch)

append(buffer,expBatch);

After appending experiences to the buffer, you can sample mini-batches of experiences to train your reinforcement learning agent. For example, randomly sample a batch of 50 experiences from the buffer.

miniBatch = sample(buffer,50);

You can sample a horizon of data from the buffer. For example, sample a horizon of 10 consecutive experiences with a discount factor of 0.95.

horizonSample = sample(buffer,1,...
    NStepHorizon=10,...
    DiscountFactor=0.95);

The returned sample includes the following information.

Observation and Action are the observation and action from the first experience in the horizon.
NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.
Reward is the cumulative reward across the horizon using the specified discount factor.
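
To make the cumulative reward concrete, the following sketch computes a discounted sum by hand for a horizon of 10 steps. It assumes the conventional n-step form, in which the reward at step k of the horizon (counting from 0) is weighted by the discount factor raised to the power k. The reward values are hypothetical, and how sample treats a terminal experience inside the horizon is not covered here.

```matlab
gamma = 0.95;                          % discount factor, as in the sample call above
rewards = [1 0 2 0 0 1 0 0 3 1];       % hypothetical rewards for 10 consecutive steps

% Weight the k-th reward (k = 0,1,...,9) by gamma^k and accumulate.
k = 0:numel(rewards)-1;
nStepReward = sum((gamma.^k).*rewards);

disp(nStepReward)
```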

You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.

sequenceSample = sample(buffer,1,...
    SequenceLength=20);

Input Arguments

`buffer` — Experience buffer
`rlReplayMemory` object | `rlPrioritizedReplayMemory` object | `rlHindsightReplayMemory` object | `rlHindsightPrioritizedReplayMemory` object

Experience buffer, specified as an rlReplayMemory, rlPrioritizedReplayMemory, rlHindsightReplayMemory, or rlHindsightPrioritizedReplayMemory object.

`experience` — Experience to append to the buffer
structure | structure array | `[]` | `{}` | `''`

Experience to append to the buffer, specified as a structure or structure array with the following fields. If experience is empty, or if it contains an empty structure, nothing is appended to the buffer.

`Observation` — Observation
cell array

Observation, specified as a cell array with length equal to the number of observation specifications specified when creating the buffer. The dimensions of each element in Observation must match the dimensions in the corresponding observation specification.

`Action` — Agent action
cell array

Action taken by the agent, specified as a cell array with length equal to the number of action specifications specified when creating the buffer. The dimensions of each element in Action must match the dimensions in the corresponding action specification.

`Reward` — Reward value
scalar

Reward value obtained by taking the specified action from the starting observation, specified as a scalar.

`NextObservation` — Next observation
cell array

Next observation reached by taking the specified action from the starting observation, specified as a cell array with the same format as Observation.
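
As an illustration of the cell-array format described for Observation, Action, and NextObservation, the following sketch builds an experience for a buffer with two observation channels and one action channel. The channel dimensions and numeric values are hypothetical. The sketch assumes that the buffer is created from an array of two observation specifications, so each observation cell array has two elements.

```matlab
% Two observation channels (3-by-1 and 2-by-2) and one 2-by-1 action channel.
obsInfo = [rlNumericSpec([3 1]) rlNumericSpec([2 2])];
actInfo = rlNumericSpec([2 1]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% One cell per channel, each matching the dimensions of its specification.
experience.Observation = {rand(3,1), rand(2,2)};
experience.Action = {rand(2,1)};
experience.Reward = 0.5;
experience.NextObservation = {rand(3,1), rand(2,2)};
experience.IsDone = 0;

% Check compatibility before appending.
validateExperience(buffer,experience)
append(buffer,experience);
```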

`IsDone` — Termination signal
`0` | `1` | `2`

Termination signal, specified as one of the following values.

0 — This experience is not the end of an episode.
1 — The episode terminated because the environment generated a termination signal.
2 — The episode terminated by reaching the maximum episode length.
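
The following sketch shows how the IsDone values might be assigned for an episode that stops because it reaches a step limit instead of an environment termination signal. The step limit maxSteps and the variable name episode are hypothetical, and the observation, action, and reward values are random placeholders.

```matlab
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([2 1]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

maxSteps = 5;   % hypothetical maximum episode length
for t = 1:maxSteps
    episode(t).Observation = {rand(3,1)};
    episode(t).Action = {rand(2,1)};
    episode(t).Reward = rand(1);
    episode(t).NextObservation = {rand(3,1)};
    episode(t).IsDone = 0;      % not the end of the episode
end

% The last step ended the episode only because the step limit was reached.
episode(maxSteps).IsDone = 2;

validateExperience(buffer,episode)
```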

Version History

Introduced in R2023a
