How to use 'MiniBatchSize' with time-series sequences in "trainNetwork"?

I have a time series 'T' which I would like to forecast with an LSTM network. While the code runs without errors, I do not quite seem to get the results I would expect.
In particular, the parameter 'MiniBatchSize' seemingly has no impact on the training. In fact, the number of epochs and iterations in the command window move in sync, i.e. there seems to be no difference between the two.
For simplification, let us assume that my data looks like this:
T is of dimension (1, 2500)
P is of dimension (10, 2500)
Hence, for each time step in T there exist 10 corresponding observations / explanatory input variables in P. Each P(i, :) represents a time series as well.
layers =
  4x1 Layer array with layers:
     1   ''   Sequence Input      Sequence input with 10 dimensions
     2   ''   LSTM                LSTM with 100 hidden units
     3   ''   Fully Connected     1 fully connected layer
     4   ''   Regression Output   mean-squared-error
I then call this to train the network:
[net, trainingRecord] = trainNetwork(P, T, layers, opts);
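For reference, a minimal version of the setup described above might look like this (the training option values are assumptions for illustration, not taken from the original post):

```matlab
% Sketch of the setup in the question; option values are assumed.
numFeatures = 10;      % rows of P
numHiddenUnits = 100;

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits)
    fullyConnectedLayer(1)
    regressionLayer];

opts = trainingOptions('adam', ...
    'MaxEpochs', 100, ...        % assumed value
    'MiniBatchSize', 265);       % the setting that appears to have no effect

[net, trainingRecord] = trainNetwork(P, T, layers, opts);
```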

 Accepted Answer

There are two aspects to this problem:
1.) 'MiniBatchSize' does not create batches along the time dimension of the data. For example, given the input P described above as a (10, 2500) matrix, the intention might be that setting 'MiniBatchSize' to 265 creates mini-batches along time like the following:
(10, 2500) --> (10, 265), (10, 265), (10, 265), …, (10, 115).
This is not what 'MiniBatchSize' does. 'MiniBatchSize' creates batches in the observation dimension, not the time dimension. For the problem described, there is only one predictor sequence, P, which means there is only one observation, so every mini-batch contains just that single observation regardless of the 'MiniBatchSize' setting. This is also why the epoch and iteration counts move in sync: each epoch consists of exactly one iteration.
To create batches in the time dimension, you can use the 'SequenceLength' parameter. Its default value, 'longest', means that the batch sequence length is determined by the longest sequence in the batch. However, setting 'SequenceLength' to 265 will create mini-batches like the following:
(10, 2500) --> (10, 265), (10, 265), (10, 265), …, (10, 265)*
*the last batch is padded out from 115 (= rem(2500, 265)) to 265.
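Putting this into code, the training options could be set up along these lines (a sketch; the solver and 'MaxEpochs' value are assumptions):

```matlab
% Split the single 2500-step sequence into chunks of 265 time steps.
% The final chunk is padded from 115 (= rem(2500, 265)) to 265 steps.
opts = trainingOptions('adam', ...
    'MaxEpochs', 100, ...              % assumed value
    'SequenceLength', 265, ...         % batch along the time dimension
    'SequencePaddingValue', 0);        % value used to pad the last chunk
```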
2.) To take into account that each of your P(i, :) represents a time series, you can change the format of P from an N-by-Ts matrix to an N-by-1 cell array in which each cell contains an M-by-Ts time series. Below are some excerpts from our documentation:
Documentation description on your use case:
trainedNet = trainNetwork(C,Y,layers,options)
trains an LSTM or BiLSTM network for classification and regression problems. C is a cell array containing sequence or time series predictors and Y contains the responses.
Sequences or time series data, specified as a cell array of matrices, or a matrix. For cell array input, C is an N-by-1 cell array where N is the number of observations. Each entry of C is a time series represented by a matrix with rows corresponding to data points, and columns corresponding to time steps.
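Following the documentation excerpt above, the conversion for the matrix described in the question could be sketched as follows (treating each row of P as a separate one-feature observation):

```matlab
% P is 10 x 2500: 10 time series, each with 2500 time steps.
% Convert it into a 10-by-1 cell array with one time series per cell,
% so that trainNetwork sees 10 observations instead of 1.
C = num2cell(P, 2);   % C{i} is the 1 x 2500 sequence P(i, :)
```

How the responses Y should be arranged then depends on whether you want one response value per observation or a response sequence per observation; for sequence-to-sequence regression, Y would be a matching N-by-1 cell array of response sequences.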
