Let,
Sample Size = 100 (X1,X2,...,X100)
Timesteps = 5
Input features = 10
Error Calculation:
How is the error calculated when batch size = sample size?
My understanding: I will feed X1, X2, X3, X4, X5 into the LSTM and get an output after 5 timesteps, say Y1.
Error E1 = X6 - Y1. Similarly, I will calculate E2, E3, ..., E95.
Actual error = E1 + E2 + ... + E95. This will be used to update the weights.
Is this correct?
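
A minimal Keras sketch of this setup (the 32 LSTM units, MSE loss, and Adam optimizer are my assumptions, not part of the question):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Raw series: 100 samples (X1..X100), each with 10 features.
raw = np.random.rand(100, 10).astype("float32")

# Sliding windows: (X1..X5) -> X6, (X2..X6) -> X7, ..., (X95..X99) -> X100,
# i.e. 95 input sequences and 95 targets, matching E1..E95 above.
timesteps, n_features = 5, 10
X = np.stack([raw[i : i + timesteps] for i in range(95)])  # shape (95, 5, 10)
y = np.stack([raw[i + timesteps] for i in range(95)])      # shape (95, 10)

model = Sequential([
    LSTM(32, input_shape=(timesteps, n_features)),
    Dense(n_features),
])
model.compile(optimizer="adam", loss="mse")

# batch_size equal to the number of sequences: all 95 errors contribute to a
# single loss value (Keras averages the per-sequence errors rather than
# summing them), so the weights are updated once per epoch.
model.fit(X, y, epochs=1, batch_size=95)
```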
Error for Batch:
Based on the above understanding, if batch size = 10, then only E1, E2, E3, E4 and E5 will be used to calculate the actual error, which will then be used to update the weights.
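
For comparison, a sketch of what batch_size = 10 means to Keras, reusing X, y and model from the snippet above: batch_size counts the prepared sequences (windows), and Keras averages the errors of each batch into one loss value and updates the weights once per batch.

```python
# With batch_size=10, Keras takes 10 of the 95 windows per gradient step,
# averages their errors into one loss value, and updates the weights after
# each such batch, giving ceil(95 / 10) = 10 updates per epoch.
model.fit(X, y, epochs=1, batch_size=10)
```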
Batching in stateful LSTM:
Batches allow the model to exploit parallelism: each entity in the batch calculates its error, and then all the errors are combined. How does the LSTM achieve parallelism within a batch if the LSTM is stateful (the hidden states of the previous sequence are used to initialize the hidden states of the next sequence; is this understanding of stateful correct)?
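
A minimal sketch of the stateful case (32 units, random placeholder data, and two arbitrary chunks are assumptions for illustration): with stateful=True the batch size is fixed via batch_input_shape, the sequences within one batch are still processed in parallel, and the final hidden/cell state of the sequence at batch position i becomes the initial state of the sequence at the same position i in the next batch, so statefulness chains consecutive batches rather than the sequences inside a batch.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, n_features = 10, 5, 10

model = Sequential([
    # stateful=True requires a fixed batch size; the state of position i in
    # one batch initialises position i in the next batch.
    LSTM(32, stateful=True,
         batch_input_shape=(batch_size, timesteps, n_features)),
    Dense(n_features),
])
model.compile(optimizer="adam", loss="mse")

# Two consecutive chunks of the same 10 long series (random placeholders here):
chunk1 = np.random.rand(batch_size, timesteps, n_features).astype("float32")
chunk2 = np.random.rand(batch_size, timesteps, n_features).astype("float32")
target1 = np.random.rand(batch_size, n_features).astype("float32")
target2 = np.random.rand(batch_size, n_features).astype("float32")

model.train_on_batch(chunk1, target1)  # final states are kept...
model.train_on_batch(chunk2, target2)  # ...and used as initial states here

model.reset_states()  # clear the states at the end of an epoch / series
```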
References:
Understanding Keras LSTMs: Role of Batch-size and Statefulness
