Let,
Sample Size = 100 (X1,X2,...,X100)
Timesteps = 5
Input features = 10
Error Calculation:
How is the error calculated when batch size = sample size?
My understanding: I will feed X1, X2, X3, X4, X5 into the LSTM and get an output after 5 timesteps, say Y1.
Error E1 = X6 - Y1. Similarly, I will calculate E2, E3, ..., E95.
Actual error = E1 + E2 + ... + E95. This will be used to update the weights.
Is this correct?
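
A minimal Keras sketch of this setup (the 32 LSTM units, MSE loss, and Adam optimizer are my assumptions, not part of the question):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Raw series: 100 samples (X1..X100), each with 10 features.
raw = np.random.rand(100, 10).astype("float32")

# Sliding windows: (X1..X5) -> X6, (X2..X6) -> X7, ..., (X95..X99) -> X100,
# i.e. 95 input sequences and 95 targets, matching E1..E95 above.
timesteps, n_features = 5, 10
X = np.stack([raw[i : i + timesteps] for i in range(95)])  # shape (95, 5, 10)
y = np.stack([raw[i + timesteps] for i in range(95)])      # shape (95, 10)

model = Sequential([
    LSTM(32, input_shape=(timesteps, n_features)),
    Dense(n_features),
])
model.compile(optimizer="adam", loss="mse")

# batch_size equal to the number of sequences: all 95 errors contribute to a
# single loss value (Keras averages the per-sequence errors rather than
# summing them), so the weights are updated once per epoch.
model.fit(X, y, epochs=1, batch_size=95)
```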
Error for Batch:
Based on the above understanding, if batch size = 10, then only E1, E2, E3, E4 and E5 will be used to calculate the actual error, which will then be used to update the weights.
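
For comparison, a sketch of what batch_size = 10 means to Keras, reusing X, y and model from the snippet above: batch_size counts the prepared sequences (windows), and Keras averages the errors of each batch into one loss value and updates the weights once per batch.

```python
# With batch_size=10, Keras takes 10 of the 95 windows per gradient step,
# averages their errors into one loss value, and updates the weights after
# each such batch, giving ceil(95 / 10) = 10 updates per epoch.
model.fit(X, y, epochs=1, batch_size=10)
```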
Batching in stateful LSTM:
Batches allow the model to exploit parallelism: each entity in the batch calculates its error, and then all the errors are combined. How does the LSTM achieve parallelism within a batch if the LSTM is stateful (the hidden states of the previous sequence are used to initialize the hidden states of the next sequence; is this understanding of stateful correct)?
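
A minimal sketch of the stateful case (32 units, random placeholder data, and two arbitrary chunks are assumptions for illustration): with stateful=True the batch size is fixed via batch_input_shape, the sequences within one batch are still processed in parallel, and the final hidden/cell state of the sequence at batch position i becomes the initial state of the sequence at the same position i in the next batch, so statefulness chains consecutive batches rather than the sequences inside a batch.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, n_features = 10, 5, 10

model = Sequential([
    # stateful=True requires a fixed batch size; the state of position i in
    # one batch initialises position i in the next batch.
    LSTM(32, stateful=True,
         batch_input_shape=(batch_size, timesteps, n_features)),
    Dense(n_features),
])
model.compile(optimizer="adam", loss="mse")

# Two consecutive chunks of the same 10 long series (random placeholders here):
chunk1 = np.random.rand(batch_size, timesteps, n_features).astype("float32")
chunk2 = np.random.rand(batch_size, timesteps, n_features).astype("float32")
target1 = np.random.rand(batch_size, n_features).astype("float32")
target2 = np.random.rand(batch_size, n_features).astype("float32")

model.train_on_batch(chunk1, target1)  # final states are kept...
model.train_on_batch(chunk2, target2)  # ...and used as initial states here

model.reset_states()  # clear the states at the end of an epoch / series
```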
References:
Understanding Keras LSTMs: Role of Batch-size and Statefulness
