Context:
- I have a recurrent neural network with LSTM cells
- The input to the network is a batch of shape (batch_size, number_of_timesteps, one_hot_encoded_class), in my case (128, 300, 38)
- The 128 rows of a batch are not necessarily related to each other
- The target for one time step is the value of the next time step (see the sketch below)
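To make the last point concrete, here is a small sketch of how I build one input/target pair from a single sequence (plain numpy; `seq` stands in for one of my one-hot sequences, the data here is just dummy data):

    import numpy as np

    # seq: one one-hot-encoded sequence, shape (301, 38), random dummy data
    seq = np.eye(38)[np.random.randint(0, 38, size=301)]

    x = seq[:-1]  # input:  time steps 0 .. 299
    y = seq[1:]   # target: time steps 1 .. 300 (the input shifted by one step)
    assert x.shape == y.shape == (300, 38)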
 
My question: When I train the network with an input batch of shape (128, 300, 38) and a target batch of the same shape, does the network only consider the last time step t to predict the value at time step t+1? Or does it consider all time steps from the beginning of the sequence up to time step t? Or does the LSTM cell internally remember all previous states?
I am confused because the network is trained on multiple time steps simultaneously, so I am not sure how the LSTM cell can still have knowledge of the previous states.
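To show how I currently picture the unrolling, here is a sketch that steps a single LSTMCell through time manually; as far as I understand, tf.nn.dynamic_rnn does essentially this internally (the shapes and the cell size are just examples, not my real code):

    import tensorflow as tf

    batch_size, n_timesteps, n_classes = 128, 300, 38
    inputs = tf.placeholder(tf.float32, [batch_size, n_timesteps, n_classes])

    cell = tf.contrib.rnn.LSTMCell(512)
    state = cell.zero_state(batch_size, tf.float32)  # (c, h) both start at zero

    outputs = []
    for t in range(n_timesteps):
        # The state produced at step t is fed into step t+1, so the
        # output at step t is a function of the inputs 0 .. t.
        output, state = cell(inputs[:, t, :], state)
        outputs.append(output)

If that mental model is correct, the prediction at time step t depends on all inputs from step 0 up to t (through the carried state) and not only on the input at step t, which is what I would like to have confirmed.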
I hope somebody can help. Thanks in advance!
Code for discussion:
    # TensorFlow 1.x; assumes `import tensorflow as tf` and that self.inputs,
    # labels, self.n_layers, etc. are defined elsewhere in the class.
    cells = []
    for i in range(self.n_layers):
        cell = tf.contrib.rnn.LSTMCell(self.n_hidden)
        cells.append(cell)
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # Every batch starts from an all-zero hidden/cell state.
    init_state = cell.zero_state(self.batch_size, tf.float32)

    # dynamic_rnn unrolls the cell over the full time axis (all 300 steps)
    # and returns one output per time step.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell, inputs=self.inputs, initial_state=init_state)

    # Per-time-step logits over the 38 classes.
    self.logits = tf.contrib.layers.linear(outputs, self.num_classes)

    # Note: sparse softmax cross-entropy expects integer class indices of
    # shape (batch_size, n_timesteps), not one-hot vectors.
    softmax_ce = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=self.logits)
    self.loss = tf.reduce_mean(softmax_ce)
    self.train_step = tf.train.AdamOptimizer(self.lr).minimize(self.loss)
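For completeness, this is how I understand the batch boundary: since init_state is the zero state, every batch starts with no memory, which should be fine for my unrelated rows. If consecutive batches were continuations of the same sequences, I believe one could feed final_state back in, roughly like this (sess, next_batch, x and y are hypothetical placeholders, not my real code):

    # Hypothetical training loop; state is carried from batch to batch.
    state = sess.run(init_state)  # numpy zero state to start from
    for x, y in next_batch():
        _, state = sess.run(
            [self.train_step, final_state],
            feed_dict={self.inputs: x, labels: y, init_state: state})

In my actual setup I would keep starting every batch from the zero state instead of threading the state through.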
