I'm trying to fit an RNN in Keras using sequences that have varying time lengths. My data is in a Numpy array with format (sample, time, feature) = (20631, max_time, 24) where max_time is determined at run-time as the number of time steps available for the sample with the most time stamps. I've padded the beginning of each time series with 0, except for the longest one, obviously.
I've initially defined my model like so...
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))
model.add(LSTM(100, input_dim=24))
model.add(Dense(2))
model.add(Activation(activate))
model.compile(loss=weibull_loglik_discrete, optimizer=RMSprop(lr=.01))
model.fit(train_x, train_y, nb_epoch=100, batch_size=1000, verbose=2, validation_data=(test_x, test_y))
For completeness, here's the code for the loss function:
def weibull_loglik_discrete(y_true, ab_pred, name=None):
y_ = y_true[:, 0]
u_ = y_true[:, 1]
a_ = ab_pred[:, 0]
b_ = ab_pred[:, 1]
hazard0 = k.pow((y_ + 1e-35) / a_, b_)
hazard1 = k.pow((y_ + 1) / a_, b_)
return -1 * k.mean(u_ * k.log(k.exp(hazard1 - hazard0) - 1.0) - hazard1)
And here's the code for the custom activation function:
def activate(ab):
a = k.exp(ab[:, 0])
b = k.softplus(ab[:, 1])
a = k.reshape(a, (k.shape(a)[0], 1))
b = k.reshape(b, (k.shape(b)[0], 1))
return k.concatenate((a, b), axis=1)
When I fit the model and make some test predictions, every sample in the test set gets exactly the same prediction, which seems fishy.
Things get better if I remove the masking layer, which makes me think there's something wrong with the masking layer, but as far as I can tell, I've followed the documentation exactly.
Is there something mis-specified with the masking layer? Am I missing something else?