Problem description
I am going through "Deep Learning in Python" by François Chollet (publisher webpage, notebooks on github). Replicating examples from Chapter 6 I encountered problems with (I believe) GRU layer with recurrent dropout.
The code in which I had first observed those errors is quite long, so I decided to stick to the simplest problem, which could replicate the error: classifying IMDB reviews into "positive" and "negative" categories.
When I use a GRU layer with recurrent dropout training loss (after couple of batches of first epoch) takes "value" of nan, while training accuracy (from the start of second epoch) takes the value of 0.
64/12000 [..............................] - ETA: 3:05 - loss: 0.6930 - accuracy: 0.4844
128/12000 [..............................] - ETA: 2:09 - loss: 0.6926 - accuracy: 0.4766
192/12000 [..............................] - ETA: 1:50 - loss: 0.6910 - accuracy: 0.5573
(...)
3136/12000 [======>.......................] - ETA: 59s - loss: 0.6870 - accuracy: 0.5635
3200/12000 [=======>......................] - ETA: 58s - loss: 0.6862 - accuracy: 0.5650
3264/12000 [=======>......................] - ETA: 58s - loss: 0.6860 - accuracy: 0.5650
3328/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5667
3392/12000 [=======>......................] - ETA: 57s - loss: nan - accuracy: 0.5560
3456/12000 [=======>......................] - ETA: 56s - loss: nan - accuracy: 0.5457
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.1593
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1584
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.1576
12000/12000 [==============================] - 83s 7ms/step - loss: nan - accuracy: 0.1572 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/20
64/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:15 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
11840/12000 [============================>.] - ETA: 1s - loss: nan - accuracy: 0.0000e+00
11904/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
11968/12000 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.0000e+00
12000/12000 [==============================] - 82s 7ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/20
64/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
128/12000 [..............................] - ETA: 1:18 - loss: nan - accuracy: 0.0000e+00
192/12000 [..............................] - ETA: 1:16 - loss: nan - accuracy: 0.0000e+00
(...)
Localizing the problem
To find out the solution I wrote the code presented below, which goes through several models (GRU/LSTM, {no dropout, only "normal" dropout, only recurrent dropout, "normal" and recurrent dropout, rmsprop/adam}) and presents loss and accuracy of all those models. (It also creates smaller, separate graphs for each model.)
# Based on examples from "Deep Learning with Python" by François Chollet:
## Constants, modules:
VERSION = 2
import os
from keras import models
from keras import layers
import matplotlib.pyplot as plt
import pylab
## Loading data:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = \
imdb.load_data(num_words=10000)
from keras.preprocessing import sequence
x_train = sequence.pad_sequences(x_train, maxlen=500)
x_test = sequence.pad_sequences(x_test, maxlen=500)
## Dictionary with models' hyperparameters:
MODELS = [
# GRU:
{"no": 1,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 2,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 3,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 4,
"layer_type": "GRU",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 5,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 6,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 7,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 8,
"layer_type": "GRU",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
# LSTM:
{"no": 9,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": None},
{"no": 10,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 11,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 12,
"layer_type": "LSTM",
"optimizer": "rmsprop",
"dropout": 0.3,
"recurrent_dropout": 0.3},
{"no": 13,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": None},
{"no": 14,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": None},
{"no": 15,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": None,
"recurrent_dropout": 0.3},
{"no": 16,
"layer_type": "LSTM",
"optimizer": "adam",
"dropout": 0.3,
"recurrent_dropout": 0.3},
]
## Adding name:
for model_dict in MODELS:
model_dict["name"] = f"{model_dict['layer_type']}"
model_dict["name"] += f"_d{model_dict['dropout']}" if model_dict['dropout'] is not None else f"_dN"
model_dict["name"] += f"_rd{model_dict['recurrent_dropout']}" if model_dict['recurrent_dropout'] is not None else f"_rdN"
model_dict["name"] += f"_{model_dict['optimizer']}"
## Fucntion - defing and training model:
def train_model(model_dict):
"""Defines and trains a model, outputs history."""
## Defining:
model = models.Sequential()
model.add(layers.Embedding(10000, 32))
recurrent_layer_kwargs = dict()
if model_dict["dropout"] is not None:
recurrent_layer_kwargs["dropout"] = model_dict["dropout"]
if model_dict["recurrent_dropout"] is not None:
recurrent_layer_kwargs["recurrent_dropout"] = model_dict["recurrent_dropout"]
if model_dict["layer_type"] == 'GRU':
model.add(layers.GRU(32, **recurrent_layer_kwargs))
elif model_dict["layer_type"] == 'LSTM':
model.add(layers.LSTM(32, **recurrent_layer_kwargs))
else:
raise ValueError("Wrong model_dict['layer_type'] value...")
model.add(layers.Dense(1, activation='sigmoid'))
## Compiling:
model.compile(
optimizer=model_dict["optimizer"],
loss='binary_crossentropy',
metrics=['accuracy'])
## Training:
history = model.fit(x_train, y_train,
epochs=20,
batch_size=64,
validation_split=0.2)
return history
## Multi-model graphs' parameters:
graph_all_nrow = 4
graph_all_ncol = 4
graph_all_figsize = (20, 20)
assert graph_all_nrow * graph_all_nrow >= len(MODELS)
## Figs and axes of multi-model graphs:
graph_all_loss_fig, graph_all_loss_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
graph_all_acc_fig, graph_all_acc_axs = plt.subplots(graph_all_nrow, graph_all_ncol, figsize=graph_all_figsize)
## Loop trough all models:
for i, model_dict in enumerate(MODELS):
history = train_model(model_dict)
## Metrics extraction:
loss = history.history['loss']
val_loss = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(loss) + 1)
## Single-model grph - loss:
graph_loss_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
graph_loss_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_loss_graph.png"
graph_loss_fig, graph_loss_ax = plt.subplots()
graph_loss_ax.plot(epochs, loss, 'bo', label='Training loss')
graph_loss_ax.plot(epochs, val_loss, 'b', label='Validation loss')
graph_loss_ax.legend()
graph_loss_fig.suptitle("Training and validation loss")
graph_loss_fig.savefig(graph_loss_fname)
pylab.close(graph_loss_fig)
## Single-model grph - accuracy:
graph_acc_fname = fr"{os.path.basename(__file__).replace('.py', '')}"
graph_acc_fname += fr"_v{VERSION}_{model_dict['no']}_{model_dict['name']}_acc_graph.png"
graph_acc_fig, graph_acc_ax = plt.subplots()
graph_acc_ax.plot(epochs, acc, 'bo', label='Training accuracy')
graph_acc_ax.plot(epochs, val_acc, 'b', label='Validation accuracy')
graph_acc_ax.legend()
graph_acc_fig.suptitle("Training and validation acc")
graph_acc_fig.savefig(graph_acc_fname)
pylab.close(graph_acc_fig)
## Position of axes on multi-model graph:
i_row = i // graph_all_ncol
i_col = i % graph_all_ncol
## Adding model metrics to multi-model graph - loss:
graph_all_loss_axs[i_row, i_col].plot(epochs, loss, 'bo', label='Training loss')
graph_all_loss_axs[i_row, i_col].plot(epochs, val_loss, 'b', label='Validation loss')
graph_all_loss_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
## Adding model metrics to multi-model graph - accuracy:
graph_all_acc_axs[i_row, i_col].plot(epochs, acc, 'bo', label='Training acc')
graph_all_acc_axs[i_row, i_col].plot(epochs, val_acc, 'b', label='Validation acc')
graph_all_acc_axs[i_row, i_col].set_title(fr"{model_dict['no']}. {model_dict['name']}")
## Saving multi-model graphs:
# Output files are quite big (8000x8000 PNG), you may want to decrease DPI.
graph_all_loss_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_loss_graph.png", dpi=400)
graph_all_acc_fig.savefig(fr"{os.path.basename(__file__).replace('.py', '')}_ALL_acc_graph.png", dpi=400)
Please find two main graphs below: Loss - binary crossentropy, Accuracy (I am not allowed te embed images in post due to low reputation).
I have also obtained similarly strange problems in regression model - the MAE was in range of several thousands - in the problem where $y$ range was maybe of several tens. (I decided not to include this model here, because it would make this question even longer.)
Versions of modules and libraries, hardware
- Modules:
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
matplotlib 3.1.3
tensorflow-estimator 1.14.0
tensorflow-gpu 2.1.0
tensorflow-gpu-estimator 2.1.0
keras.jsonfile:
{
"floatx": "float32",
"epsilon": 1e-07,
"backend": "tensorflow",
"image_data_format": "channels_last"
}
- CUDA - I have CUDA 10.0 and CUDA 10.1 installed on my system.
- CUDnn - I have three versions: cudnn-10.0 v7.4.2.24, cudnn-10.0 v7.6.4.38, cudnn-9.0 v7.4.2.24
- GPU: Nvidia GTX 1050Ti 4gb
- Windows 10 Home
Questions
- Do you know what may be the reason of this behavior?
- Is it possible that this is caused by multiple CUDA and CUDnn installations? Before observing the problem I have trained several models (both from book and my own ones) and the seemed to behave mor or less as expected, while having 2 CUDA and 2 CUDnn versions (those above without cudnn-10.0 v7.6.4.38) installed.
- Is there any official/good source of adequate combinations of keras, tensorflow, CUDA, CUDnn (and other relevant things e.g. maybe Visual Studio)? I cannot really find any authoritative and up-to-date source.
I hope I've described everything clearly enough. If you have any questions, please ask.