When I load the MNIST dataset from Keras, I get 4 variables -
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
The shape of x_train is (60000, 28, 28), which makes sense because it contains 60,000 28x28 images.
The shape of y_train is just (60000,), which shows that it is a one-dimensional vector containing the numeric target labels (0-9).
For digit classification, a neural network generally outputs a one-hot encoded vector, which would have ten dimensions. I thought I needed to use to_categorical to convert the y targets from numeric to categorical so that the net's output shape would match the shape of the training labels, which would presumably be (60000, 10).
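For reference, this is the conversion I had in mind - a minimal sketch using only the standard Keras loaders and the public to_categorical utility:

from tensorflow import keras
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(y_train.shape)                         # (60000,) -- integer labels 0-9
y_train_onehot = to_categorical(y_train, num_classes=10)
print(y_train_onehot.shape)                  # (60000, 10) -- one row per sample
print(y_train[0], y_train_onehot[0])         # 5 -> [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]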
But in a few examples I've found online, to_categorical was never used to reshape the training vector. y_train.shape remained (60000,) while the neural net's output layer was
model.add(Dense(10, activation="softmax"))
which outputs a 10-dimensional vector of class probabilities (one-hot in shape, if not in values).
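(For context, the models in those examples looked roughly like the following; the Flatten/Dense stack and the hidden-layer size are my reconstruction of a typical minimal MNIST net, not their exact code.)

from tensorflow import keras
from tensorflow.keras.layers import Flatten, Dense

model = keras.Sequential()
model.add(Flatten(input_shape=(28, 28)))     # 28x28 image -> 784 features
model.add(Dense(128, activation="relu"))     # hidden layer; 128 units is a guess
model.add(Dense(10, activation="softmax"))   # 10 class probabilities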
And then they simply trained the model on y_train without issue:
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))
How is this possible? Wouldn't the neural net's output, which has shape (60000, 10), be incompatible with targets of shape (60000,)? Or does Keras automatically convert the categorical output to numeric?
EDIT: To be extra clear, I know how to one-hot encode the labels; my question is why these examples didn't do that. The net worked without one-hot encoding the target classes, even though its output was clearly one-hot shaped.
EDIT: Roshin was right. This is simply an effect of using the sparse_categorical_crossentropy loss, as opposed to categorical_crossentropy: the sparse variant accepts integer class indices directly, so the targets never need to be one-hot encoded.
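For anyone who finds this later, here is a minimal sketch of the two equivalent setups (using the model above; the adam optimizer and the pixel scaling are my choices, not from the original examples):

from tensorflow.keras.utils import to_categorical

x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

# Option 1: integer labels of shape (60000,) with the sparse loss
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))

# Option 2: one-hot labels of shape (60000, 10) with the categorical loss
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, to_categorical(y_train, 10), epochs=2,
          validation_data=(x_test, to_categorical(y_test, 10)))

The sparse loss just indexes the predicted probability for the true class, which is mathematically the same as dotting the one-hot vector with the softmax output, so the label vector never needs reshaping.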