What a Flatten layer does
After convolutional operations, tf.keras.layers.Flatten reshapes a tensor to (n_samples, height*width*channels), for example turning (100, 28, 28, 3) into (100, 2352). Let's try it:
import tensorflow as tf
x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
flat = tf.keras.layers.Flatten()
flat(x).shape
TensorShape([100, 2352])
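Under the hood this is just a reshape that keeps the batch axis. A minimal sketch (reusing the same random x as above) confirming it matches tf.reshape with an inferred second dimension:

import tensorflow as tf

x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
# keep the batch axis, collapse everything else into one axis
manual = tf.reshape(x, (x.shape[0], -1))
auto = tf.keras.layers.Flatten()(x)
print(manual.shape)                           # (100, 2352)
print(tf.reduce_all(manual == auto).numpy())  # True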
What a GlobalAveragePooling2D layer does
After convolutional operations, tf.keras.layers.GlobalAveragePooling2D averages the values over the spatial axes (height and width), leaving one value per channel. The resulting shape is therefore (n_samples, n_channels). For instance, if your last convolutional layer had 64 filters, it would turn (16, 7, 7, 64) into (16, 64). Let's test it after a few convolutional operations:
import tensorflow as tf
# Conv2D expects floats, so cast the random integer "images"
x = tf.cast(
    tf.random.uniform(shape=(16, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32),
    tf.float32)
gap = tf.keras.layers.GlobalAveragePooling2D()
for i in range(5):
    # each 3x3 convolution with default 'valid' padding shrinks height and width by 2
    conv = tf.keras.layers.Conv2D(64, 3)
    x = conv(x)
    print(x.shape)
print(gap(x).shape)
(16, 26, 26, 64)
(16, 24, 24, 64)
(16, 22, 22, 64)
(16, 20, 20, 64)
(16, 18, 18, 64)
(16, 64)
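The same result can be reproduced with a plain mean over the two spatial axes. A minimal sketch, using a random tensor as a stand-in for the conv output above:

import tensorflow as tf

x = tf.random.uniform(shape=(16, 18, 18, 64))  # stand-in for the last conv output
gap = tf.keras.layers.GlobalAveragePooling2D()
# GlobalAveragePooling2D is a mean over axes 1 and 2 (height and width)
manual = tf.reduce_mean(x, axis=[1, 2])
print(manual.shape)                                           # (16, 64)
print(tf.reduce_all(tf.abs(manual - gap(x)) < 1e-5).numpy())  # True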
Which should you use?
A Flatten layer itself has no weights, but the Dense layer that follows it will always have at least as many parameters as one that follows GlobalAveragePooling2D. If the tensor shape before flattening is still large, for instance (16, 240, 240, 128), Flatten produces 240*240*128 = 7,372,800 features, and that huge number gets multiplied by the number of units in your next dense layer! In that case, GlobalAveragePooling2D is usually preferred. If you used MaxPooling2D and Conv2D so much that your tensor shape before flattening is like (16, 1, 1, 128), the two are equivalent: both produce (16, 128). If you're overfitting, you might want to try GlobalAveragePooling2D.
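To make that concrete, here is a minimal sketch comparing the parameter counts of a Dense head on each path (the 10-unit Dense layer is an arbitrary choice for illustration):

import tensorflow as tf

inputs = tf.keras.Input(shape=(240, 240, 128))

# Dense after Flatten: 240*240*128 = 7,372,800 features,
# so 7,372,800*10 weights + 10 biases = 73,728,010 parameters
flat_out = tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(inputs))

# Dense after GlobalAveragePooling2D: 128 features,
# so 128*10 weights + 10 biases = 1,290 parameters
gap_out = tf.keras.layers.Dense(10)(
    tf.keras.layers.GlobalAveragePooling2D()(inputs))

print(tf.keras.Model(inputs, flat_out).count_params())  # 73728010
print(tf.keras.Model(inputs, gap_out).count_params())   # 1290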