What a Flatten layer does
After convolutional operations, tf.keras.layers.Flatten reshapes a tensor to (n_samples, height*width*channels), for example turning (100, 28, 28, 3) into (100, 2352). Let's try it:
import tensorflow as tf
x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
flat = tf.keras.layers.Flatten()
flat(x).shape
TensorShape([100, 2352])
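Under the hood this is just a reshape that keeps the batch axis. A minimal sketch (reusing the same random x as above) confirming it matches tf.reshape with an inferred second dimension:

import tensorflow as tf

x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
# keep the batch axis, collapse everything else into one axis
manual = tf.reshape(x, (x.shape[0], -1))
auto = tf.keras.layers.Flatten()(x)
print(manual.shape)                           # (100, 2352)
print(tf.reduce_all(manual == auto).numpy())  # True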
What a GlobalAveragePooling2D layer does
After convolutional operations, tf.keras.layers.GlobalAveragePooling2D averages the values over the spatial axes (height and width), leaving one value per channel. The resulting shape is therefore (n_samples, n_channels). For instance, if your last convolutional layer had 64 filters, it would turn (16, 7, 7, 64) into (16, 64). Let's test it after a few convolutional operations:
import tensorflow as tf
# Conv2D expects floats, so cast the random integer "images"
x = tf.cast(
    tf.random.uniform(shape=(16, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32),
    tf.float32)
gap = tf.keras.layers.GlobalAveragePooling2D()
for i in range(5):
    # each 3x3 convolution with default 'valid' padding shrinks height and width by 2
    conv = tf.keras.layers.Conv2D(64, 3)
    x = conv(x)
    print(x.shape)
print(gap(x).shape)
(16, 26, 26, 64)
(16, 24, 24, 64)
(16, 22, 22, 64)
(16, 20, 20, 64)
(16, 18, 18, 64)
(16, 64)
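The same result can be reproduced with a plain mean over the two spatial axes. A minimal sketch, using a random tensor as a stand-in for the conv output above:

import tensorflow as tf

x = tf.random.uniform(shape=(16, 18, 18, 64))  # stand-in for the last conv output
gap = tf.keras.layers.GlobalAveragePooling2D()
# GlobalAveragePooling2D is a mean over axes 1 and 2 (height and width)
manual = tf.reduce_mean(x, axis=[1, 2])
print(manual.shape)                                           # (16, 64)
print(tf.reduce_all(tf.abs(manual - gap(x)) < 1e-5).numpy())  # True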
Which should you use?
A Flatten layer itself has no weights, but the Dense layer that follows it will always have at least as many parameters as one that follows GlobalAveragePooling2D. If the tensor shape before flattening is still large, for instance (16, 240, 240, 128), Flatten produces 240*240*128 = 7,372,800 features, and that huge number gets multiplied by the number of units in your next dense layer! In that case, GlobalAveragePooling2D is usually preferred. If you used MaxPooling2D and Conv2D so much that your tensor shape before flattening is like (16, 1, 1, 128), the two are equivalent: both produce (16, 128). If you're overfitting, you might want to try GlobalAveragePooling2D.
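To make that concrete, here is a minimal sketch comparing the parameter counts of a Dense head on each path (the 10-unit Dense layer is an arbitrary choice for illustration):

import tensorflow as tf

inputs = tf.keras.Input(shape=(240, 240, 128))

# Dense after Flatten: 240*240*128 = 7,372,800 features,
# so 7,372,800*10 weights + 10 biases = 73,728,010 parameters
flat_out = tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(inputs))

# Dense after GlobalAveragePooling2D: 128 features,
# so 128*10 weights + 10 biases = 1,290 parameters
gap_out = tf.keras.layers.Dense(10)(
    tf.keras.layers.GlobalAveragePooling2D()(inputs))

print(tf.keras.Model(inputs, flat_out).count_params())  # 73728010
print(tf.keras.Model(inputs, gap_out).count_params())   # 1290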