Regarding this part of your question:
"I'm not sure it's working properly anyways"
I believe you have a bug on the line where you check for an exact match:
if np.equal(valid_dataset, train_dataset[img]).any(1).all():
For illustration purposes, let's consider a toy example where all images are 5x5 and there are only 3 images in valid_dataset. Let's walk through your check for a case where the tested image is contained in valid_dataset, say it is identical to the 2nd image there. In this case np.equal(valid_dataset, train_dataset[img]) might give us, e.g.,
[[[ True False  True False False]
  [False False  True False False]
  [False False  True False False]
  [False False False  True False]
  [False False False False False]]

 [[ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]]

 [[ True  True  True  True False]
  [ True  True  True False  True]
  [ True  True False  True False]
  [ True False  True False False]
  [False  True False False  True]]]
Next, you apply .any(1) to this 3D result. This reduces along the 2nd dimension (axis 1, i.e. over the rows of each image): for every image and every column, it tells us whether at least one value in that column is True. Hence, the result of applying .any(1) has shape 3x5; e.g., the value at position [0,0] is the logical OR of the following values from our example
[[[ True ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]]
 [...]
 [...]]
Thus, in the case of our toy example the result of applying .any(1) would be
[[ True False  True  True False]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]
Applying .all() to that results in False, because the first row still contains False entries, even though the image is contained in valid_dataset.
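The walk-through above can be reproduced with a small sketch. The data here is hypothetical: test_image is all zeros and valid_dataset is built from the toy mask so that np.equal returns exactly the 3x5x5 result shown earlier (the names mask, test_image, eq and per_column are mine, not from your code).

```python
import numpy as np

# Comparison result from the toy example (1 = pixels match, 0 = they differ).
mask = np.array([
    [[1, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0]],
    [[1, 1, 1, 1, 1]] * 5,   # the 2nd image matches in every pixel
    [[1, 1, 1, 1, 0],
     [1, 1, 1, 0, 1],
     [1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 1]],
], dtype=bool)

# Hypothetical data constructed to yield that mask: the tested image is all
# zeros, and each validation pixel is 0 where the mask is True, 1 elsewhere.
test_image = np.zeros((5, 5))
valid_dataset = (~mask).astype(float)

eq = np.equal(valid_dataset, test_image)   # shape (3, 5, 5), equals `mask`
per_column = eq.any(1)                     # OR over the rows of each image
buggy_result = per_column.all()            # the original check

print(per_column.shape)   # (3, 5)
print(per_column)
print(buggy_result)       # False, although valid_dataset[1] equals test_image
```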
Correct solution:
What you want to check is that all pixels of the tested image are the same as the pixels of at least one image in valid_dataset. In other words, for a match (i.e. for the condition to be True), the comparison must be True over the whole plane spanned by the 2nd and 3rd dimensions for at least one valid_dataset image: first reduce each image with all over axes 1 and 2, then apply any across the images. Therefore, the condition should be
if np.equal(valid_dataset, train_dataset[img]).all(axis=(1, 2)).any():
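As a quick sanity check, here is a minimal sketch of the corrected condition on made-up data. The helper name is_in_dataset and the three toy training images are assumptions for illustration only; your real arrays would take their place.

```python
import numpy as np

# Hypothetical toy data: 3 validation images of 5x5 with distinct pixel values.
valid_dataset = np.arange(75).reshape(3, 5, 5)

near_copy = valid_dataset[1].copy()
near_copy[0, 0] += 1000          # change a single pixel

train_dataset = np.stack([
    valid_dataset[1],            # exact copy of the 2nd validation image
    near_copy,                   # differs from it in one pixel
    np.full((5, 5), -1),         # matches nothing
])

def is_in_dataset(image, dataset):
    """Return True if `image` equals at least one image in `dataset`."""
    # all(axis=(1, 2)): one flag per dataset image, True only if every pixel matches.
    # any(): True if at least one dataset image matched completely.
    return bool(np.equal(dataset, image).all(axis=(1, 2)).any())

flags = [is_in_dataset(img, valid_dataset) for img in train_dataset]
duplicates = [i for i, flag in enumerate(flags) if flag]

print(flags)        # [True, False, False]
print(duplicates)   # [0]
```

Note that np.equal broadcasts the single 5x5 image against the whole 3x5x5 stack, so for large datasets the intermediate boolean array is as big as valid_dataset itself.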