I have a multi-label classification problem in which each target is a vector of ones and zeros that are not mutually exclusive (for the sake of clarity, my target is something like [0, 1, 0, 0, 1, 1, ...]).
My understanding so far is:
- I should use a binary cross-entropy loss function (as explained in this answer).
- Also, I understood that `tf.keras.losses.BinaryCrossentropy()` is a wrapper around TensorFlow's `sigmoid_cross_entropy_with_logits`, and that it can be used with `from_logits` set to either `True` or `False` (as explained in this question).
- Since `sigmoid_cross_entropy_with_logits` applies the sigmoid itself, it expects its input to be in the (-inf, +inf) range.
- When the network's last layer already applies a sigmoid activation, `tf.keras.losses.BinaryCrossentropy()` must be used with `from_logits=False`. It will then invert the sigmoid and pass the result to `sigmoid_cross_entropy_with_logits`, which applies the sigmoid again. This, however, can cause numerical issues due to the asymptotes of the sigmoid/logit functions.
- To improve numerical stability, we can drop the last sigmoid layer and use `tf.keras.losses.BinaryCrossentropy(from_logits=True)` instead; see the sketch after this list.
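
To make sure I understand the two setups correctly, here is a minimal sketch of what I mean (the input size, layer sizes, and the 6 output labels are just placeholders for my actual network):

```python
import tensorflow as tf

# Setup A: sigmoid inside the model, the loss consumes probabilities.
model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # placeholder input size
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6, activation="sigmoid"),  # outputs in (0, 1)
])
model_a.compile(optimizer="adam",
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=False))

# Setup B: no final sigmoid, the loss consumes raw logits
# (numerically more stable).
model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6),                        # raw logits, (-inf, +inf)
])
model_b.compile(optimizer="adam",
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```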
Question:
If we use `tf.keras.losses.BinaryCrossentropy(from_logits=True)`, what target should I use? Do I need to change my multi-hot target vector in any way?
I suppose I should then apply a sigmoid activation to the network output at inference time. Is there a way to add a sigmoid layer that is active only in inference mode and not in training mode?
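
For example, the workaround I have in mind is something like this (just a sketch, reusing the hypothetical `model_b` from the sketch above): train on the raw logits, then wrap the trained model with a sigmoid for prediction.

```python
# Sketch: wrap the trained logits-only model with a sigmoid so that
# predictions come out as per-label probabilities. `model_b` is the
# hypothetical logits model from the sketch above.
inference_model = tf.keras.Sequential([
    model_b,
    tf.keras.layers.Activation("sigmoid"),
])

probs = inference_model(tf.random.normal((1, 20)))  # each entry in (0, 1)
```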