I have a question regarding the sparse_softmax_cross_entropy cost function in TensorFlow.
I want to use it in a semantic segmentation context with an autoencoder architecture that uses typical convolution operations to downsample images into a feature vector. This vector is then upsampled (using conv2d_transpose and one-by-one convolutions) to create an output image.
Hence, my input consists of single-channel images with shape (1, 128, 128, 1), where the first index represents the batch size and the last one the number of channels. The pixels of the image are currently either 0 or 1, so each pixel is mapped to a class. The output image of the autoencoder follows the same rules. Hence, I can't use any predefined cost function other than MSE or the previously mentioned one.
The network works fine with MSE, but I can't get it working with sparse_softmax_cross_entropy. This seems to be the correct cost function in this context, but I'm a bit confused about the representation of the logits. The official doc says that the logits should have the shape [d_0, ..., d_{r-1}, num_classes]. I tried to ignore the num_classes part, but this causes an error which says that only the interval [0, 1) is allowed. Of course, I need to specify the number of classes, which would turn the allowed interval into [0, 2), because the exclusive upper bound is obviously num_classes.
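To make sure I understand the shape contract, here is a plain NumPy sketch (my own hypothetical re-implementation, not the TF op itself) of what I believe sparse softmax cross-entropy expects: logits with a trailing num_classes axis, and integer labels without that axis.

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    # logits: (batch, H, W, num_classes); labels: (batch, H, W) of ints
    # in [0, num_classes).
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # pick the log-probability of each pixel's true class
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked.squeeze(-1)  # per-pixel loss, shape (batch, H, W)

# toy 4x4 image, 2 classes: logits end in 2 even though the
# input image itself has only one channel
logits = np.random.randn(1, 4, 4, 2)
labels = np.random.randint(0, 2, (1, 4, 4))
loss = sparse_softmax_xent(logits, labels)
print(loss.shape)  # (1, 4, 4)
```

If this understanding is right, my network would need to emit two channels per pixel for a two-class problem.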
Could someone please explain how to turn my output image into the required logits?
The current code for the cost function is:
self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
The squeeze removes the last dimension of the label input to create a label shape of [1, 128, 128]. This causes the following exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).
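As far as I can tell, the op infers num_classes from the last logits dimension, which would explain the message: my output has a single channel, so only the label value 0 is accepted. A hypothetical sketch of that check (my own code, not TensorFlow's):

```python
import numpy as np

def check_labels(logits, labels):
    # num_classes is inferred from the trailing logits dimension
    num_classes = logits.shape[-1]
    bad = (labels < 0) | (labels >= num_classes)
    if bad.any():
        raise ValueError(
            f"Received a label value of {labels[bad].max()} which is "
            f"outside the valid range of [0, {num_classes}).")

logits_1ch = np.zeros((1, 128, 128, 1))     # shape my conv6 produces
labels = np.ones((1, 128, 128), dtype=int)  # pixels equal to 1
try:
    check_labels(logits_1ch, labels)
except ValueError as e:
    print(e)
```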
Edit:
As requested, here's a minimal example to verify the behavior of the cost function in the context of fully convolutional nets:
constructor snippet:
def __init__(self, img_channels=1, img_width=128, img_height=128):
    ...
    self._loss_op = None
    self._learning_rate_placeholder = tf.placeholder(tf.float32, [], 'lr')
    self._input_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'x')
    self._target_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'y')
    self._model = self.build_model()
    self.init_optimizer()
build_model() snippet:
def build_model(self):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        # reshape is not strictly necessary here
        x = tf.reshape(self._input_placeholder, [-1, self._img_width, self._img_height, self._img_channels])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
    with tf.variable_scope('conv3_red', reuse=tf.AUTO_REUSE):
        conv3 = tf.layers.conv2d(conv2, 1024, 30, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv4_red', reuse=tf.AUTO_REUSE):
        conv4 = tf.layers.conv2d(conv3, 64, 1, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv5_up', reuse=tf.AUTO_REUSE):
        conv5 = tf.layers.conv2d_transpose(conv4, 32, (128, 128), strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv6_1x1', reuse=tf.AUTO_REUSE):
        conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu)
    return conv6
init_optimizer() snippet:
def init_optimizer(self):
    self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
    optimizer = tf.train.AdamOptimizer(learning_rate=self._learning_rate_placeholder)
    self._train_op = optimizer.minimize(self._loss_op)
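For reference, a quick shape trace in plain Python (my own hypothetical helpers, assuming the 'valid' padding that tf.layers.conv2d and max_pooling2d default to) confirming that build_model() ends with a (1, 128, 128, 1) tensor, i.e. logits whose last dimension is 1 rather than num_classes:

```python
def conv_out(size, kernel, stride=1):
    # output size of a 'valid'-padded convolution or pooling layer
    return (size - kernel) // stride + 1

def deconv_out(size, kernel, stride=1):
    # output size of a 'valid'-padded transposed convolution
    return (size - 1) * stride + kernel

s = 128
s = conv_out(s, 5)      # conv1 -> 124
s = conv_out(s, 2, 2)   # pool1 -> 62
s = conv_out(s, 3)      # conv2 -> 60
s = conv_out(s, 2, 2)   # pool2 -> 30
s = conv_out(s, 30)     # conv3 -> 1
s = conv_out(s, 1)      # conv4 -> 1
s = deconv_out(s, 128)  # conv5 -> 128
s = conv_out(s, 1)      # conv6 -> 128
print(s)  # 128
```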