Possible solution
IMO you should use an almost standard categorical_crossentropy and output logits from the network, which are mapped in the loss function to values in [0, 1, 2, 3, 4] using the argmax operation (the same procedure is applied to the one-hot-encoded labels; see the last part of this answer for an example).
Using a weighted crossentropy you can penalize incorrectness differently depending on how far the predicted value is from the correct one, as you indicated in the comments.
All you have to do is take the absolute value of the difference between the correct and predicted values and multiply the loss by it; see the example below:
Let's map each encoding to its ordinal value (once the labels are one-hot-encoded, this is exactly what argmax recovers, as seen later); a short conversion sketch follows the list:
[0, 0, 0, 0] -> 0
[1, 0, 0, 0] -> 1
[1, 1, 0, 0] -> 2
[1, 1, 1, 0] -> 3
[1, 1, 1, 1] -> 4
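For reference, here is a minimal NumPy sketch of that conversion (the array contents are just the example encodings from the list above; to_categorical produces the one-hot labels used in the rest of this answer):

import numpy as np
from keras.utils import to_categorical

# Unary-style encodings from the list above
unary = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])

values = unary.sum(axis=1)                       # counts the ones -> [0, 1, 2, 3, 4]
one_hot = to_categorical(values, num_classes=5)  # standard one-hot labels; argmax recovers the value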
And let's make some random targets and model predictions to see the essence:
   correct  predicted with Softmax
0        0                       4
1        4                       3
2        3                       3
3        1                       4
4        3                       1
5        1                       0
Now, when you subtract the predicted from the correct values and take the absolute value, you essentially get a weighting column like this (a quick NumPy check follows the table):
   weights
0        4
1        1
2        0
3        3
4        2
5        1
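In NumPy terms, the weight column is just the element-wise absolute difference of the two columns above (using the values from the example tables):

import numpy as np

correct = np.array([0, 4, 3, 1, 3, 1])
predicted = np.array([4, 3, 3, 4, 1, 0])
weights = np.abs(correct - predicted)  # -> [4, 1, 0, 3, 2, 1]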
As you can see, a prediction of 0 while the true target is 4 will be weighted 4 times more than a prediction of 3 with the same target of 4, and that is essentially what you want, IIUC.
As Daniel Möller indicates in his answer, I would also advise you to create a custom loss function, but a little simpler one:
import tensorflow as tf
# Output logits from your network, not the values after softmax activation
def weighted_crossentropy(labels, logits):
    # Weight each example by the absolute distance between predicted and true class index
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )
And you should use this loss in your model.compile as well; I think there is no need to reiterate points already made.
Disadvantages of this solution:
- For correct predictions the gradient will be equal to zero, which means it will be harder for the network to strengthen connections (maximize/minimize logits towards +inf/-inf)
- The above can be mitigated by adding random noise to each weight (it would act as additional regularization), which might help.
- A better solution might be to exclude the case where the prediction equals the target from the weighting (or set its weight to 1); it would not add randomization to the network's optimization. See the sketch after this list.
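Here is a minimal sketch of that last variant, assuming the same TF1-style tf.losses API as above (the function name weighted_crossentropy_min1 is just illustrative):

import tensorflow as tf

def weighted_crossentropy_min1(labels, logits):
    # Absolute class-index distance, but never below 1, so correct
    # predictions still produce a non-zero gradient
    distance = tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1))
    return tf.losses.softmax_cross_entropy(labels, logits, weights=tf.maximum(distance, 1))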
Advantages of this solution:
- You can easily add weighting for an imbalanced dataset (e.g. certain classes occurring more often); see the sketch after this list
- Maps cleanly to existing API
- Conceptually simple and stays in the classification realm
- Your model cannot predict nonexistent classification values: in your multi-target case it could predict e.g. [1, 0, 1, 0], which cannot happen with the approach above. Fewer degrees of freedom should help it train and remove the chance of nonsensical (if I got your problem description right) predictions.
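For the imbalanced-dataset point, here is a minimal sketch combining the distance weighting with per-class weights (the class_weights values below are hypothetical and would come from your class frequencies):

import tensorflow as tf

# Hypothetical per-class weights for classes 0..4 (e.g. inverse class frequencies)
class_weights = tf.constant([1.0, 2.0, 1.0, 0.5, 1.5])

def imbalance_weighted_crossentropy(labels, logits):
    true_class = tf.argmax(labels, axis=1)
    distance = tf.cast(tf.abs(tf.argmax(logits, axis=1) - true_class), tf.float32)
    # Scale the distance-based weight by the weight of the true class
    weights = distance * tf.gather(class_weights, true_class)
    return tf.losses.softmax_cross_entropy(labels, logits, weights=weights)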
Additional discussion is provided in the chat room in the comments.
Example network with custom loss
Here is an example network with the custom loss function defined above.
Your labels have to be one-hot-encoded in order for it to work correctly.
import keras
import numpy as np
import tensorflow as tf
# You could actually make it a lambda function as well
def weighted_crossentropy(labels, logits):
    # Weight each example by the absolute distance between predicted and true class index
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )
model = keras.models.Sequential(
    [
        keras.layers.Dense(32, input_shape=(10,)),
        keras.layers.Activation("relu"),
        keras.layers.Dense(10),
        keras.layers.Activation("relu"),
        # Final layer outputs 5 raw logits (no softmax); the loss applies softmax internally
        keras.layers.Dense(5),
    ]
)
# Random example data: 32 samples with 10 features, one-hot labels for 5 classes
data = np.random.random((32, 10))
labels = keras.utils.to_categorical(np.random.randint(5, size=(32, 1)))
model.compile(optimizer="rmsprop", loss=weighted_crossentropy)
model.fit(data, labels, batch_size=32)
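Since the network outputs raw logits, the class values in [0, 1, 2, 3, 4] can be recovered with argmax at prediction time (a short usage sketch with the data defined above):

logits = model.predict(data)
predicted_values = np.argmax(logits, axis=1)  # values in [0, 1, 2, 3, 4]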