There are various ways to approach this problem, but a simple formulation might be to design a scoring function for any given group of animals based on your data, and then perform a numerical optimization such as simulated annealing to find the partition of animals into groups that approximately maximizes your total score. Or if the number of animals is small enough you can just do an exhaustive search of all partitions.
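If you go the exhaustive route, something like the following sketch would do. It assumes you already have some `group_score` function (a hypothetical name, not something from your data) and simply tries every partition. The number of partitions grows as the Bell numbers (115,975 for 10 animals), so this is only practical for small inputs.

```python
from typing import Callable, Iterator, List, Sequence


def partitions(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty groups, each exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for sub in partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(sub)):
            yield sub[:i] + [[first] + sub[i]] + sub[i + 1:]
        # ...or give it a new group of its own.
        yield [[first]] + sub


def best_partition(
    items: Sequence[str],
    group_score: Callable[[List[str]], float],  # hypothetical scoring function
) -> List[List[str]]:
    """Return the partition that maximizes the sum of its group scores."""
    return max(partitions(items), key=lambda p: sum(group_score(g) for g in p))
```

For example, `best_partition(["cat", "dog", "emu"], my_score)` would return the best of the 5 possible partitions of three animals under `my_score`.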
You should choose your scoring function carefully to ensure that you don't end up with n groups of size 1 or 1 group of size n. And don't forget to respect symmetry: the score of a group should not depend on the order in which its animals are listed.
You could start by computing the probability p(a, b) of each pair of animals a, b appearing together, then shift the set of all probabilities to have zero mean, s(a, b) = p(a, b) - mean(p), and then score each group G as the sum of the pairwise scaled scores:

$$\mathrm{score}(G) = \sum_{\{a,b\} \subseteq G,\ a \neq b} s(a, b)$$

This is just a first try; you should be able to come up with better scoring functions.
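Here is a minimal sketch of that first-try scoring function. It assumes your data is a non-empty list of observations, each one a set of animals seen together, and it takes the "probability" of a pair to be the fraction of observations containing both animals. The function names are made up for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def centered_pair_scores(observations: List[Set[str]]) -> Dict[FrozenSet[str], float]:
    """Map each unordered pair of animals to its zero-mean co-occurrence score."""
    animals = sorted(set().union(*observations))
    pairs = [frozenset(p) for p in combinations(animals, 2)]
    n = len(observations)
    # Assumption: p(a, b) = fraction of observations that contain both a and b.
    prob = {p: sum(p <= obs for obs in observations) / n for p in pairs}
    mean = sum(prob.values()) / len(prob)
    return {p: v - mean for p, v in prob.items()}


def group_score(group: Iterable[str], scores: Dict[FrozenSet[str], float]) -> float:
    """Sum the centered scores over all unordered pairs inside the group."""
    return sum(scores[frozenset(p)] for p in combinations(sorted(group), 2))
```

Because the pair scores sum to zero, a group of size 1 scores 0 and the single group containing every animal also scores 0, so neither degenerate partition is favoured. Using unordered pairs keeps the score symmetric.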
Then to apply simulated annealing for k timesteps:
    Choose a random partition π
    for i = 0 to k-1:
        T = 1 - i/k  # floating point division; the temperature cools from 1 toward 0
        make a random transition to partition π'
        if P_accept(π, π', T) > rand(0,1):
            π <- π'
    return π
Here a random transition moves one animal from its current group into another group, which may be a new, empty group.
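The loop above could be sketched in Python roughly as follows. This is one possible reading, not a definitive implementation: it assumes a linear cooling schedule that starts at `t0` and falls toward 0, uses the standard Metropolis rule as the acceptance function, and (unlike the pseudocode) remembers the best partition seen rather than returning the last one. `group_score` is whatever scoring function you settled on.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def groups_of(assign: Dict[str, int]) -> List[List[str]]:
    """Turn an animal -> group-label mapping into a list of groups."""
    by_label = defaultdict(list)
    for animal, label in assign.items():
        by_label[label].append(animal)
    return list(by_label.values())


def anneal(
    animals: Sequence[str],
    group_score: Callable[[List[str]], float],
    k: int = 10_000,
    t0: float = 1.0,  # assumed starting temperature; tune to your score scale
    seed: int = 0,
) -> Tuple[List[List[str]], float]:
    rng = random.Random(seed)
    n = len(animals)

    def total(assign: Dict[str, int]) -> float:
        return sum(group_score(g) for g in groups_of(assign))

    # Random initial partition: labels 0..n-1, so an unused label is an empty group.
    current = {a: rng.randrange(n) for a in animals}
    current_score = total(current)
    best, best_score = dict(current), current_score

    for i in range(k):
        t = t0 * (1.0 - i / k)  # cools linearly; stays strictly positive
        candidate = dict(current)
        # Random transition: move one animal to a random (possibly empty) group.
        candidate[rng.choice(animals)] = rng.randrange(n)
        cand_score = total(candidate)
        delta = cand_score - current_score
        # Metropolis rule for maximization: always accept improvements,
        # accept a worse partition with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
    return groups_of(best), best_score
```

Recomputing the total score on every step is wasteful; since a move only touches two groups, you could update the score incrementally once the basic version works.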
P_accept is the acceptance probability function, which you must design as described in the simulated annealing article. It should be based on the scores of the two partitions and the temperature. The score of a partition could be the sum of the scores of each group in the partition, for example. For more info on designing an acceptance probability function see here.
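One common choice is the Metropolis criterion, sketched below for a maximization problem. Note that this version takes the two partition scores rather than the partitions themselves, which is an assumption on my part about how you would wire it up.

```python
import math


def p_accept(score_old: float, score_new: float, temperature: float) -> float:
    """Metropolis acceptance probability when higher scores are better."""
    if score_new >= score_old:
        return 1.0  # never reject an improvement
    if temperature <= 0:
        return 0.0  # fully cooled: behave like plain hill climbing
    # Worse moves are accepted less often as the gap grows or the temperature drops.
    return math.exp((score_new - score_old) / temperature)
```

At high temperature almost any move is accepted, which lets the search escape local optima; as the temperature falls the search becomes greedy.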
Notice that you don't actually need an absolute score for a partition to run simulated annealing; a function that compares one partition to another is enough. There are a few ways you might design such a function, but if you want to bring out the big guns you can consider using a Generalized Bradley-Terry Model [pdf]. You can train on your input data to get a numerical parameter γ for each animal with the property that, for example, for two groups G1 and G2:

$$P(G_1 \text{ beats } G_2) = \frac{\sum_{a \in G_1} \gamma_a}{\sum_{a \in G_1} \gamma_a + \sum_{b \in G_2} \gamma_b}$$

This should give you a much better measure of group desirability, and it should fit much more nicely into the simulated annealing framework!
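To make the comparison idea concrete, here is a small sketch that assumes the team-comparison form of the model, in which the probability that one group "beats" another is the sum of γ over the first group divided by the sum of γ over both groups. It also assumes the γ values have already been fitted and are positive; fitting them is not shown, and the names are hypothetical.

```python
from typing import Dict, Iterable


def p_beats(group_a: Iterable[str], group_b: Iterable[str], gamma: Dict[str, float]) -> float:
    """Probability that group_a is preferred over group_b under the assumed model."""
    strength_a = sum(gamma[x] for x in group_a)
    strength_b = sum(gamma[x] for x in group_b)
    return strength_a / (strength_a + strength_b)
```

With `gamma = {"a": 2.0, "b": 1.0, "c": 1.0}`, `p_beats(["a"], ["b", "c"], gamma)` gives 0.5, since both sides have total strength 2.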