Let's start with the basic, how to do the maths.
We'll go with very simple maths.
When you have to list all possible combinations of n distinct objects, we should first define a path to do so. At first, we could consider the combinations with one, two and three objects (n ∈ {1,2,3}), let's go with the objects A, B and C :
- For only one object, A, we have only one possible combination, A 
- For two objects, A and B, we have two possible combinations, [ A, B ] and [ B, A ] 
- With three objects, we have then six possible combinations, - [ A, B, C ] [ A, C, B ] - [ B, A, C ] [ B, C, A ] - [ C, A, B ] [ C, B, A ] 
What do we see here ?
- That the number of possible combinations depends directly from the number of objects
- From one object to two objects, we double the number of combinations (1 x 2)
- From two objects to three objects, we triple the number of combinations (2 x 3)
If we take the combinations for three objects, we can clearly see that it is a combinations between object C and the combinations of two objects :
     [ A, B, C ] [ A, C, B ]
     [ B, A, C ] [ B, C, A ]
    [ C, A, B ] [ C, B, A ]
So we can see than incrementing n gives us n x ( n - 1 number of possible combinations). Consequently, the number of possible combinations for any n is factorial(n). If we look closely to the combinations of three objects, we can guess a path for an adequate algorithm to produce all n combinations :
     [ A, B, C ]
     [ B, A, C ]
     [ A, C, B ]
     [ B, C, A ]
     [ C, A, B ]
     [ C, B, A ]
We can see a pattern here, we have the two initial combinations [ A, B ] and [ B, A ] to which C is added, then we have at the same two preceding combinations but this time the second and the third elements are switched ( C is in the "middle" ), and , at new the same two preceding combinations but this time the first and the second elements are switched ( C is at the "start" ). We can then repeat the same pattern with four objects:
- Simply add D to the six initial combinations.
- Then take this six resulting combinations and add them at new with a switch of their third and fourth element.
- Then take this six resulting combinations and add them at new with a switch of their second and third element.
- Then take this six resulting combinations and add them at new with a switch of their first and second element.
The pseudo code we can asses from that, with all of our objects in an "objects" collection :
Set n to 0
Create empty collection "combinations"
For each object in "objects"
  If n > 0 Then
     Include current object in all combinations of "combinations"
     Set indice to n
     Copy "combinations" to new collection "local combinations"
     While indice > 0
        Create empty collection "new combinations"
        For each combination in "local combinations"
          Create combination witch switch of indice + 1 and indice element of current combination in "new combinations"
        decrement indice 
        Add all combinations of "new combinations" to "combinations"
        If Indice > 0 Then Replace the content of "local combinations" with all combinations of "new combinations" 
  Else
     Create combination with the current object in "combinations"
  Increment n
We have here an estime complexity of O(Factorial(n)) as we iterate n times and with each iteration we iterates n - 1 times, with n as the number of objects.
Before implementing that in assembly, let's go with C for clarity :
#include <stdio.h>
// There is no standard factorial function in C, lets write a basic one. 
// Considering its limited range, note that a table of known factorials would 
// be far better for perfomance 
unsigned int factorial(unsigned int pNumber) {
   unsigned int lResult = pNumber;
   if (pNumber > 7) {
      // We should not compute that above int32 capacity (max fact 12) and sould not take more than a few kilobytes on the stack, so 7 at max.
      return 0; // If output < pNumber that's mean the function failed 
   }
   while(--pNumber > 0) lResult *= pNumber;
   return lResult;
}
int main(int nbargs, char *args[]) {
  // First, define our number objects, we're going with 4 as an example
  unsigned int number_of_objects = 4;
  // Set n to 0 : Our expected number of objects at the end
  unsigned int n = 0;
  // Create empty collection "combinations" : we need to store "factorial(number_of_objects) * number_of_objects" unsigned int in the end, we take that on the stack
  unsigned int combinations_number = 0;
#ifdef _MSC_VER
  unsigned int all_combinations[32000][6];
#else
  unsigned int all_combinations[factorial(number_of_objects)][number_of_objects];
#endif
  // For each object in "objects" : We have to loop on our objects, as it is simply successive integers, we go for a simple loop 
  for (unsigned int object = 1; object <= number_of_objects; object++) {
    // If n > 0 Then : 
    if (n > 0) {
      // Include current object in all combinations of "combinations" : 
      unsigned int indice;
      for (indice = 0; indice < combinations_number; indice++) {
        all_combinations[indice][n] = object;
      }
      // Set indice to n :
      indice = n;
      // Copy "combinations" to new collection "local combinations" :
      unsigned int start = 0;
      unsigned int combinations_number_to_add = combinations_number;
      // While indice > 0
      while(indice > 0) {
        // Create empty collection "new combinations" :
        // For each combination in "local combinations" :
        for (unsigned int local_indice = start; local_indice < start + combinations_number_to_add; local_indice++) {
           // Create combination witch switch of indice + 1 and indice element of current combination in "new combinations" :
           for (unsigned int copy_indice = 0; copy_indice <= n ; copy_indice++) {
             all_combinations[combinations_number][copy_indice] = all_combinations[local_indice][copy_indice];
           }
           all_combinations[combinations_number][indice - 1] = all_combinations[local_indice][indice];
           all_combinations[combinations_number][indice] = all_combinations[local_indice][indice - 1];
           combinations_number++;
        }
        // decrement indice  :
        indice--;
        // Add all combinations of "new combinations" to "combinations" :
        start+=combinations_number_to_add;
        // If Indice > 0 Then Replace the content of "local combinations" with all combinations of "new combinations"  :
      };
    // Else :
    } else {
      // Create combination with the current object in "combinations" : Simply store the "object" in all_combinations 
      all_combinations[combinations_number++][0] = object;
    }
    // Increment n:
    n++;
  }
  // Simple display
  printf("%d \n", combinations_number);
  for (n = 0; n < combinations_number; n++) {
    for (unsigned int nb = 0; nb < number_of_objects; nb++) {
      printf("%d ", all_combinations[n][nb]);
    }
    printf("\n");
  }
}
As you can see, we have eliminate here the need of a "local/new" collections with simple array maths, as we simply have integers as "objects".
As i shouldn't do all your homework, i let you how to adapt this C code in MSVC ASM.
At least try.