In the following code, the PSUBSW instruction leaves the result of mm0 - mm1 in mm0. I compiled it with gcc on a MacBook Air.
However, the Intel developer's manual describes PSUBSW in a way that suggests I should get the result of mm1 - mm0 in mm1: "PSUBSW mm, mm/m64: Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results."
#include <stdio.h>
int
main()
{
  short int a[4] = {1111,1112,1113,1114};
  short int b[4] = {1111,2112,3113,4114};
  short int c[4];
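  asm volatile (
  "movq (%1),%%mm0\n\t"      /* mm0 = a */
  "movq (%2),%%mm1\n\t"      /* mm1 = b */
  "psubsw %%mm1,%%mm0\n\t"   /* observed: mm0 = mm0 - mm1 */
  "movq %%mm0,%0\n\t"        /* store mm0 to c */
  "emms"
  : "=m"(c)                  /* c is an array, so it must be a memory operand */
  : "r"(&a), "r"(&b)
  : "mm0", "mm1", "memory"); /* tell gcc which registers and memory are touched */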
  printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);
  return 0;
}
What explains this difference? Which operand is the source, mm0 or mm1? Is this just the difference between Intel syntax and AT&T syntax?
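To make the manual's wording concrete, here is a minimal scalar sketch of how I read "subtract src from dst and saturate". The names `sat16` and `psubsw_model` are hypothetical helpers I made up for illustration, not part of any library, and the sketch assumes the only open question is which register plays the role of `dst`.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of PSUBSW on four packed words:
   dst[i] = saturate(dst[i] - src[i]). The result overwrites dst. */
static void psubsw_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = sat16((int32_t)dst[i] - (int32_t)src[i]);
}
```

Calling `psubsw_model(a, b)` with the arrays from my program (a as dst, b as src) gives 0, -1000, -2000, -3000. That matches a result of mm0 - mm1 stored in mm0, which would mean mm0 is the destination and mm1 is the source in the way I wrote the instruction.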