3

Example: 0xAABBCCDD will turn into 0xDDCCBBAA

My program crashes with an access violation exception right at the first XOR instruction.

There is probably a better naive solution using shifts or rotates, but anyway, here's the code:

    ;; #########################################################################

          .486
          .model flat, stdcall
          option casemap :none   ; case sensitive

    ;; #########################################################################

          include \masm32\include\masm32.inc
          include \masm32\include\kernel32.inc

          includelib \masm32\lib\kernel32.lib
          includelib \masm32\lib\masm32.lib

    .code
    ;; The following program will flip the sequence of the bytes in the eax
    ;; example : 0xAABBCCDD will turn into 0xDDCCBBAA
    start:
    MOV eax, 0AABBCCDDh
    XOR BYTE PTR [eax], al ;; Swap first byte and last byte
    XOR al, BYTE PTR [eax]
    XOR BYTE PTR [eax], al
    XOR BYTE PTR [eax+1], ah ;; Swap 2nd byte of eax and 3rd byte
    XOR ah, BYTE PTR [eax+1]
    XOR BYTE PTR [eax+1], ah
    end_prog:
        ;;Exit the program, eax is the exit code
        push eax
        call ExitProcess
    END start

What am I doing wrong here? Is there any better solution for this?

Peter Cordes
idish

3 Answers

17

Why not simply:

 mov  eax, 0AABBCCDDh
 bswap eax

I am not sure what you are trying to do in your program, but I can say what the CPU actually tries to do (and can't, which is why it crashes):

This one:

XOR BYTE PTR [eax], al 

It tries to XOR the byte in register AL with the byte in memory at address 0AABBCCDDh (the content of the EAX register). Since the OS has not mapped any memory for your program at that address, the access faults and the program crashes with an access violation.

A proper byte swap without bswap looks like this (thanks to X.J):

    xchg  ah, al    ; 0AABBCCDDh -> 0AABBDDCCh
    ror   eax, 16   ; 0AABBDDCCh -> 0DDCCAABBh
    xchg  ah, al    ; 0DDCCAABBh -> 0DDCCBBAAh
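
If it helps to see why those three steps work, here is a rough C model of the sequence. It is only a sketch: the function names are made up for illustration, and C of course has no AH/AL, so each instruction is imitated with masks and shifts on a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Imitates `xchg ah, al`: swap the two low bytes, leave the upper 16 bits alone. */
uint32_t swap_low_bytes(uint32_t eax) {
    return (eax & 0xFFFF0000u)
         | ((eax & 0x000000FFu) << 8)
         | ((eax & 0x0000FF00u) >> 8);
}

/* Imitates `ror eax, 16`: exchange the upper and lower 16-bit halves. */
uint32_t ror_eax_16(uint32_t eax) {
    return (eax >> 16) | (eax << 16);
}

/* Hypothetical helper: the three-instruction sequence, one step per line. */
uint32_t bswap_via_xchg(uint32_t eax) {
    eax = swap_low_bytes(eax);  /* 0xAABBCCDD -> 0xAABBDDCC */
    eax = ror_eax_16(eax);      /* 0xAABBDDCC -> 0xDDCCAABB */
    eax = swap_low_bytes(eax);  /* 0xDDCCAABB -> 0xDDCCBBAA */
    return eax;
}

/* Self-check on the example value from the question. */
void check_bswap_via_xchg(void) {
    assert(bswap_via_xchg(0xAABBCCDDu) == 0xDDCCBBAAu);
}
```

Calling `check_bswap_via_xchg()` from a `main` runs the check. Note that nothing here touches memory: every step works on the value itself, which is the register-versus-memory difference that the original code gets wrong.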
johnfound
  • Because I never knew there was such an opcode :) Anyway, can you see why the program crashes? Or is there another solution without using the bswap opcode? – idish Oct 14 '13 at 23:22
  • I still didn't really understand what's wrong. I was trying to make an XOR operation between 0xAA and 0xDD. (first byte and last byte) – idish Oct 14 '13 at 23:34
  • But you are trying to do an XOR operation between the DD (in AL) and some memory byte at address AABBCCDD. Do you understand the difference between memory and registers? Also note that in x86 you simply don't have byte access to the upper 16 bits of the registers. You can, for example, write XOR AH, AL (no square brackets!) and it will XOR 0CCh with 0DDh, but there are no byte registers for the upper part of EAX. – johnfound Oct 14 '13 at 23:38
  • How come you can't access the upper part of EAX? How can I do the following then: MOV BYTE PTR [eax], 0h? Wouldn't it set the upper byte of eax to 0x00? If there were no such opcode as bswap, how could you swap the bytes? As you said, we can't access the upper part of EAX. I'm sorry for all these questions, I'm a total beginner at ASM – idish Oct 14 '13 at 23:45
  • Again, mov byte ptr [eax], 0h will write 0h at some address in memory, not in the upper byte of EAX. The address where this 0h will be written is the value of the register EAX. If there were no bswap, the only way to swap bytes would be to use more than one register and to work with the lower 16 bits (which are accessible as the AH/AL byte registers, and respectively BH/BL, CH/CL and DH/DL). – johnfound Oct 14 '13 at 23:51
  • Ahh, I understand now, thanks for your explanation. Actually, it's kind of a homework exercise, and I wasn't supposed to use the bswap opcode (I guess). I still don't understand how you can do it using AH/AL and the other low 16-bit registers. Could you post some code please? I know I'm asking for too much regarding this question.. :) – idish Oct 15 '13 at 00:03
  • @idish - I am kind today. :) The answer is edited, read there. – johnfound Oct 15 '13 at 00:12
  • As an improvement one can also use ROR like this: xchg ah,al; ror eax,16; xchg ah,al. – X.J Oct 15 '13 at 00:18
  • @X.J - Yes, of course I need some sleep and think so slow. :) May I use your solution? – johnfound Oct 15 '13 at 00:19
  • Modern x86 CPUs have [MOVBE](https://www.felixcloutier.com/x86/movbe), so you don't even need MOV then BSWAP – phuclv Oct 09 '20 at 16:07
  • For performance, `rol ax, 8` is somewhat better than `xchg al,ah` on modern CPUs. (e.g. 1 uop instead of 3, on Intel Sandybridge-family, and doesn't get AH renamed separately from RAX so it won't need an AH merge uop to issue in a cycle by itself when you next read EAX). Oh, I just scrolled down and there's already an answer pointing this out, and I commented the same thing on it 2 years ago. >.< – Peter Cordes Mar 06 '21 at 03:07
4

An alternative solution, using the rol instruction only:

mov eax,0AABBCCDDh
rol ax,8            ; 0AABBDDCCh
rol eax,16          ; 0DDCCAABBh
rol ax,8            ; 0DDCCBBAAh

I believe, in most cases, this will be ever so slightly faster than using the xchg instruction, although I see no reason not to simply use bswap, which is cleaner and likely faster.
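
For anyone who wants to check the rotate sequence by hand, here is a small C model of it, next to the plain shift-and-mask version the question alludes to. This is just a sketch under my own naming (none of these functions come from a library), and it models the instructions rather than replacing them.

```c
#include <assert.h>
#include <stdint.h>

/* Imitates `rol ax, 8`: rotate only the low 16 bits; the upper half is untouched. */
uint32_t rol_ax_8(uint32_t eax) {
    uint16_t ax = (uint16_t)eax;
    ax = (uint16_t)((ax << 8) | (ax >> 8));
    return (eax & 0xFFFF0000u) | ax;
}

/* Imitates `rol eax, 16`: rotate all 32 bits, which swaps the two halves. */
uint32_t rol_eax_16(uint32_t eax) {
    return (eax << 16) | (eax >> 16);
}

/* Hypothetical helper: the rotate-only sequence from this answer. */
uint32_t bswap_via_rol(uint32_t eax) {
    eax = rol_ax_8(eax);    /* 0xAABBCCDD -> 0xAABBDDCC */
    eax = rol_eax_16(eax);  /* 0xAABBDDCC -> 0xDDCCAABB */
    eax = rol_ax_8(eax);    /* 0xDDCCAABB -> 0xDDCCBBAA */
    return eax;
}

/* Reference version using only shifts and masks, for comparison. */
uint32_t bswap_via_shifts(uint32_t x) {
    return (x << 24)
         | ((x << 8) & 0x00FF0000u)
         | ((x >> 8) & 0x0000FF00u)
         | (x >> 24);
}

/* Self-check: both versions agree on the example value from the question. */
void check_bswap_via_rol(void) {
    assert(bswap_via_rol(0xAABBCCDDu) == 0xDDCCBBAAu);
    assert(bswap_via_shifts(0xAABBCCDDu) == 0xDDCCBBAAu);
}
```

Calling `check_bswap_via_rol()` from a `main` runs the check. Since a 16-bit rotate by 8 is the same in either direction, `rol ax, 8` and `ror ax, 8` are interchangeable here, and the same goes for rotating EAX by 16.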

user3783243
Andrew Hardiman
  • Yes, this is slower than `bswap`, but much faster than `xchg` on modern CPUs, because `rol reg,imm` is a single uop (https://agner.org/optimize/), and because it avoids writing AH ([How exactly do partial registers on Haswell/Skylake perform?](https://stackoverflow.com/q/45660139)) so the partial-register merging is cheaper or non-existent on Sandybridge-family. The `xchg`-based version might be better on a 386 (smaller code size). Use this if you need compat with ancient CPUs (`bswap` needs 486), but care about modern CPUs where `xchg` is multi-uop. – Peter Cordes Aug 21 '18 at 18:49
2

How 'bout...

    mov eax, 0AABBCCDDh
    xchg al, ah ; 0AABBDDCCh
    rol eax, 16 ; 0DDCCAABBh
    xchg al, ah ; 0DDCCBBAAh

Would that not do what is wanted in one register? I see X.J has already posted that (rotate left, rotate right: same result). Gotta be quick to beat you guys! :)

Laurent Meyer
Frank Kotler
  • Wow, this is a great solution, thank you! I will +1 you since I already accepted John's answer XD – idish Oct 15 '13 at 00:31
  • This is a pretty cool solution, but keep in mind that it might be slower. Using partial registers is often very expensive, as most x86 CPUs don't maintain different registers and have to go with an expensive merge on each partial write. – Leeor Oct 15 '13 at 11:34