Why might one use the xzr register instead of the literal 0 on ARMv8?

Question

I was reading the SVE whitepaper from ARM and came across something that struck me as odd (in a non-SVE example):

mov x8, xzr

I didn't know what this xzr register was, so I looked it up and found some content from ARM stating that it was, in many contexts, synonymous with zero.

So it looks like x8 is being initialised to zero, which makes sense because it's executed just before a loop where x8 is used as the loop counter.

What I don't understand is, why wasn't the literal 0 used instead of xzr? For example:

mov x8, 0

To summarise, my question is: why might one use the xzr register instead of the literal 0 here?

Close voter, this is not opinion based :) There are objective differences between how they're encoded at the very least. — OMGtechy, Mar 14 '17 at 14:52
Where register number 31 represents read zero or discard result (aka the “zero register”): .... For instruction operands which interpret register 31 as the zero register, it is represented by the name XZR in 64-bit contexts, and WZR in 32-bit contexts. — InfinitelyManic, Mar 14 '17 at 15:24
E.g., to discard results: "ldr xzr, [sp], 16". Also remember that in ARMv8 the stack must be quad-word aligned, i.e. SP mod 16 = 0, so xzr may be used as one of the "pushed" or "popped" registers. — InfinitelyManic, Mar 14 '17 at 15:28
@InfinitelyManic thanks! It would be good if you either added an answer or edited an existing one to add this information, as people might miss it buried in the comments here. :) — OMGtechy, Mar 14 '17 at 15:48

score 23 · Accepted Answer · answered Mar 14 '17 at 19:18

I think the mov x8, xzr vs mov x8, #0 comparison is something of a red herring.

As @old_timer's answer shows, there is no encoding gain to be made, and quite likely (although admittedly I haven't checked) little or no pipeline performance gain.

What xzr gives us, however - in addition to a dummy register as per @InfinitelyManic's answer - is access to a zero-valued operand without having to load and occupy a real register. This has the dual benefit of one less instruction, and one more register available to hold 'real' data.

I think this is an important characteristic that the original 'some content from ARM' referred to in the OP neglects to point out.

That's what I mean by mov x8, xzr vs mov x8, #0 being a red herring. If we're zeroing x8 with the intention of then modifying it, then using xzr or #0 is pretty arbitrary (although I'd tend to favour #0 as the more obvious). But if we're zeroing x8 purely in order to supply a zero operand to a subsequent instruction, then we'd be better off using - where permitted - xzr instead of x8 as the operand in that instruction, and not zeroing x8 at all.

A little late to the party, having stumbled upon this just now, but I recall that the difference may matter a tiny tiny bit in the rare case one of them happens to not be dependency chain breaking. For example, on the Apple M1, the wzr/xzr way is apparently not dependency chain breaking (I'd need to find the source). — Mona the Monad, Jan 22 '23 at 15:22

score 6 · Answer 2 · edited Mar 15 '17 at 10:55

mov x8,xzr
mov x8,#0
mov x8,0

produces

0000000000000000 <.text>:
   0:   aa1f03e8    mov x8, xzr
   4:   d2800008    mov x8, #0x0                    // #0
   8:   d2800008    mov x8, #0x0                    // #0

No real surprise there, other than that the assembler accepted an immediate without the pound sign. It is not an instruction-size issue (again no surprise; on x86, for example, xor rax,rax is cheaper than mov rax,0). Perhaps there is a pipeline performance gain (despite popular belief, instructions take more than one clock cycle from start to finish).

Most likely it is a matter of personal preference: we have this cool MIPS-like always-zero register, so let's use it just for fun.
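The two encodings in the listing above can be reproduced by hand. What follows is a minimal Python sketch, assuming the A64 field layouts for ORR (shifted register) and MOVZ; the helper names `orr_shifted_reg` and `movz` are hypothetical and not part of any assembler or library.

```python
# Hypothetical helpers: pack A64 instruction fields into a 32-bit word.
XZR = 31  # register number 31 reads as zero in these operand positions

def orr_shifted_reg(rd, rn, rm):
    """ORR Xd, Xn, Xm (64-bit, LSL #0). MOV Xd, Xm is the alias with Xn = XZR."""
    return 0xAA000000 | (rm << 16) | (rn << 5) | rd

def movz(rd, imm16, hw=0):
    """MOVZ Xd, #imm16, LSL #(16*hw). MOV Xd, #imm is an alias when imm fits."""
    return 0xD2800000 | (hw << 21) | (imm16 << 5) | rd

mov_x8_xzr = orr_shifted_reg(rd=8, rn=XZR, rm=XZR)
mov_x8_0 = movz(rd=8, imm16=0)

print(f"{mov_x8_xzr:08x}  mov x8, xzr")  # aa1f03e8
print(f"{mov_x8_0:08x}  mov x8, #0")     # d2800008
```

Both forms are a single 32-bit instruction word; they differ only in which underlying instruction the `mov` alias selects.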

score 1 · Answer 3 · answered Mar 14 '17 at 14:50

These two instructions should be identical - both in terms of effect and expected performance.

They're actually both aliases of more general purpose instructions.

mov x8, 0 is encoded as orr x8, xzr, 0

mov x8, xzr is encoded as orr x8, xzr, xzr

Aliases are useful because they make the ASM more readable.

The second encoding demonstrates why having a zero register xzr can be useful. Because we know xzr is always zero, we can reuse the orr instruction for mov. Without it, mov would require a different encoding, and so would waste encoding space.

answered Mar 14 '17 at 14:50 by Will Lovett

mov #0 already has a different encoding, and there are other equivalent instructions, so the encoding space issue doesn't really count for much. – Jeremy Mar 14 '17 at 18:42
Perhaps I wasn't clear. The existence of `xzr` allows both `mov #immediate` and `mov xN` to be aliases to two more general instructions (`reg || immediate` and `reg || reg`). Without `xzr`, neither of these aliases would be possible. – Will Lovett Mar 16 '17 at 15:37
The instruction `orr x8, xzr, #0` doesn't exist - as the [relevant ARM64 documentation](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/ORR_log_imm.html) states: _Because the all-zeros and all-ones values cannot be described in this way, the assembler generates an error message._ – Jeremy Mar 17 '17 at 13:08
But you're correct that [register-to-register moves](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802b/MOV_ORR_log_shift.html) are implemented as `ORR xd, xzr, xm`. – Jeremy Mar 17 '17 at 13:17
d2800008 mov x8, #0x0 – Victor Signaevskyi Sep 29 '17 at 10:16
aa1f03e8 mov x8, xzr – Victor Signaevskyi Sep 29 '17 at 10:16
As it can be seen, opcodes for "mov x8, #0x0" and "mov x8, xzr" are different. – Victor Signaevskyi Sep 29 '17 at 10:17

score 0 · Answer 4 · edited Mar 14 '17 at 15:03

TL;DR

It takes multiple instructions to load a 64-bit literal into a register, but only a single instruction to set it to 0 using xzr. Therefore the code is shorter and faster.

To move a literal to a register you would use the MOVL pseudo-instruction; see this from the ARM reference:

MOVL pseudo-instruction

Load a register with either:
- A 32-bit or 64-bit immediate value.
- Any address.

MOVL generates either two or four instructions... a MOV, MOVK pair.

So loading a literal into a register is a multi-step process. If you just want to clear a register, there is a shortcut: xzr is a pseudo-register that always reads as zero, which is a commonly needed value, and moving a register to a register can be done in a single instruction.

In Microchip assembly they have a similar concept. To set a register to a literal you would do something like:

MOVLW   10       (Move 10 to the working register) 
MOVWF   0x1234   (Move the working register to address 0x1234)

But to set to zero they have the instruction:

CLRF    0x1234   (Set 0x1234 to zero)

edited Mar 14 '17 at 15:03 by OMGtechy
answered Mar 14 '17 at 14:39 by silverscania

Is there a benefit to this over `EOR x8, x8, x8`? – Colin Mar 14 '17 at 15:07
@Colin__s `EOR` keeps dependencies, so it can't be executed out of order on a CPU that *can* OoO execute the `mov` from `xzr`. – EOF Mar 14 '17 at 15:23
@EOF Ah, I see. Thanks. – Colin Mar 14 '17 at 15:24
The ARM reference also says: "Use the MOVL pseudo-instruction to: Generate literal constants **when an immediate value cannot be generated in a single instruction**." As old_timer's answer shows, immediate zero load can be performed in a single instruction. – Jeremy Mar 14 '17 at 16:03
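The point in the comment above can be sketched numerically. The following Python snippet assumes a deliberately naive scheme that builds a 64-bit constant from one MOVZ plus one MOVK per additional non-zero 16-bit chunk; the function name `movz_movk_count` is hypothetical. Real assemblers can also use MOVN or a bitmask-immediate ORR, so actual counts may be lower than this estimate.

```python
def movz_movk_count(value):
    """Instruction count for a naive MOVZ/MOVK sequence (illustrative only)."""
    value &= 0xFFFFFFFFFFFFFFFF  # treat the constant as an unsigned 64-bit value
    chunks = [(value >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]
    nonzero = sum(1 for chunk in chunks if chunk != 0)
    return max(1, nonzero)  # zero still needs a single MOVZ

for v in (0, 0xDEAD, 0xDEAD0000, 0x123456789ABCDEF0):
    print(hex(v), movz_movk_count(v))
```

Under this assumption, only constants with more than one non-zero 16-bit chunk need several instructions; zero needs exactly one, the same as `mov x8, xzr`.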

score 0 · Answer 5 · answered Mar 14 '17 at 16:17

This answer is not "on all fours" to the OP.

XZR could be used to discard results; e.g., "ldr xzr, [sp], 16". See the GDB session below:

0x7fffffef40:   0x00000000      0x00000000      0x00400498      0x00000000
0x7fffffef50:   0x00000000      0x00000000      0x00000000      0x00000000
              ldr x0,=0xdead
(gdb)
              ldr x1,=0xc0de
(gdb)
              stp x0, x1, [sp, #-16]!
(gdb) x/8x $sp
0x7fffffef30:   0x0000dead      0x00000000      0x0000c0de      0x00000000
0x7fffffef40:   0x00000000      0x00000000      0x00400498      0x00000000

              ldr xzr, [sp], #16
(gdb) x/8x $sp
0x7fffffef40:   0x00000000      0x00000000      0x00400498      0x00000000
0x7fffffef50:   0x00000000      0x00000000      0x00000000      0x00000000

Also remember that in ARMv8 the stack should be quad-word aligned, i.e. SP mod 16 = 0. So you may use XZR as one of the "pushed" or "popped" pair registers.

stp x1, xzr, [sp, #-16]!

ldp x10, xzr, [sp], #16
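The discard trick works because register number 31 is interpreted by operand position. As a rough Python sketch, assuming the A64 layout for the 64-bit post-indexed LDR (the helper name `ldr_post_index_64` is hypothetical), the same field value 31 appears twice in `ldr xzr, [sp], #16`: as the base register it means SP, and as the destination it means XZR.

```python
REG31 = 31  # SP when used as a base address, XZR when used as a load destination

def ldr_post_index_64(rt, rn, imm9):
    """LDR Xt, [Xn|SP], #imm9 (64-bit, post-index); imm9 is a signed 9-bit offset."""
    if not -256 <= imm9 <= 255:
        raise ValueError("post-index offset must fit in a signed 9 bits")
    return 0xF8400400 | ((imm9 & 0x1FF) << 12) | (rn << 5) | rt

discard = ldr_post_index_64(rt=REG31, rn=REG31, imm9=16)  # ldr xzr, [sp], #16
keep = ldr_post_index_64(rt=0, rn=REG31, imm9=16)         # ldr x0,  [sp], #16

print(f"{discard:08x}  ldr xzr, [sp], #16")
print(f"{keep:08x}  ldr x0, [sp], #16")
```

The two words differ only in the destination field, so the load and the stack-pointer update still happen; only the loaded value is thrown away.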

@OMGtechy - sorry that's a legal term; loosely meaning that my answer/response, etc. doesn't precisely match your original question. — InfinitelyManic, Mar 14 '17 at 16:43
