On Cortex-A processors (in AArch64 state), are there any rules of thumb for optimizing for speed? For example, is it always better to read from memory than to take a branch?
Consider the simplest conversion to a hexadecimal string as an example:
convert:
    . . .
    cmp x9, 9
    b.le . + 8
    add x9, x9, 0x07
    add x9, x9, 0x30
    strb w9, [x10, -1]!
    . . .
    b convert
vs
convert:
    . . .
    ldrb w9, [x11, x9]    // x11 = pointer to the alphabet string "0123456789ABCDEF"
    strb w9, [x10, -1]!
    . . .
    b convert
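For reference, here is roughly what the two variants compute, sketched in C. The function names (`hex_digit_branch`, `hex_digit_table`, `to_hex`) are made up for illustration, and the loop body that I elided with ". . ." above is assumed here to be a plain mask-and-shift by 4 bits. A compiler is free to turn the `if` into a conditional select instead of a branch, so this only shows the logic, not the code that would actually be generated.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: compare and conditionally adjust
   (mirrors cmp / b.le / add 0x07 / add 0x30). */
static char hex_digit_branch(unsigned nibble)
{
    assert(nibble < 16);
    if (nibble > 9)
        nibble += 0x07;            /* skip the ASCII gap between '9' and 'A' */
    return (char)(nibble + 0x30);  /* 0x30 is '0' */
}

/* Variant 2: table lookup (mirrors ldrb w9, [x11, x9]). */
static char hex_digit_table(unsigned nibble)
{
    static const char alphabet[] = "0123456789ABCDEF";
    assert(nibble < 16);
    return alphabet[nibble];
}

/* Assumed surrounding loop: digits are written backwards from `end`,
   like strb w9, [x10, -1]!. `end` points one past the last digit slot.
   Returns a pointer to the first (most significant) digit written. */
static char *to_hex(uint64_t value, char *end, char (*digit)(unsigned))
{
    do {
        *--end = digit((unsigned)(value & 0xF));
        value >>= 4;
    } while (value != 0);
    return end;
}
```

Both digit functions return the same character for every input from 0 to 15; the only difference is whether the result comes from arithmetic or from a load.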
Thanks in advance for any tips.
