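If you don't need to reference the labels from outside the %rep block, NASM's macro-local %%label syntax works. Each expansion of the macro gets its own unique label, so repeating it doesn't cause label-redefinition errors: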
    %macro jmpfwd 0
        times 21 nop
        jmp     %%fwd               ;;;;; <<<------ This jump
        add     ax, 0x1234          ; can this stall decoding?
        ; lea   eax, [ebx+edx+1]
        align   64
    %%fwd:                          ;;;;; <<<------ jumps here
    %endmacro
Then use that macro inside a %rep:
    .looptop:
    %rep 4
        jmpfwd
    %endrep
        ; times 4 jmpfwd    ; nope: TIMES only works on (pseudo-)instructions, not macros
        dec     ecx
        jnz     .looptop
(It turns out that Skylake can decode this without an LCP (length-changing prefix) stall every iteration. There are only a few LCP stalls, when the add reaches the decoders in the same group as the jmp, before branch prediction for the unconditional jmp instructions takes effect. The times 21 nop keeps the code from fitting in the uop cache, so every iteration goes through the legacy decoders.)
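For anyone who wants to try this, here is a minimal sketch of a complete test file built around the macro above. It assumes x86-64 Linux and a NASM + ld toolchain; the file name, the iteration count, and the exit sequence are assumptions added for illustration, not part of the original test.

```nasm
; Hypothetical file name: lcp-test.asm
; Build (assumed toolchain): nasm -felf64 lcp-test.asm && ld -o lcp-test lcp-test.o

%macro jmpfwd 0
    times 21 nop                ; padding so the block doesn't fit in the uop cache
    jmp     %%fwd               ; unconditional jump over the LCP instruction
    add     ax, 0x1234          ; 66h prefix + imm16: never executed, but may be decoded
    align   64
%%fwd:                          ; macro-local label, unique per expansion
%endmacro

global _start
section .text
_start:
    mov     ecx, 10000000       ; iteration count: arbitrary, chosen for illustration
.looptop:
%rep 4
    jmpfwd
%endrep
    dec     ecx
    jnz     .looptop

    xor     edi, edi            ; exit status 0
    mov     eax, 60             ; exit system call number on x86-64 Linux
    syscall
```

The macro-local labels don't disturb NASM's local-label scoping, so .looptop still belongs to _start after the four expansions. Running the result under a profiler with your CPU's LCP-stall counter is one way to check the decode behaviour on other microarchitectures.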