A macro is a compile-time substitution, unlike a runtime function call. asm and C are different languages, so the only way this question makes sense is for asm macros that you can use from inline-asm.
gcc's asm output has to be assembled by GAS, or a compatible assembler that understands GAS directives (https://sourceware.org/binutils/docs/as/). Inline asm pastes your hand-written text directly into that compiler-generated asm, so it becomes part of one complete assembly source file that the compiler feeds to the assembler.
Using NASM syntax like %macro can't work in GNU C inline asm, because an assembler that can assemble regular gcc output won't understand NASM directives.
But you can use GAS .macro if you want. (https://sourceware.org/binutils/docs/as/Macro.html). I wouldn't recommend it; GAS macros aren't very nice to use. The syntax feels clunky compared to NASM. But since you asked, this is how you do it.
Putting asm(".include \"macro-defs.S\""); at the top of a C file lets you use those macros from inline asm later in that compilation unit. (Assuming gcc doesn't reorder things in the output asm.)
But of course you have to know what the macro does to be able to write correct constraints for the inline-asm statements, so it's really not super-useful.
Example
macro-defs.S (GAS syntax, not NASM). Maybe I should have called it .s: the .S extension normally means "run the C preprocessor first", but we only pull it in with the assembler's .include directive, never with #include, so CPP never sees it. (#include would be problematic anyway: you can't #include something inside a double-quoted string.) So we can't use CPP macros here, only asm macros.
#.altmacro # needed for some things, makes other things harder
# https://stackoverflow.com/questions/19776992/gas-altmacro-macro-with-a-percent-sign-in-a-default-parameter-fails-with-oper
# clobbers RDX and RAX
.macro fenced_rdtsc64 dst
    lfence                  # make sure earlier stuff is done
    rdtsc
    lfence                  # don't allow later stuff to start before time is read
    shl     $32, %rdx       # allow OoO exec of these with the timed interval
    lea     (%rax, %rdx), \dst
.endm
# repeats pause n times (n must be >= 1). Probably not useful, just a silly example.
# for exponential backoff in a spinloop, you want a *runtime* repeat count.
.macro pause_n count
    pause                   # the machine instruction, not a macro
    .if \count-1
    pause_n "(\count-1)"    # recursion works like NASM %rep here (GAS's .rept/.endr would too)
    .endif
.endm
These macros are usable from foo.S:
.include "macro-defs.S"
# inefficient: the subtraction only needs the low 32 bits of each count
# (the interval is short), so a macro that merges in the high half wastes work
.globl foo
foo:
    fenced_rdtsc64 %rcx     # start
    pause_n 4
    fenced_rdtsc64 %rax     # end
    sub     %rcx, %rax
    ret
And via inline-asm from main.c (which also calls foo() the normal way).
#include <stdio.h>
asm(".include \"macro-defs.S\"");
long long foo(void);
int main(void) {
    long long start, end;
    asm volatile("fenced_rdtsc64 %[dst]"
                 : [dst]"=r" (start)
                 :
                 : "rax", "rdx"   // clobbers can't overlap outputs, so this unfortunately stops gcc picking RAX or RDX for dst
                 );
    printf("foo rdtsc ticks: call1 %lld call2 %lld\n", foo(), foo());
    asm volatile("fenced_rdtsc64 %[dst]"
                 : [dst]"=r" (end)
                 :
                 : "rax", "rdx");
    printf("printf rdtsc ticks: %lld\n", end-start);
}
Compile with gcc -O3 -Wall main.c foo.S (I used gcc 7.3, with -fpie being the default).
Running it with for i in {1..50};do ./a.out;done gives output like this (on my i7-6700K, where pause takes ~100 core clock cycles, and hardware P-states ramp up the speed quickly when there's load):
... (variable number of lines before the frequency shift)
foo rdtsc ticks: call1 3006 call2 3014
printf rdtsc ticks: 727810
foo rdtsc ticks: call1 3006 call2 3022
printf rdtsc ticks: 707376
foo rdtsc ticks: call1 3006 call2 3017
printf rdtsc ticks: 746375
foo rdtsc ticks: call1 3006 call2 3029
printf rdtsc ticks: 684239
foo rdtsc ticks: call1 3006 call2 3010
printf rdtsc ticks: 652724
foo rdtsc ticks: call1 616 call2 620   # gcc chose to evaluate from right to left
printf rdtsc ticks: 133282
foo rdtsc ticks: call1 618 call2 618   # so call1 runs second, with foo already hot in the uop cache
printf rdtsc ticks: 133984
foo rdtsc ticks: call1 616 call2 618
printf rdtsc ticks: 133284
foo rdtsc ticks: call1 614 call2 618
The asm for foo, if we disassemble (with objdump -drwC -Mintel a.out) to see how the macro expanded:
# I probably should have used AT&T-syntax disassembly to match the source;
# leave out -Mintel if you want that on your own machine.
00000000000006ba <foo>:
6ba: 0f ae e8 lfence
6bd: 0f 31 rdtsc
6bf: 0f ae e8 lfence
6c2: 48 c1 e2 20 shl rdx,0x20
6c6: 48 8d 0c 10 lea rcx,[rax+rdx*1] # macro expanded with RCX
6ca: f3 90 pause # pause_n 4 expanded to 4 pause instructions
6cc: f3 90 pause
6ce: f3 90 pause
6d0: f3 90 pause
6d2: 0f ae e8 lfence
6d5: 0f 31 rdtsc
6d7: 0f ae e8 lfence
6da: 48 c1 e2 20 shl rdx,0x20
6de: 48 8d 04 10 lea rax,[rax+rdx*1] # macro expanded with RAX
6e2: 48 29 c8 sub rax,rcx
6e5: c3 ret
The compiler-generated asm for main (including our inline asm) is:
0000000000000540 <main>:
540: 55 push rbp
541: 53 push rbx
542: 48 83 ec 08 sub rsp,0x8
546: 0f ae e8 lfence # first inline asm
549: 0f 31 rdtsc
54b: 0f ae e8 lfence
54e: 48 c1 e2 20 shl rdx,0x20
552: 48 8d 1c 10 lea rbx,[rax+rdx*1] # The compiler picked RBX for the output operand
# and substituted fenced_rdtsc64 %rbx into the asm template
556: e8 5f 01 00 00 call 6ba <foo>
55b: 48 89 c5 mov rbp,rax # save foo's return value; foo is a real function, not inline asm, so the compiler can't choose which register it returns in
55e: e8 57 01 00 00 call 6ba <foo>
563: 48 89 ea mov rdx,rbp
566: 48 8d 3d 0b 02 00 00 lea rdi,[rip+0x20b] # 778 <_IO_stdin_used+0x8> # the string literal
56d: 48 89 c6 mov rsi,rax
570: 31 c0 xor eax,eax
572: e8 b9 ff ff ff call 530 <printf@plt>
577: 0f ae e8 lfence # 2nd inline asm
57a: 0f 31 rdtsc
57c: 0f ae e8 lfence
57f: 48 c1 e2 20 shl rdx,0x20
583: 48 8d 34 10 lea rsi,[rax+rdx*1] # compiler picked RSI this time
587: 48 8d 3d 1a 02 00 00 lea rdi,[rip+0x21a] # 7a8 <_IO_stdin_used+0x38>
58e: 48 29 de sub rsi,rbx # end-start, computed in RSI because that's where printf wants its 2nd arg
591: 31 c0 xor eax,eax
593: e8 98 ff ff ff call 530 <printf@plt>
598: 48 83 c4 08 add rsp,0x8
59c: 31 c0 xor eax,eax
59e: 5b pop rbx
59f: 5d pop rbp
5a0: c3 ret