When I compare the stack usage reported by gcc's -fstack-usage option for two similar functions, one that makes a function call and one that does not, I get different results.
Consider this piece of code:
void donothing(void) {}

void leaf(void) {
    int i = 0;
}

void noleaf(void) {
    int i = 0;
    donothing();
}

int main(void) {
    leaf();
    noleaf();
    return 0;
}
I would like to compare the stack usage of leaf and noleaf. Naively, I would expect the stack size to be the same for both functions, since they have the same local variables. However, gcc's -fstack-usage option reports different values:
$ gcc -fstack-usage -o exe leaf.c
$ cat leaf.su
leaf.c:1:6:donothing 16 static
leaf.c:3:6:leaf 16 static
leaf.c:7:6:noleaf 32 static
leaf.c:12:5:main 16 static
We can see that leaf has the same stack size as donothing, which suggests that its local variable is not taken into account.
When I look at the assembly code, I can see that the stack is not manipulated the same way between leaf and noleaf:
000000000000112c <leaf>:
112c: 55 push %rbp
112d: 48 89 e5 mov %rsp,%rbp
1130: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1137: 90 nop
1138: 5d pop %rbp
1139: c3 retq
000000000000113a <noleaf>:
113a: 55 push %rbp
113b: 48 89 e5 mov %rsp,%rbp
113e: 48 83 ec 10 sub $0x10,%rsp
1142: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1149: e8 d7 ff ff ff callq 1125 <donothing>
114e: 90 nop
114f: c9 leaveq
1150: c3 retq
In noleaf, we allocate the space on the stack with sub $0x10,%rsp, but, in leaf, the local variable is directly stored on the stack with movl $0x0,-0x4(%rbp) with no "preallocation".
However, even if no "preallocation" is done, the local variable of leaf is still on the stack, right? As the stack is also used for the local variable in leaf, I would expect to have a stack usage of 32 bytes for this function too. Could I say that the output of -fstack-usage is wrong?
EDIT: some comments and answers suggest that the difference in stack usage is due to callq and the alignment requirements. Let's consider a modified version of the source file:
void donothing(void) {}

void leaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
}

void noleaf(void) {
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    { int i = 0; int j = 0; int k = 0; int l = 0; }
    donothing();
}

int main(void) {
    leaf();
    noleaf();
    return 0;
}
If the difference in stack usage were explained only by callq and the alignment, we should not observe a difference greater than 16 bytes between the two functions. However, here are the results of -fstack-usage:
su.c:1:6:donothing 16 static
su.c:3:6:leaf 16 static
su.c:10:6:noleaf 80 static
su.c:18:5:main 16 static
We can see that noleaf uses 80 bytes, which seems normal to me (64 bytes for the local variables and 16 bytes for the saved rip and rbp). However, the stack size for leaf is still 16 bytes: its local variables are not taken into account.
Here is the assembly code:
000000000000112c <leaf>:
112c: 55 push %rbp
112d: 48 89 e5 mov %rsp,%rbp
1130: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
1137: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
113e: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
1145: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp)
114c: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
1153: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
115a: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
1161: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
1168: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
116f: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
1176: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
117d: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
1184: c7 45 cc 00 00 00 00 movl $0x0,-0x34(%rbp)
118b: c7 45 c8 00 00 00 00 movl $0x0,-0x38(%rbp)
1192: c7 45 c4 00 00 00 00 movl $0x0,-0x3c(%rbp)
1199: c7 45 c0 00 00 00 00 movl $0x0,-0x40(%rbp)
11a0: 90 nop
11a1: 5d pop %rbp
11a2: c3 retq
00000000000011a3 <noleaf>:
11a3: 55 push %rbp
11a4: 48 89 e5 mov %rsp,%rbp
11a7: 48 83 ec 40 sub $0x40,%rsp
11ab: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
11b2: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
11b9: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
11c0: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp)
11c7: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
11ce: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
11d5: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
11dc: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
11e3: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%rbp)
11ea: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
11f1: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
11f8: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
11ff: c7 45 cc 00 00 00 00 movl $0x0,-0x34(%rbp)
1206: c7 45 c8 00 00 00 00 movl $0x0,-0x38(%rbp)
120d: c7 45 c4 00 00 00 00 movl $0x0,-0x3c(%rbp)
1214: c7 45 c0 00 00 00 00 movl $0x0,-0x40(%rbp)
121b: e8 05 ff ff ff callq 1125 <donothing>
1220: 90 nop
1221: c9 leaveq
1222: c3 retq
Therefore, I think the callq instruction alone is not enough to explain the difference in stack usage between the two functions.