Program compiled with -fPIC crashes while stepping over thread-local variable in GDB

Question

This is a very strange problem which occurs only when the program is compiled with the -fPIC option.

Using gdb, I'm able to print thread-local variables, but stepping over them leads to a crash.

thread.c

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define MAX_NUMBER_OF_THREADS 2

struct mystruct {
    int   x;
    int   y;
};

__thread struct mystruct obj;

void* threadMain(void *args) {
    obj.x = 1;
    obj.y = 2;

    printf("obj.x = %d\n", obj.x);
    printf("obj.y = %d\n", obj.y);

    return NULL;
}

int main(int argc, char *arg[]) {
    pthread_t tid[MAX_NUMBER_OF_THREADS];
    int i = 0;

    for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
        pthread_create(&tid[i], NULL, threadMain, NULL);
    }

    for(i = 0; i < MAX_NUMBER_OF_THREADS; i++) {
        pthread_join(tid[i], NULL);
    }

    return 0;
}

Compile it using the following: gcc -g -lpthread thread.c -o thread -fPIC

Then while debugging it: gdb ./thread

(gdb) b threadMain 
Breakpoint 1 at 0x4006a5: file thread.c, line 15.
(gdb) r
Starting program: /junk/test/thread 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fc7700 (LWP 31297)]
[Switching to Thread 0x7ffff7fc7700 (LWP 31297)]

Breakpoint 1, threadMain (args=0x0) at thread.c:15
15      obj.x = 1;
(gdb) p obj.x
$1 = 0
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
threadMain (args=0x0) at thread.c:15
15      obj.x = 1;

However, if I compile it without -fPIC, this problem doesn't occur.

Before anybody asks why I am using -fPIC: this is just a reduced test case. We have a huge component which compiles into a .so file which then plugs into another component. Therefore, -fPIC is necessary.

There is no functional impact because of it, only that debugging is near impossible.

Platform Information: Linux 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.5 (Santiago)

Reproducible on the following as well

Linux 3.13.0-66-generic #108-Ubuntu SMP Wed Oct 7 15:20:27 
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

@alk Platform information = `Linux vm-kartika-vnc 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux ` — Kartik Anand, Oct 30 '15 at 06:49
Update to the latest gdb (build from sources if needed). If the problem persists, file a bug. You can also try to get support from RH if you are their paying customer. — n. m. could be an AI, Oct 30 '15 at 06:54
Same behaviour on Debian Wheezy (Linux debian-stable 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u5 x86_64 GNU/Linux) using gcc 4.7.2. — alk, Oct 30 '15 at 06:56
After fixing the `-lpthread` to be `-pthread`, my `gcc 4.8.4`/`gdb 7.7.1` runs this program without any problem. — EOF, Oct 30 '15 at 14:27
@KartikAnand No, it is *not* supposed to be `-lpthread`. See http://stackoverflow.com/a/1665110/50617 — Employed Russian, Oct 30 '15 at 15:27
@EmployedRussian If it is wrong then the program wouldn't even compile. The program is compiling correctly and working correctly as well. I'm specifying the library `pthread` the way it is used to be in `gcc` using `-lpthread`. I don't see anything wrong here. — Kartik Anand, Oct 30 '15 at 17:14
@KartikAnand "I don't see anything wrong" -- too bad: there are *plenty* things wrong in your command line. In addition to `-pthread` vs. `-lpthread` difference, your `-lpthread` is also in the wrong place on the command line. "compiling and working correctly" -- it's an accident. Lots of buggy programs *appear* to work correctly, until they don't. — Employed Russian, Oct 30 '15 at 18:47
@EOF: Interesting, trying "*fixing the `-lpthread` to be `-pthread`*" was the first thing I did when testing this on my platform (gcc 4.7.2, gdb 7.4.1) and it did *not* help. — alk, Oct 31 '15 at 08:25
@EmployedRussian and @EOF I still get the same `SIGSEGV` even after compiling with `-pthread`. And again only `-fPIC` make a difference — Kartik Anand, Oct 31 '15 at 09:27
Same behaviour here with gdb 7.9.1 (using -pthread), probably no sense in trying a newer version. — n. m. could be an AI, Oct 31 '15 at 13:45
@KartikAnand: I'm not sure but, have you tried `gcc -g thread.c -o thread -fPIC -pthread` ? _(I've noticed, on a couple of cases, that I had subtle bugs depending on the order/placement of linked libs)_ — maddouri, Nov 02 '15 at 15:44
It looks like gdb is finding the end of the `threadMain` prologue incorrectly. Compiling on Ubuntu 15, gcc 4.9.2, (with -pthread, not -lpthread), gdb 7.10 is placing a breakpoint at 0x4007e3, but the instruction (`mov %fs:0x0,%rax`) actually begins at 0x4007e2. — Mark Plotnick, Nov 02 '15 at 15:49
I've bisected a bit and it looks like that gdb got broken somewhere between 7.4.1 and 7.5 release. If you can, downgrade to 7.4.1 version. — ks1322, Nov 02 '15 at 17:13
@ks1322 I've access to `gdb 7.2`, the problem is still there. — Kartik Anand, Nov 03 '15 at 03:18
@MarkPlotnick Is there a workaround or any other open source debugger that can help here? — Kartik Anand, Nov 03 '15 at 03:19
it works fine for me with `clang 3.7.0` / `gdb 7.7.1`, while when compiling with `gcc 4.9.2` or `gcc 5.1.0` I experience the same problem. Might the "bug" be compiler related? — oo_miguel, Nov 03 '15 at 09:28
I can reproduce this with `gcc 4.8.1 / gdb 7.6.50`, and compilation with `gcc -g thread.c -o thread -pthread -fPIC` and `gcc -g thread.c -o thread -lpthread -fPIC` both give the same SHA1 over the binary, so this is not related to `-(l)pthread`. — Iwillnotexist Idonotexist, Nov 03 '15 at 15:17
@IwillnotexistIdonotexist yeah I was thinking the same. The issue only comes with `-fPIC` — Kartik Anand, Nov 03 '15 at 15:41
gdb's line table (`gdb.selected_frame().find_sal().symtab.linetable()`) agrees with the output of `objdump -WL`, but they are still off by one byte. The output of `gcc -g -S` looks correct, but the subsequent processing of TLS+fPIC instructions by the rest of the toolchain seems to be where the bug is. So using `clang` is probably the workaround for now. — Mark Plotnick, Nov 03 '15 at 16:35
On my ubuntu, by adding following code pieces, i get warning while compiling `#ifndef TLS` `#warning TLS is not enabled` `#endif` — sardok, Nov 03 '15 at 17:15
@MarkPlotnick the problem is it will be very difficult to integrate `clang` in the existing architecture. Our whole suite depends on `gcc` to build. And I don't think they will be willing to move to `clang` for just one component — Kartik Anand, Nov 03 '15 at 17:21
In my limited investigation so far I find that the problem is not the debugger. The binary is generated with incorrect debugging info that misidentifies the beginning of line `15` at `0x4007fd`, one byte too far. As a result, addresses for _all_ following lines are also one byte too far. GDB simply places the BP where it's been promised an instruction starts, and hits a BP with a `fs:` segment override. But when the step-over happens, because the restart PC is set past the segment override, the CPU does not decode it, accesses the wrong memory (`ds:`) and segfaults. — Iwillnotexist Idonotexist, Nov 03 '15 at 17:43
@IwillnotexistIdonotexist But then how is the temporary solution provided by Mark above your comment working? Won't the same problem happen there as well? — Kartik Anand, Nov 03 '15 at 18:10
I don't claim to understand precisely why the DWARF info was generated the way it was; I only claim to understand precisely what happens in the debugger, given the wrong DWARF info it is being provided. I personally suspect that there is a bug in the determination of the prologue's length when the first instruction after the prologue has a `fs:` prefix and is compiled `-fPIC`. This bug exists at or above the level of assembler; The debug info output of the assembler is wrong, and the linker leaves it untouched. — Iwillnotexist Idonotexist, Nov 03 '15 at 18:16
I retracted my workaround of placing a call to a do-nothing function at the beginning of each line that references a TLS variable. It works around the line table bug on your example program, and won't make things incorrect, but I don't know whether it will work around the bug in all cases in more complex programs. — Mark Plotnick, Nov 03 '15 at 18:37
OK, I think the root of the problem is this: for `obj.x=1`, the assembly code emitted by gcc is `.loc 1 14 0 \n .byte 0x66 \n leaq obj@tlsgd(%rip), %rdi \n .value 0x6666 \n rex64 \n call __tls_get_addr@PLT \n movl $1, (%rax)`. (Much of that instruction sequence is replaced later - by the loader? - before the executable is produced.) When gas sees the `.loc`, it will emit dwarf line table info when it sees the next instruction, i.e., when it sees `leaq obj@tlsgd(%rip), %rdi`. But gcc evidently intended that gas emit line number info as soon as it sees the `.byte 0x66` directive. — Mark Plotnick, Nov 03 '15 at 19:25
@MarkPlotnick That sequence is exactly that mandated by [_ELF Handling for Thread-Local Storage_](http://www.akkadia.org/drepper/tls.pdf), see page 22 _4.1.6 x86-64 General Dynamic TLS Model_. So the problem lies in GAS, when it relaxes General Dynamic to Local Exec or similar (Look at the same document, pages 51 and on) and fails to generate the proper DWARF debug info for it. — Iwillnotexist Idonotexist, Nov 03 '15 at 19:51
@MarkPlotnick I think you nailed it. I replaced `.byte 0x66 \n` with `data16`, assembled it, and it generated correct DWARF debug line information. — Iwillnotexist Idonotexist, Nov 03 '15 at 20:06
And if it's very urgent for you to fix this, then, blunt as this may sound, open `/usr/lib64/gcc/.../cc1` in a hex editor, search for the 11-byte binary string `.byte\t0x66\n` (there's only one in my C compiler) and replace it with the equivalent 11-byte string `data16     ` (note the 5 single spaces at the end). — Iwillnotexist Idonotexist, Nov 03 '15 at 20:18
@IwillnotexistIdonotexist I don't have access to directly write that area. What I can do is to copy `cc1` to my home area and try the fix. Is there a way to specify a different `cc1` path to `gcc` ? — Kartik Anand, Nov 04 '15 at 03:09
@KartikAnand Pass the `-B/path/to/cc1/directory` option to GCC. — Iwillnotexist Idonotexist, Nov 04 '15 at 04:24
@IwillnotexistIdonotexist It's working!. But I'm getting the following warning /tmp/ccP6tWZY.s: Assembler messages: /tmp/ccP6tWZY.s:30: Warning: stand-alone `data16' prefix — Kartik Anand, Nov 04 '15 at 05:06
@KartikAnand did you make sure to replace the final newline in that 11-byte string with a space? — Iwillnotexist Idonotexist, Nov 04 '15 at 06:00
@IwillnotexistIdonotexist Yeah i forgot that. It's working now without any warnings. So is this an already present bug in GCC or we need to file a new one? — Kartik Anand, Nov 04 '15 at 06:46
The error lies in GAS, since equivalent assembler input produces different and wrong DWARF debug info. This hack only changes GCC to provide alternate, equivalent assembler that GAS will produce correct DWARF for. I'd file a bug with GAS if I were you. — Iwillnotexist Idonotexist, Nov 04 '15 at 12:28
If your code is compiled with -fPIC, it's suitable for inclusion in a library - the library must be able to be relocated from its preferred location in memory to another address, there could be another already loaded library at the address your library prefers. Check if this is happening with -lpthread — Alan, Nov 04 '15 at 13:23
@IwillnotexistIdonotexist Can you post your comments as answer so that I can accept it. Thanks for the help! — Kartik Anand, Nov 05 '15 at 06:35

score 8 · Accepted Answer · edited May 23 '17 at 12:24

The problem lies deep in the bowels of GAS, the GNU assembler, and how it generates DWARF debug information.

The compiler, GCC, is responsible for generating a specific sequence of instructions for a position-independent thread-local access. That sequence is documented in ELF Handling for Thread-Local Storage, page 22, section 4.1.6: x86-64 General Dynamic TLS Model. This sequence is:

0x00 .byte 0x66
0x01 leaq  x@tlsgd(%rip),%rdi
0x08 .word 0x6666
0x0a rex64
0x0b call __tls_get_addr@plt

The sequence is laid out this way because the 16 bytes it occupies leave space for backend/assembler/linker optimizations. Indeed, your compiler generates the following assembler for threadMain():

threadMain:
.LFB2:
        .file 1 "thread.c"
        .loc 1 14 0
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movq    %rdi, -8(%rbp)
        .loc 1 15 0
        .byte   0x66
        leaq    obj@tlsgd(%rip), %rdi
        .value  0x6666
        rex64
        call    __tls_get_addr@PLT
        movl    $1, (%rax)
        .loc 1 16 0
        ...

The linker then relaxes this code, which contains a function call (!), down to just two instructions in the final executable:

a mov having an fs:-segment override, and
a lea

Between them they occupy 16 bytes in total, demonstrating why the General Dynamic Model instruction sequence is designed to require 16 bytes.

(gdb) disas/r threadMain                                                                                                                                                                                         
Dump of assembler code for function threadMain:                                                                                                                                                                  
   0x00000000004007f0 <+0>:     55      push   %rbp                                                                                                                                                              
   0x00000000004007f1 <+1>:     48 89 e5        mov    %rsp,%rbp                                                                                                                                                 
   0x00000000004007f4 <+4>:     48 83 ec 10     sub    $0x10,%rsp                                                                                                                                                
   0x00000000004007f8 <+8>:     48 89 7d f8     mov    %rdi,-0x8(%rbp)                                                                                                                                           
   0x00000000004007fc <+12>:    64 48 8b 04 25 00 00 00 00      mov    %fs:0x0,%rax
   0x0000000000400805 <+21>:    48 8d 80 f8 ff ff ff    lea    -0x8(%rax),%rax
   0x000000000040080c <+28>:    c7 00 01 00 00 00       movl   $0x1,(%rax)
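The 16-byte claim can be checked by counting encoded bytes. The sketch below is only an illustration: the relaxed bytes are copied from the disassembly above, while the General Dynamic encoding (with relocation fields shown as zeros) is an assumption based on the standard x86-64 encodings of the sequence quoted from the TLS document. The names `gd_seq`, `relaxed_seq`, `gd_sequence_size` and `relaxed_sequence_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* General Dynamic sequence as the assembler would encode it.
 * Assumption: relocation fields are shown as zeros. */
static const unsigned char gd_seq[] = {
    0x66, 0x48, 0x8d, 0x3d, 0x00, 0x00, 0x00, 0x00, /* .byte 0x66; leaq obj@tlsgd(%rip),%rdi */
    0x66, 0x66, 0x48, 0xe8, 0x00, 0x00, 0x00, 0x00  /* .value 0x6666; rex64; call __tls_get_addr@PLT */
};

/* Relaxed sequence, bytes copied from the gdb disassembly. */
static const unsigned char relaxed_seq[] = {
    0x64, 0x48, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00, /* mov %fs:0x0,%rax */
    0x48, 0x8d, 0x80, 0xf8, 0xff, 0xff, 0xff              /* lea -0x8(%rax),%rax */
};

/* Size in bytes of the General Dynamic sequence. */
size_t gd_sequence_size(void) { return sizeof gd_seq; }

/* Size in bytes of the relaxed two-instruction sequence. */
size_t relaxed_sequence_size(void) { return sizeof relaxed_seq; }
```

Both functions return the same size, which is why the replacement can be done in place without moving any of the surrounding code.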

So far, everything has been done correctly. The problem now begins as GAS generates DWARF debug information for your particular assembler code.

While parsing line-by-line in binutils-x.y.z/gas/read.c, function void read_a_source_file (char *name), GAS encounters .loc 1 15 0, the statement that begins the next line, and runs the handler void dwarf2_directive_loc (int dummy ATTRIBUTE_UNUSED) in dwarf2dbg.c. Unfortunately, the handler does not unconditionally emit debug information for the current offset within the "fragment" (frag_now) of machine code it is currently building. It could have done this by calling dwarf2_emit_insn(0), but the .loc handler currently only does so if it sees multiple .loc directives consecutively. Instead, in our case it continues on to the next line, leaving the debug information unemitted.
On the next line it sees the .byte 0x66 directive of the General Dynamic sequence. This is not, in and of itself, part of an instruction, despite representing the data16 instruction prefix in x86 assembly. GAS acts upon it with the handler cons_worker(), and the fragment increases from 12 bytes to 13 in size.
On the next line it sees a true instruction, leaq, which is parsed by calling the macro assemble_one() that maps to void md_assemble (char *line) in gas/config/tc-i386.c. At the very end of that function, output_insn() is called, which itself finally calls dwarf2_emit_insn(0) and causes debug information to be emitted at last. A new Line Number Statement (LNS) is begun that claims that line 15 began at function-start-address plus previous fragment size, but since we passed over the .byte statement before doing so, the fragment is 1 byte too large, and the computed offset for the first instruction of line 15 is therefore 1 byte off.
Some time later, the linker relaxes the General Dynamic sequence to the final instruction sequence that starts with mov %fs:0x0,%rax. The code size and all offsets remain unchanged because both sequences of instructions are 16 bytes. The debug information is unchanged, and still wrong.
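The bookkeeping described in the steps above can be sketched as a toy model. This is not GAS code: the struct and every function name here (`toy_assembler`, `toy_loc`, `toy_data`, `toy_insn`, and the two `offset_with_...` helpers) are hypothetical, and the model only mimics the one behaviour that matters, namely that a pending .loc is recorded when the next real instruction is assembled, not when a data directive is seen.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the line-table bookkeeping; not actual GAS code. */
struct toy_assembler {
    size_t frag_size;    /* bytes emitted so far in the current fragment */
    int    loc_pending;  /* set by .loc, cleared when a line entry is emitted */
    size_t recorded_off; /* offset recorded for the most recent line entry */
};

/* .loc: remember that a line entry is due, but do not emit it yet. */
static void toy_loc(struct toy_assembler *a) {
    a->loc_pending = 1;
}

/* Data directive such as .byte: grows the fragment, emits no line entry. */
static void toy_data(struct toy_assembler *a, size_t nbytes) {
    a->frag_size += nbytes;
}

/* Real instruction: emit any pending line entry at the current offset,
 * then grow the fragment by the instruction's size. */
static void toy_insn(struct toy_assembler *a, size_t nbytes) {
    assert(nbytes > 0);
    if (a->loc_pending) {
        a->recorded_off = a->frag_size;
        a->loc_pending = 0;
    }
    a->frag_size += nbytes;
}

/* Offset recorded for line 15 when the prefix is written as `.byte 0x66`. */
size_t offset_with_byte_directive(void) {
    struct toy_assembler a = {0};
    toy_insn(&a, 12); /* prologue, 12 bytes in total */
    toy_loc(&a);      /* .loc 1 15 0 */
    toy_data(&a, 1);  /* .byte 0x66 */
    toy_insn(&a, 7);  /* leaq obj@tlsgd(%rip), %rdi */
    return a.recorded_off;
}

/* Offset recorded when the prefix is written as `data16`, so that the
 * prefix and leaq are assembled as a single 8-byte unit. */
size_t offset_with_data16_prefix(void) {
    struct toy_assembler a = {0};
    toy_insn(&a, 12); /* prologue, 12 bytes in total */
    toy_loc(&a);      /* .loc 1 15 0 */
    toy_insn(&a, 8);  /* data16 leaq obj@tlsgd(%rip), %rdi */
    return a.recorded_off;
}
```

In this model the `.byte` variant records offset 13 for line 15, while the `data16` variant records offset 12, which matches the address of the mov at <+12> in the disassembly.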

GDB, when it reads the Line Number Statements, is told that the prologue of threadMain(), which is associated with line 14 (where its signature is found), ends where line 15 begins. GDB dutifully plants a breakpoint at that location, but unfortunately it is 1 byte too far.

When run without a breakpoint, the program runs normally, and sees

64 48 8b 04 25 00 00 00 00      mov    %fs:0x0,%rax

Correctly placing the breakpoint would involve saving and replacing the first byte of the instruction with int3 (opcode 0xcc), leaving

cc                              int3
48 8b 04 25 00 00 00 00         mov    (0x0),%rax

The normal step-over sequence would then involve restoring the first byte of the instruction, setting the program counter (rip on x86-64) to the address of that breakpoint, single-stepping, re-inserting the breakpoint, then continuing the program.

However, when GDB plants the breakpoint at the incorrect address 1 byte too far, the program sees instead

64 cc                           fs:int3
8b 04 25 00 00 00 00            <garbage>

which is a weird but still valid breakpoint. That's why you didn't see SIGILL (illegal instruction).

Now, when GDB attempts to step over, it restores the instruction byte, sets the PC to the address of the breakpoint, and this is what the CPU sees now:

64                              fs:                # CPU DOESN'T SEE THIS!
48 8b 04 25 00 00 00 00         mov    (0x0),%rax  # <- CPU EXECUTES STARTING HERE!
# BOOM! SEGFAULT!

Because GDB restarted execution one byte too far, the CPU does not decode the fs: instruction prefix byte, and instead executes mov (0x0),%rax with the default segment, which is ds: (data). This immediately results in a read from address 0, the null pointer. The SIGSEGV promptly follows.
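The plant/restore/resume cycle can be illustrated on a plain byte buffer. This is a simplified model of the description above, not how GDB is implemented, and nothing in it is executed as machine code; `toy_breakpoint`, `toy_plant`, `toy_resume` and `resumes_with_fs_prefix` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3      0xcc /* breakpoint opcode */
#define FS_PREFIX 0x64 /* fs: segment-override prefix */

struct toy_breakpoint {
    size_t  off;   /* offset at which the breakpoint was planted */
    uint8_t saved; /* original byte that int3 replaced */
};

/* Plant: save the original byte and overwrite it with int3. */
struct toy_breakpoint toy_plant(uint8_t *code, size_t len, size_t off) {
    assert(off < len);
    struct toy_breakpoint bp = { off, code[off] };
    code[off] = INT3;
    return bp;
}

/* Step over: restore the saved byte and return the offset at which
 * execution resumes, which is the breakpoint's own offset. */
size_t toy_resume(uint8_t *code, const struct toy_breakpoint *bp) {
    code[bp->off] = bp->saved;
    return bp->off;
}

/* Returns 1 if decoding resumes on the fs: prefix, 0 if the prefix is skipped. */
int resumes_with_fs_prefix(size_t bp_off) {
    /* mov %fs:0x0,%rax, bytes copied from the disassembly */
    uint8_t insn[] = { 0x64, 0x48, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00 };
    struct toy_breakpoint bp = toy_plant(insn, sizeof insn, bp_off);
    size_t pc = toy_resume(insn, &bp);
    return insn[pc] == FS_PREFIX;
}
```

With the breakpoint at offset 0, decoding resumes on the 0x64 prefix. With it at offset 1, decoding resumes on 0x48, so the segment override is never seen and the load goes to address 0 in the default segment.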

All due credits to Mark Plotnick for essentially nailing this.

The solution that was retained is to binary-patch cc1, gcc's actual C compiler, to emit data16 instead of .byte 0x66. This results in GAS parsing the prefix and instruction combination as a single unit, yielding the correct offset in the debug information.
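For readers who prefer not to use a hex editor, the same 11-byte substitution can be sketched in C. This is a minimal sketch that works on an in-memory buffer only; reading a copy of cc1 into the buffer and writing it back is left out, and `patch_in_place` and `patch_tls_prefix` are hypothetical helper names. The two byte strings are the ones given in the comments above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replace every occurrence of the n-byte string `from` with the equally
 * long string `to` inside buf, and return the number of replacements. */
size_t patch_in_place(unsigned char *buf, size_t len,
                      const unsigned char *from, const unsigned char *to,
                      size_t n) {
    size_t count = 0;
    assert(n > 0);
    for (size_t i = 0; i + n <= len; ) {
        if (memcmp(buf + i, from, n) == 0) {
            memcpy(buf + i, to, n);
            count++;
            i += n; /* skip past the bytes just rewritten */
        } else {
            i++;
        }
    }
    return count;
}

/* Apply the substitution from the thread: ".byte\t0x66\n" becomes
 * "data16" followed by five spaces (both are 11 bytes long). */
size_t patch_tls_prefix(unsigned char *buf, size_t len) {
    static const unsigned char from[] = ".byte\t0x66\n";
    static const unsigned char to[]   = "data16     ";
    assert(sizeof from == sizeof to);
    return patch_in_place(buf, len, from, to, sizeof from - 1);
}
```

Because both strings have the same length, the file size and every other offset in the binary stay unchanged. Run it against a private copy of cc1 and point GCC at that copy with the -B option mentioned in the comments.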

Do we know if this has been officially patched? If so, what version of gcc contains the fix? If not, is there a public issue tracker somewhere which we can keep an eye on? — Alex Jansen, Nov 10 '21 at 10:23
Well, at least the solution you described in the question comment thread works. In case anybody else runs into this and wants a "command" to run, the following worked for me on gcc version 4.8.5 20150623 (Red Hat 4.8.5-44): `cc1_loc="$(gcc -print-prog-name=cc1)" && cp "$cc1_loc" cc1.original && yum install -y vim && xxd -p "$cc1_loc" | sed 's/2e6279746509307836360a/6461746131362020202020/g' | xxd -r -p - > cc1.modified && chmod 755 cc1.modified && chmod +x cc1.modified && cp -f cc1.modified "$cc1_loc"` Feel free to clean that up and add it to a formal answer if you'd like. — Alex Jansen, Nov 10 '21 at 23:30
@AlexJansen Actually, it looks like there [is a fix](https://gcc.gnu.org/bugzilla//show_bug.cgi?id=86257), and it even links back to Mark Plotnick's comment. GCC SVN `r262006`, Git `fd082a66f8be44616584164672eeb8e2779c5593`. Fix landed in GCC 9.1. — Iwillnotexist Idonotexist, Nov 11 '21 at 04:26
