Expanding on PolyThinker's answer, here's a concrete example.
int foo(int a, int b) {
    if (a && b)
        return foo(a - 1, b - 1);
    return a + b;
}
i686-pc-linux-gnu-gcc-4.3.2 -Os -fno-optimize-sibling-calls output:
00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 16                   je     23 <foo+0x23>
   d:   85 c0                   test   %eax,%eax
   f:   74 12                   je     23 <foo+0x23>
  11:   51                      push   %ecx
  12:   48                      dec    %eax
  13:   51                      push   %ecx
  14:   50                      push   %eax
  15:   8d 42 ff                lea    -0x1(%edx),%eax
  18:   50                      push   %eax
  19:   e8 fc ff ff ff          call   1a <foo+0x1a>
  1e:   83 c4 10                add    $0x10,%esp
  21:   eb 02                   jmp    25 <foo+0x25>
  23:   01 d0                   add    %edx,%eax
  25:   c9                      leave
  26:   c3                      ret
i686-pc-linux-gnu-gcc-4.3.2 -Os output:
00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 08                   je     15 <foo+0x15>
   d:   85 c0                   test   %eax,%eax
   f:   74 04                   je     15 <foo+0x15>
  11:   48                      dec    %eax
  12:   4a                      dec    %edx
  13:   eb f4                   jmp    9 <foo+0x9>
  15:   5d                      pop    %ebp
  16:   01 d0                   add    %edx,%eax
  18:   c3                      ret
In the first case, <foo+0x11>-<foo+0x1d> pushes arguments for a function call, while in the second case, <foo+0x11>-<foo+0x14> modifies the variables and jmps to the same function, somewhere after the preamble.  That's what you want to look for.
I don't think you can do this programatically; there's too much possible variation.  The "meat" of the function may be closer to or further away from the start, and you can't distinguish that jmp from a loop or conditional without looking at it.  It might be a conditional jump instead of a jmp.  gcc might leave a call in for some cases but apply sibling call optimization to other cases.
FYI, gcc's "sibling calls" is slightly more general than tail-recursive calls -- effectively, any function call where re-using the same stack frame is okay is potentially a sibling call.
[edit]
As an example of when just looking for a self-recursive call will mislead you,
int bar(int n) {
    if (n == 0)
        return bar(bar(1));
    if (n % 2)
        return n;
    return bar(n / 2);
}
GCC will apply sibling call optimization to two out of the three bar calls.  I'd still call it tail-call-optimized, since that single unoptimized call never goes further than a single level, even though you'll find a call <bar+..> in the generated assembly.