The only code generation difference between -fPIC and -fPIE for the code shown is in the call from sub_func to subsub_func.  With -fPIC, that call goes through the PLT; with -fPIE, it's a direct call.  In assembly dumps (cc -S), that looks like this:
--- sub.s.pic   2017-12-07 08:10:00.308149431 -0500
+++ sub.s.pie   2017-12-07 08:10:08.408068650 -0500
@@ -34,7 +34,7 @@ sub_func:
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
-   call    subsub_func@PLT
+   call    subsub_func
    leaq    __func__.2258(%rip), %rsi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
In unlinked object files, it's a change of relocation type:
--- sub.o.dump.pic  2017-12-07 08:13:54.197775840 -0500
+++ sub.o.dump.pie  2017-12-07 08:13:54.197775840 -0500
@@ -22,7 +22,7 @@
   1f:  55                      push   %rbp
   20:  48 89 e5                mov    %rsp,%rbp
   23:  e8 00 00 00 00          callq  28 <sub_func+0x9>
-           24: R_X86_64_PLT32  subsub_func-0x4
+           24: R_X86_64_PC32   subsub_func-0x4
   28:  48 8d 35 00 00 00 00    lea    0x0(%rip),%rsi        # 2f <sub_func+0x10>
            2b: R_X86_64_PC32   .rodata+0x14
   2f:  48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 36 <sub_func+0x17>
And, on this architecture, when you link a shared library using cc -shared, the linker does not allow the input object files to contain R_X86_64_PC32 relocations targeting global symbols, thus the error you observed when you used -fPIE instead of -fPIC.
Now, you are probably wondering why direct calls within a shared library are not allowed.  In fact, they are allowed, but only when the callee is not a global.  For instance, if you declared subsub_func with static, then the call target would be resolved by the assembler and there would be no relocation at all in the object file, and if you declared it with __attribute__((visibility("hidden"))) then you would get an R_X86_64_PC32 relocation but the linker would allow it because the callee is no longer exported from the library.  But in both cases subsub_func would not be callable from outside the library anymore.
Now you're probably wondering what it is about global symbols that means you have to call them through the PLT from a shared library.  This has to do with an aspect of the ELF symbol resolution rules that you may find surprising: any global symbol in a shared library can be overridden by either the executable, or an earlier library in the link order.  Concretely, if we leave your sub.h and sub.c alone but make main.c read like this:
//main.c
#include "sub.h"
#include <stdio.h>
void subsub_func(void) {
    printf("%s (main)\n", __func__);
}
int main(void){
    sub_func();
    subsub_func();
    return 0;
}
so it's now got the an official executable entry point in it but also a second definition of subsub_func, and we compile sub.c into a shared library and main.c into an executable that calls it, and run the whole thing, like this
$ cc -fPIC -c sub.c -o sub.o
$ cc -c main.c -o main.o
$ cc -shared -Wl,-soname,libsub.so.1 sub.o -o libsub.so.1
$ ln -s libsub.so.1 libsub.so
$ cc main.o -o main -L. -lsub
$ LD_LIBRARY_PATH=. ./main
the output will be 
subsub_func (main)
sub_func
subsub_func (main)
That is, both the call from main to subsub_func, and the call from sub_func, within the library, to subsub_func, were resolved to the definition in the executable.  For that to be possible, the call from sub_func must go through the PLT.
You can change this behavior with an additional linker switch, -Bsymbolic.
$ cc -shared -Wl,-soname,libsub.so.1 -Wl,-Bsymbolic sub.o -o libsub.so.1
$ LD_LIBRARY_PATH=. ./main
subsub_func
sub_func
subsub_func (main)
Now the call from sub_func is resolved to the definition within the library.  In this case, using -Bsymbolic allows sub.c to be compiled with -fPIE instead of -fPIC, but I don't recommend you do that.  There are other effects of using -fPIE instead of -fPIC, such as changing how access to thread-local storage needs to be done, and those cannot be worked around with -Bsymbolic.