Here is a minimal example:
import numba as nb
import numpy as np
from numba import jit

# Eager compilation: the signature is declared up front,
# so the function is compiled at definition time.
@jit(nb.float64[:, :](nb.int32[:, :]))
def go_fast(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# Lazy compilation: Numba infers the types on the first call.
@jit
def go_fast2(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace
Running in Jupyter:
x = np.arange(10000).reshape(100, 100)
%timeit go_fast(x)
%timeit go_fast2(x)
leads to
5.65 µs ± 27.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.8 µs ± 46.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Why does eager compilation lead to slower execution?