Why is it so slow?
Let's start with a smaller number. When you ask for Sum[_4, _4], the compiler is going to go looking for an instance, and it'll find these two methods:
implicit def sum1[B <: Nat]: Aux[_0, B, B] = new Sum[_0, B] { type Out = B }
implicit def sum2[A <: Nat, B <: Nat](implicit
sum: Sum[A, Succ[B]]
): Aux[Succ[A], B, sum.Out] = new Sum[Succ[A], B] { type Out = sum.Out }
The first one is clearly out since _4 is not _0. It knows that _4 is the same as Succ[_3] (more on that in a second), so it'll try sum2 with A as _3 and B as _4.
This means we need to find a Sum[_3, _5] instance. sum1 is out for similar reasons as before, so we try sum2 again, this time with A = _2 and B = _5, which means we need a Sum[_2, _6], which gets us back to sum2, with A = _1 and B = _6, which sends us looking for a Sum[_1, _7]. This is the last time we'll use sum2, with A = _0 and B = _7. This time when we go looking for a Sum[_0, _8] we'll hit sum1 and we're done.
So it's clear that for n + n we're going to have to do n + 1 implicit searches, and during each one the compiler is going to be doing type equality checks and other stuff (update: see Miles's answer for an explanation of what the biggest problem here is) that requires traversing the structure of the Nat types, so we're in exponential land. The compiler really, really isn't designed to work efficiently with types like this, which means that even for small numbers, this operation is going to take a long time.
Side note 1: implementation in Shapeless
Off the top of my head I'm not entirely sure why sum2 isn't defined like this:
implicit def sum2[A <: Nat, B <: Nat](implicit
sum: Sum[A, B]
): Aux[Succ[A], B, Succ[sum.Out]] = new Sum[Succ[A], B] { type Out = Succ[sum.Out] }
This is much faster, at least on my machine, where Sum[_18, _18] compiles in four seconds as opposed to seven minutes and counting.
Side note 2: induction heuristics
This doesn't seem to be a case where Typelevel Scala's -Yinduction-heuristics helps—I just tried compiling Shapeless with the @inductive annotation on Sum and it's still seems pretty much exactly as horribly slow as without it.
What about 44?
The _1, _2, _3 type aliases are defined in code produced by this boilerplate generator in Shapeless, which is configured only to produce values up to 22. In this case specifically, this is an entirely arbitrary limit. We can write the following, for example:
type _23 = Succ[_22]
And we've done exactly the same thing the code generator does, but going one step further.
It doesn't really matter much that Shapeless's _N aliases stop at 22, though, since they're just aliases. The important thing about a Nat is its structure, and that's independent of any nice names we might have for it. Even if Shapeless didn't provide any _N aliases at all, we could still write code like this:
import shapeless.Succ, shapeless.nat._0, shapeless.ops.nat.Sum
Sum[Succ[Succ[_0]], Succ[Succ[_0]]]
And it would be exactly the same as writing Sum[_2, _2], except that it's a lot more annoying to type.
So when you write Sum[_22, _22] the compiler isn't going to have any trouble representing the result type (i.e. 44 Succs around a _0), even though it doesn't have a _44 type alias.