Although late, here's a version that's supposed to not leak space as much (and seems to run about twice faster than the other 3-way version here):
qsort3 xs = go xs [] 
  where
    go     (x:xs) zs       = part x xs zs [] [] []
    go     []     zs       = zs
    part x []     zs a b c = go a ((x : b) ++ go c zs)
    part x (y:ys) zs a b c =
        case compare y x of
                  LT -> part x ys zs (y:a) b c
                  EQ -> part x ys zs a (y:b) c
                  GT -> part x ys zs a b (y:c)
This addresses the possible problem with using tuples, where let (a,b) = ... is actually translated into let t= ...; a=fst t; b=snd t which leads to the situation where even after a has been consumed and processed, it is still kept around alive, as part of the tuple t, for b to be read from it - though of course completely unnecessary. This is known as "Wadler pair space leak" problem. Or maybe GHC (with -O2) is smarter than that. :) 
Also this apparently uses difference lists approach (thanks, hammar) which also makes it a bit more efficient (about twice faster than the version using tuples). I think part uses accumulator parameters, as it builds them in reversed order.