Edit to Answer Questions
did I do this right?
Almost, as the comments said, you build a big thunk of 1+(1+(1+...)) - use a strict accumulator instead or a higher-level function that handles things for you.  There are other minor things like defining a function to compare the second element instead of using maximumBy (comparing snd) but that's more stylistic.
As is, is the the proper Haskell way to do it?
It is acceptably idiomatic Haskell code.
How could I make it faster?
See my below benchmarks.  The extremely common answers for Euler performance questions are:
- Use -O2 (as you did)
 
- Try -fllvm (GHC NCG is sub-optimal)
 
- Use worker/wrappers to reduce arguments or, in your case, leverage an accumulator.
 
- Use fast/unboxable types (Int when you can instead Integer, Int64 if needed for portability, etc).
 
- Use 
rem instead of mod when all the values are positive.  For your case it is also useful to know or discover that div tends to compile to something that is slower than quot. 
Also, with memory usage, how is the recursion actually implemented on the low-level?
  As in how does it use memory?
Both these questions are very broad.  Complete answers would likey need to address lazy evaluation, tail call optimization, worker transformations, garbage collection, etc.  I suggest you explore these answers in more depth over time (or hope someone here makes the complete answer I'm avoiding).
Original Post - Benchmark numbers
Original:
$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)
real    0m5.971s
user    0m5.940s
sys 0m0.019s
Use a worker function with an accumulator for collatzLength:
$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)
real    0m5.617s
user    0m5.590s
sys 0m0.012s
Use Int and not defaulting to Integer - it's also easier to read with type signatures!
$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)
real    0m2.937s
user    0m2.932s
sys 0m0.001s
Use rem and not mod:
$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)
real    0m2.436s
user    0m2.431s
sys 0m0.001s
Use quotRem and not rem then div:
$ ghc -O2 so.hs ; time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
(837799,525)
real    0m1.672s
user    0m1.669s
sys 0m0.002s
This is all very much like a previous question: Speed comparison with Project Euler: C vs Python vs Erlang vs Haskell
EDIT: and yes, as Daniel Fischer suggests, using bit ops of .&. and shiftR improves on quotRem:
$ ghc -O2 so.hs ; time ./so
(837799,525)
real    0m0.314s
user    0m0.312s
sys 0m0.001s
Or you can just use LLVM and let it do it's magic (NB this version uses quotRem still)
$ time ./so
(837799,525)
real    0m0.286s
user    0m0.283s
sys 0m0.002s
LLVM actually does well, so long as you avoid the hideousness that is mod, and optimizes the guard-based code using either rem or even equally well as the hand-optimized .&. with shiftR.
For a result that is around 20x faster than the original.
EDIT: People are surprised that quotRem performs as well as bit manipulation in the face of Int.   The code is included, but I'm not clear on the surprise: just because something is possibly negative doesn't mean you can't handle it with very similar bit manipulations that could be of identical cost on the right hardware.  All three versions of nextStep seem to be performing identically (ghc -O2 -fforce-recomp -fllvm, ghc version 7.6.3, LLVM 3.3, x86-64).
{-# LANGUAGE BangPatterns, UnboxedTuples #-}
import Data.Bits
collatzLength :: Int -> Int
collatzLength x| x == 1    = 1
               | otherwise = go x 0
 where
    go 1 a  = a + 1
    go x !a = go (nextStep x) (a+1)
longestChain :: (Int, Int) -> Int -> Int -> (Int,Int)
longestChain (num, numLength) bound !counter
   | counter >= bound = (num, numLength)
   | otherwise = longestChain (longerOf (num,numLength) (counter, collatzLength counter)) bound (counter + 1)
           --I know this is a messy function, but I was doing this problem just 
           --for myself, so I didn't bother making some utility functions for it.
           --also, I split the big line in half to display on here nicer, would
           --it actually run with this line split?
longerOf :: (Int,Int) -> (Int,Int) -> (Int,Int)
longerOf (a1,a2) (b1,b2)| a2 > b2 = (a1,a2)
                        | otherwise = (b1,b2)
{-# INLINE longerOf #-}
nextStep :: Int -> Int
-- Version 'bits'
nextStep n = if 0 == n .&. 1 then n `shiftR` 1 else 3*n+1
-- Version 'quotRem'
-- nextStep n = let (q,r) = quotRem n 2 in if r == 0 then q else 3*n+1
-- Version 'almost the original'
-- nextStep n | even n = quot n 2
--            | otherwise  = 3*n + 1
{-# INLINE nextStep #-}
main = print (longestChain (0,0) 1000000 1)