This is obv waaay late but I thought I'd post anyways for future readers' benefit (I imagine OP is long done with this problem).   
TL;DR: 
I think we probably want to use the Data.Vector package for this problem (and similar types of problems).
Longer version:
According to the Haskell docs, a Map (from Data.Map) has O(log N) access time whereas a Vector (from Data.Vector) has O(1) access; we can see the difference in the results below: the vector implementation runs ~3x faster.  (Both are way better than lists which have O(N) access time.)
A couple of benchmarks are included below.  The tests were intentionally not run one after another so as to prevent any cache-based optimization.  
A couple of observations:
The largest absolute improvement (from the code in the original post) was due to the addition of type signatures; without being explicitly told that the data was of type Int, Haskell's type system was inferring that the data was of type Integer (which is obv bigger and slower)
 
A bit counterintuitive but, results are virtually indistinguishable between foldl1' and foldl1.  (I double checked the code and ran these a few times just to make sure.)
 
Vector and Array (and, to a certain extent, Map) allow for decent improvement primarily as a result of memoization.  (Note that OP's solution is likely a lot faster than a list-based solution that tried to use memoization given lists' O(N) access time.)
 
Here are a couple of benchmarks (all compiled using O2):
                                                       Probably want to look
                                                         at these numbers        
                                                                |             
                                                                V         
Data.Vector                     0.35s user 0.10s system 97% cpu 0.468 total
Data.Array (Haskell.org)        0.31s user 0.21s system 98% cpu 0.530 total
Data.Map (above answer)         1.31s user 0.46s system 99% cpu 1.772 total
Control.Parallel (Haskell.org)  1.75s user 0.05s system 99% cpu 1.799 total
OP (`Int` type sigs + foldl')   3.60s user 0.06s system 99% cpu 3.674 total
OP (`Int` type sigs)            3.53s user 0.16s system 99% cpu 3.709 total
OP (verbatim)                   3.36s user 4.77s system 99% cpu 8.146 total
Source of figures from Haskell.org: https://www.haskell.org/haskellwiki/Euler_problems/11_to_20#Problem_14
The Data.Vector implementation used to generate the above results:
import Data.Vector ( Vector, fromList, maxIndex, (!) )
main :: IO ()
main = putStrLn $ show $ largestCollatz 1000000
largestCollatz :: Int -> Int
largestCollatz n = maxIndex vector
  where 
    vector :: Vector Int               
    vector = fromList $ 0 : 1 : [collatz x x 0 | x <- [2..n]]
    collatz m i c =
      case i < m of
        True  -> c + vector ! i
        False -> let j = if even i then i `div` 2 else 3*i + 1
                 in collatz m j (c+1)