I'm trying to parse large log files in Haskell. I'm using System.IO.Streams, but it seems to eat a lot of memory when I fold over the input. Here are two (ugly) examples:
First, load 1M Ints into memory in a list.
let l = foldl (\aux p -> p:aux) [] [1..1000000]
return (sum l)
Memory consumption is beautiful: the Ints eat 3 MB and the list needs another 6 MB:
(heap profile: building a list of 1M Ints)
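(For reference, a standalone sketch of that snippet; the build flags in the comment are just one way to get a heap graph, not necessarily the setup behind the numbers above.)

-- Standalone sketch of the list example.
-- Build/run e.g.: ghc -O2 -rtsopts Main.hs && ./Main +RTS -s -hT
-- (-hT writes Main.hp, which hp2ps turns into a heap graph)
main :: IO ()
main = do
    let l = foldl (\aux p -> p:aux) [] [1..1000000 :: Int]
    print (sum l)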
Then I tried the same with a stream of ByteStrings. This needs an ugly back-and-forth conversion, but I don't think that should make any difference:
let s = Streams.fromList $ map (B.pack . show) [1..1000000]
l <- s >>=
     Streams.map bsToInt >>=
     Streams.fold (\aux p -> p:aux) []
return (sum l)
(heap profile: building a list of Ints from a stream)
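(Again for reference, a compilable sketch of the stream version, assuming bsToInt is the same readInt conversion used in the EDIT below.)

import qualified Data.ByteString.Char8 as B
import           Data.Maybe            (fromJust)
import qualified System.IO.Streams     as Streams

-- parse a decimal Int from a strict ByteString (assumed definition)
bsToInt :: B.ByteString -> Int
bsToInt = fst . fromJust . B.readInt

main :: IO ()
main = do
    s  <- Streams.fromList (map (B.pack . show) [1..1000000 :: Int])
    s' <- Streams.map bsToInt s
    l  <- Streams.fold (\aux p -> p:aux) [] s'
    print (sum l)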
Why does it need more memory? And it's even worse if I read it from a file; then it needs 90 MB:
result <- withFileAsInput file load
putStrLn $ "loaded " ++ show result
where load is = do
        l <- Streams.lines is >>=
             Streams.map bsToInt >>=
             Streams.fold (\aux p -> p:aux) []
        return (sum l)
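(A compilable sketch of the file version; the file name "numbers.txt" is made up, and the file is assumed to hold one decimal per line.)

import qualified Data.ByteString.Char8  as B
import           Data.Maybe             (fromJust)
import qualified System.IO.Streams      as Streams
import           System.IO.Streams.File (withFileAsInput)

bsToInt :: B.ByteString -> Int
bsToInt = fst . fromJust . B.readInt

main :: IO ()
main = do
    result <- withFileAsInput "numbers.txt" load
    putStrLn ("loaded " ++ show result)
  where
    load is = do
        l <- Streams.lines is >>=
             Streams.map bsToInt >>=
             Streams.fold (\aux p -> p:aux) []
        return (sum l)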
My assumption is that Streams.fold has some issue, because the library's built-in countInput function doesn't use it. Any ideas?
EDIT
After some investigation I reduced the question to this: why does this code need an extra 50 MB?
do
  let l = map (Builder.toLazyByteString . intDec) [1..1000000]
  let l2 = map (fst . fromJust . B.readInt) l
  return (foldl' (\aux p -> p:aux) [] l2)
Without the conversions it only needs 30 MB; with the conversions, 90 MB.
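(Standalone sketch of the reduced case. Note that for B.readInt to typecheck on the lazy ByteStrings that toLazyByteString produces, B has to be Data.ByteString.Lazy.Char8 here.)

import qualified Data.ByteString.Builder    as Builder
import qualified Data.ByteString.Lazy.Char8 as B
import           Data.List                  (foldl')
import           Data.Maybe                 (fromJust)

main :: IO ()
main = do
    let l  = map (Builder.toLazyByteString . Builder.intDec) [1..1000000 :: Int]
        l2 = map (fst . fromJust . B.readInt) l
    -- force the rebuilt list and its elements, as in the snippet above
    print (sum (foldl' (\aux p -> p:aux) [] l2))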