Although I have a good LSFR C implementation I thought I'd try the same in Haskell - just to see how it goes. What I came up with, so far, is two orders of magnitude slower than the C implementation, which begs the question: How can the performance be improved? Clearly, the bit-fiddling operations are the bottleneck, and the profiler confirms this.
Here's the baseline Haskell code using lists and Data.Bits:
import           Control.Monad      (when)
import           Data.Bits          (Bits, shift, testBit, xor, (.&.), (.|.))
import           System.Environment (getArgs)
import           System.Exit        (exitFailure, exitSuccess)
tap :: [[Int]]
tap = [
    [],            [],            [],            [3, 2],
    [4, 3],        [5, 3],        [6, 5],        [7, 6],
    [8, 6, 5, 4],  [9, 5],        [10, 7],       [11, 9],
    [12, 6, 4, 1], [13, 4, 3, 1], [14, 5, 3, 1], [15, 14],
    [16,15,13,4],  [17, 14],      [18, 11],      [19, 6, 2, 1],
    [20, 17],      [21, 19],      [22, 21],      [23, 18],
    [24,23,22,17], [25, 22],      [26, 6, 2, 1], [27, 5, 2, 1],
    [28, 25],      [29, 27],      [30, 6, 4, 1], [31, 28],
    [32,22,2,1],   [33,20],       [34,27,2,1],   [35,33],
    [36,25],       [37,5,4,3,2,1],[38,6,5,1],    [39,35],
    [40,38,21,19], [41,38],       [42,41,20,19], [43,42,38,37],
    [44,43,18,17], [45,44,42,41], [46,45,26,25], [47,42],
    [48,47,21,20], [49,40],       [50,49,24,23], [51,50,36,35],
    [52,49],       [53,52,38,37], [54,53,18,17], [55,31],
    [56,55,35,34], [57,50],       [58,39],       [59,58,38,37],
    [60,59],       [61,60,46,45], [62,61,6,5],   [63,62]        ]
xor' :: [Bool] -> Bool
xor' = foldr xor False
mask ::  (Num a, Bits a) => Int -> a
mask len = shift 1 len - 1
advance :: Int -> [Int] -> Int -> Int
advance len tap lfsr
    | d0        = shifted
    | otherwise = shifted .|. 1
    where
        shifted = shift lfsr 1 .&. mask len
        d0 = xor' $ map (testBit lfsr) tap'
        tap' = map (subtract 1) tap
main :: IO ()
main = do
    args <- getArgs
    when (null args) $ fail "Usage: lsfr <number-of-bits>"
    let len = read $ head args
    when (len < 8) $ fail "No need for LFSR"
    let out = last $ take (shift 1 len) $ iterate (advance len (tap!!len)) 0
    if out == 0 then do
        putStr "OK\n"
        exitSuccess
    else do
        putStr "FAIL\n"
        exitFailure
Basically it tests whether the LSFR defined in tap :: [[Int]] for any given bit-length is of maximum-length. (More precisely, it just checks whether the LSFR reaches the initial state (zero) after 2n iterations.)
According to the profiler the most costly line is the feedback bit d0 = xor' $ map (testBit lfsr) tap'.
What I've tried so far:
- use Data.Array: Attempt abandoned because there's no foldl/r
- use Data.Vector: Slightly faster than the baseline
The compiler options I use are: -O2, LTS Haskell 8.12 (GHC-8.0.2).
The reference C++ program can be found on gist.github.com.
The Haskell code can't be expected (?) to run as fast as the C code, but two orders of magnitude is too much, there must be a better way to do the bit-fiddling.
Update: Results of applying the optimisations suggested in the answers
- The reference C++ program with input 28, compiled with LLVM 8.0.0, runs in 0.67s on my machine (the same with clang 3.7 is marginally slower, 0.68s)
- The baseline Haskell code runs about 100x slower (because of the space inefficiency don't try it with inputs larger than 25)
- With the rewrite of @Thomas M. DuBuisson, still using the default GHC backend, the execution time goes down to 5.2s
- With the rewrite of @Thomas M. DuBuisson, now using the LLVM backend (GHC option -O2 -fllvm), the execution time goes down to 1.7s- Using GHC option -O2 -fllvm -optlc -mcpu=nativebrings this to 0.73s
 
- Using GHC option 
- Replacing iteratewithiterate'of @cirdec makes no difference when Thomas' code is used (both with the default 'native' backend and LLVM). However, it does make a difference when the baseline code is used.
So, we've come from 100x to 8x to 1.09x, i.e. only 9% slower than C!
Note 
The LLVM backend to GHC 8.0.2 requires LLVM 3.7. On Mac OS X this means installing this version with brew and then symlinking opt and llc. See 7.10. GHC Backends.
 
     
     
     
     
    