Tree traversal in C++ and Haskell

Question

I am new to Haskell. I am trying to get a sense how well haskell can handle the recursive function call together with their lazy evaluation. The experiement I made is simply building binary search tree in both C++ and Haskell and traverse them respectively in postorder. C++ implementation is standard one with auxilary stack. (I just print the element out once I visit it).

Here is my haskell code:

module Main (main) where

import System.Environment (getArgs)
import System.IO
import System.Exit
import Control.Monad(when)
import qualified Data.ByteString as S

main = do
     args <- getArgs
     when (length args < 1) $ do
          putStrLn "Missing input files"
          exitFailure

     content <- readFile (args !! 0)
     --preorderV print $ buildTree content
     mapM_ print $ traverse POST $ buildTree content
     putStrLn "end"


data BSTree a = EmptyTree | Node a (BSTree a) (BSTree a) deriving (Show)
data Mode = IN | POST | PRE

singleNode :: a -> BSTree a
singleNode x = Node x EmptyTree EmptyTree

bstInsert :: (Ord a) => a -> BSTree a -> BSTree a
bstInsert x EmptyTree = singleNode x
bstInsert x (Node a left right)
          | x == a = Node a left right
          | x < a  = Node a (bstInsert x left) right
          | x > a  = Node a left (bstInsert x right)

buildTree :: String -> BSTree String
buildTree = foldr bstInsert EmptyTree . words

preorder :: BSTree a -> [a]
preorder EmptyTree = []
preorder (Node x left right) = [x] ++ preorder left ++ preorder right

inorder :: BSTree a -> [a]
inorder EmptyTree = []
inorder (Node x left right) = inorder left ++ [x] ++ inorder right

postorder :: BSTree a -> [a]
postorder EmptyTree = []
postorder (Node x left right) = postorder left ++  postorder right ++[x]

traverse :: Mode -> BSTree a -> [a]
traverse x tree = case x of IN   -> inorder tree
                            POST -> postorder tree
                            PRE  -> preorder tree


preorderV :: (a->IO ()) -> BSTree a -> IO ()
preorderV f EmptyTree = return ()
preorderV f (Node x left right) = do 
                                     f x
                                     preorderV f left
                                     preorderV f right

My test result show that C++ significantly outperform Haskell:

C++ performance: (note that first15000.txt is roughly 5 times of first3000.txt)

time ./speedTestForTraversal first3000.txt > /dev/null 

real    0m0.158s
user    0m0.156s
sys     0m0.000s
time ./speedTestForTraversal first15000.txt > /dev/null 

real    0m0.923s
user    0m0.916s
sys     0m0.004s

Haskell with same input file:

time ./speedTestTreeTraversal first3000.txt > /dev/null 

real    0m0.500s
user    0m0.488s
sys     0m0.008s
time ./speedTestTreeTraversal first15000.txt > /dev/null 

real    0m3.511s
user    0m3.436s
sys     0m0.072s

What I expected that haskell should be not too far away from C++. Did I make some mistake? Is there any way to improve my haskell code?

Thanks

Edit: Oct. 18 2014

After testing serval occasions, haskell's traversal is still significantly slower than C++ implementation. I would like to give Cirdec's answer a full credit, since he points out inefficiency of my haskell implementation. However, my original question is to compare C++ and haskell implementation. So I would like to keep this question open and post my C++ code to encourage further discussion.

#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <stack>
using namespace std;
using boost::algorithm::trim;
using boost::algorithm::split;


template<typename T>
class Node
{
public:
    Node(): val(0), l(NULL), r(NULL), p(NULL) {};
    Node(const T &v): val(v), l(NULL), r(NULL), p(NULL) {}
    Node* getLeft() {return l;}
    Node* getRight(){return r;}
    Node* getParent() {return p;}
    void  setLeft(Node *n) {l = n;}
    void  setRight(Node *n) {r = n;}
    void  setParent(Node *n) {p = n;}
    T  &getVal() {return val;}
    Node* getSucc() {return NULL;}
    Node* getPred() {return NULL;}
private:
    T val;
    Node *l;
    Node *r;
    Node *p;
};

template<typename T>
void destoryOne(Node<T>* n)
{
    delete n;
    n = NULL;
}

template<typename T>
void printOne(Node<T>* n)
{
    if (n!=NULL)
    std::cout << n->getVal() << std::endl;
}




template<typename T>
class BinarySearchTree
{
public:
    typedef void (*Visit)(Node<T> *);

    BinarySearchTree(): root(NULL) {}
    void delNode(const T &val){};
    void insertNode(const T &val){
    if (root==NULL)
        root = new Node<T>(val);
    else {
        Node<T> *ptr = root;
        Node<T> *ancester = NULL;
        while(ptr && ptr->getVal()!=val) {
        ancester = ptr;
        ptr = (val < ptr->getVal()) ? ptr->getLeft() : ptr->getRight(); 
        }
        if (ptr==NULL) {
        Node<T> *n = new Node<T>(val);
        if (val < ancester->getVal())
            ancester->setLeft(n);
        else
            ancester->setRight(n);
        } // else the node exists already so ignore!
    }
    }
    ~BinarySearchTree() {
    destoryTree(root);
    }
    void destoryTree(Node<T>* rootN) {
    iterativePostorder(&destoryOne);
    }

    void iterativePostorder(Visit fn) {
    std::stack<Node<T>* > internalStack;
    Node<T> *p = root;
    Node<T> *q = root;
    while(p) {
        while (p->getLeft()) {
        internalStack.push(p);
        p = p->getLeft();
        }
        while (p && (p->getRight()==NULL || p->getRight()==q)) {
        fn(p);
        q = p;
        if (internalStack.empty())
            return;
        else {
            p = internalStack.top();
            internalStack.pop();
        }
        }
        internalStack.push(p);
        p = p->getRight();
    }
    }


    Node<T> * getRoot(){ return root;}
private:
    Node<T> *root;
};



int main(int argc, char *argv[])
{
    BinarySearchTree<string> bst;
    if (argc<2) {
    cout << "Missing input file" << endl;
    return 0;
    }
    ifstream inputFile(argv[1]);
    if (inputFile.fail()) {
    cout << "Fail to open file " << argv[1] << endl;
    return 0;
    }
    while (!inputFile.eof()) {
    string word;
    inputFile >> word;
    trim(word);
    if (!word.empty()) {
        bst.insertNode(word);
    }
    }

    bst.iterativePostorder(&printOne);

    return 0;
}

Edit: Oct 20 2014 Chris's answer in below is very thorough and I can repeat the result.

note that in main function, I comment out a line. I was thinking that print out element directly instead of making a list first could be better. It turns out both approaches have similar perforrmance. Looks like lazy-evaluation does a good job here. — Yan Zhu, Oct 07 '14 at 05:00
Perhaps you meant to have `postorder (Node x left right) = postorder left ++ postorder right ++ [x]`? — Cirdec, Oct 07 '14 at 06:11

Cirdec · Answer 1 · 2014-10-07T06:14:58.190

List concatenation with ++ is slow, each time ++ occurs, its first argument must be traversed to the end to find where to add the second argument. You can see how the first argument is traversed all the way to [] in the definition of ++ from the standard prelude:

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

When ++ is used recursively this traversal must be repeated for each level of recursion, which is inefficient.

There's another way to build lists: If you know what will come at the end of a list before you start to build it, you can build it with the end already in place. Let's look at the definition of postorder

postorder :: BSTree a -> [a]
postorder EmptyTree = []
postorder (Node x left right) = postorder left ++ postorder right ++ [x]

When we make postorder left, we already know what will come after it, it will be postorder right ++ [x], so it would make sense to build the list for the left side of the tree with the right side and the value from the node already in place. Similarly, when we make postorder right, we already know what should come after it, namely x. We can do exactly that by making a helper function that passes an accumulated value for the rest of the list

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree rest = rest
        go (Node x left right) rest = go left (go right (x:rest))

This is about twice as fast on my machine when run with a 15k word dictionary as input. Let's explore this a bit more to see if we can gain a deeper understanding. If we rewrite our postorder definition using function composition (.) and application ($) instead of nested parenthesis we'd have

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree rest = rest
        go (Node x left right) rest = go left . go right . (x:) $ rest

We can even drop the rest argument and the function application, $, and write this in a slightly more points-free style

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree = id
        go (Node x left right) = go left . go right . (x:)

Now we can see what we've done. We've replaced a list [a], with a function [a] -> [a] that prepends the list to an existing list. The empty list is replaced with the function that doesn't add anything to the start of a list, which is the identity function, id. The singleton list [x] is replaced with the function that adds x to the beginning of a list, (x:). List concatenation a ++ b is replaced with function composition f . g - first add the things g will add to the beginning of the list, then add the things that f will add to the beginning of that list.

Just for reference, the approach proposed by @Cirdec is called "difference lists" and it's quite standard among functional and logic programming. — Riccardo T., Oct 07 '14 at 08:57
*"each time ++ occurs, its first argument must be traversed to the end "* [it's a little bit more nuanced than that](http://stackoverflow.com/a/14942678/849891), the definition and the traversals occur at different times after all. the key is that `(++)` builds a rigid *explicit* `++` structure; `go` rearranges the *implicit* ("future") structure as it goes, "rotating it" to the right, and so ends up with a more right-leaning structure (good). -- *"build the list for the left side of the tree with the right side ... already in place"* that's a good way to put it. — Will Ness, Oct 07 '14 at 10:15
Thanks Cirdec for the reply. I knew that (++) is expensive than (:). Thanks for point this trick for me. However, after I compile your code. I got the very similar performance as before. Does your test run on top my code or you have a different one? I am wondering if there is something else bottlenecks my program. — Yan Zhu, Oct 08 '14 at 02:41
@Yan Zhu I ran your code with the first 15k entries of this dictionary: http://docs.oracle.com/javase/tutorial/collections/interfaces/examples/dictionary.txt. These are in alphabetical order, so they result in worst case performance for `bstInsert`. I compiled the code using the 64 bit windows GHC 7.8.3 using the command `ghc -O2`. I timed the code by adding `getCPUTime` from `System.CPUTime` before and after the `mapM_ print $ traverse POST $ buildTree content` line. — Cirdec, Oct 08 '14 at 03:35
@Cirdec full testing would try increasing, decreasing, and random inputs (worst case for `insert` just might be the best case for `++`). Plus, any comparison of speeds at one size point is meaningless; at least two, preferably more size-points are needed to see how the [empirical orders of growth](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth) compare. For instance, the OP data shows C at N^1.1 (typical of linearithmic behavior), and Haskell at N^1.2 (suggestive of N (log N)^2), not that far behind. — Will Ness, Oct 08 '14 at 08:21

CR Drost · Accepted Answer · 2014-10-20T21:30:04.653

I generated a file containing all of the 4-letter strings on the lowercase ASCII alphabet abcdefghijklmnopqrstuvwxyz separated by spaces; and I think I got it in the right order so that the tree your code generates is perfectly balanced.

I chose this length because it takes 3.4 seconds on my computer, much like your 3.5s Haskell run. I called it 26_4.txt for obvious reasons. It sounds like actually your data set is close to 26⁴ words, so it's comparable in length too.

A lower bound on runtime would be something like:

import System.IO
main = do
    mylist <- readFile "26_4.txt"
    mapM_ putStrLn (words mylist)

and this for my data set reduces to taking 0.4s (piping stdout to /dev/null of course). So we can't expect more than, say, a factor of 10 speedup on this sort of problem from Haskell, it looks like. However, that factor is well within the bounds for your problem; C++ takes twice as long as this super-simple program.

But, no processing is an unrealistic goal. We can get a bound which is more realistic if we use a data structure which was already optimized for us by professionals who understand Haskell better:

import System.IO
import qualified Data.Map.Strict as Map

balancedTree = Map.fromList . map (\k -> (k, ()))

serializeTree = map fst . Map.toList

main = do
    mylist <- readFile "26_4.txt"
    mapM_ putStrLn (serializeTree $ balancedTree $ words mylist)

This runs for something more like 1.6s on my machine. It's not quite as fast as your C++ but your C++ does not balance the tree as far as I can tell.

I made Cirdec's modification to the code and it took your code down to 3.1s, so it only shaved about 10% of that runtime off of the file.

However, on my machine, this file doesn't even run unless you give it more memory with RTSopts. And that points to a really important optimization: tail-call optimization. The code from both you and Cirdec is suboptimal in a special way: it is not tail-recursive, meaning that it cannot be turned into a loop by GHC. We can make it tail-recursive by writing an explicit stack of 'stuff to be done' which we descend into:

postorder :: BSTree a -> [a]
postorder t = go [] [t]
    where go xs [] = xs
          go xs (EmptyTree : ts) = go xs ts
          go xs (Node x a b : ts) = go (x : xs) (b : a : ts)

This change seems to bring it all the way down to 2.1s.

Another difference between C++ and Haskell which causes some amount of time wasted is that the Haskell version will allow you to lazily construct your search tree, whereas your C++ code wouldn't allow this. We can make the Haskell code strict to deal with this, providing something like:

data BSTree a = EmptyTree
          | Node  !a !(BSTree a) !(BSTree a) deriving (Show)

This change combined with Cirdec's brings us down to 1.1 seconds, which means that we're on-par with your C++ code, at least on my machine. You should test this on your machine to see if those are also the main problems. I think any further optimizations can't be done "from the armchair" and must instead be done with a proper profiler.

Remember to ghc -O2 the code, else the tail-calls and other optimizations may not take.

IMO, your implementation of postorder can be viewed as a tail-recursive conversion of iterative algorithm in C++. I believe similar trick can be used for inorder/preorder, although I haven't tried it. Am I right? — Yan Zhu, Oct 21 '14 at 01:59
Yep, tail-recursives are just iterative algorithms where all the state variables for the loop are promoted into arguments! The code for `preorder` is just swapping the `b : a : ts` for `a : b : ts`, I believe; I think `inorder` will require a bit more thinking. We can generalize the type of the internal stack to `[(BSTree x, [x])]` where the second element of the pair holds a reversed list of elements to append after this tree is processed, using `app (x:xs) ys = app xs (x : ys)`. Then e.g. `go out ((Node x left right, xs) : ts) = go out ((right, x:xs) : (left, []) : ts)` has `inorder` logic. — CR Drost, Oct 21 '14 at 19:53

Tree traversal in C++ and Haskell

2 Answers2