---
title: Functional Algorithm
tags: Haskell algorithm
author: seckcoder
slide: false
---

I started this post while reading "Pearls of Functional Algorithm Design" by Richard Bird. It contains my random thoughts on programming puzzles (some from that book). Most of the algorithms are implemented in Haskell, and most of the Haskell code can run in [fpcomplete's online IDE][fpide].

# Binary Search, the easy way

## Basic case

Description:

> Given a list of sorted values, find the position of some given value.
> If the value occurs multiple times in the list, return any one of its positions.

The basic case of binary search is very easy.

```haskell
import Data.Vector
import Data.Maybe

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r)
  | p > r = Nothing
  | otherwise = case f m of
      EQ -> Just m
      LT -> binSearch f (m+1,r)
      GT -> binSearch f (p,m-1)
  where m = (p+r) `div` 2

-- special case for vector
binSearchInVec :: (Ord a) => Vector a -> a -> (Int,Int) -> Maybe Int
binSearchInVec vec val = binSearch (\m -> (vec!m) `compare` val)

main = print $ fromJust $ binSearchInVec (fromList [1,2,3,4]) 2 (0,3)
```

## Extended case 1: find the smallest/largest index

The basic case is easy because the terminating condition is clear, but some extended cases are harder to get right. The most important questions to ask when implementing a binary search are:

- When do we return the result?
- When do we move to the upper/lower range?
- How do we shrink the range?

Without good answers to these questions, we get trapped in infinite loops, out-of-bound errors, and so on. Some claim that changing the range from [p,r] to (p-1,r+1), [p,r+1) or (p-1,r] makes it easier. However, I think a much better method is:

1. Add a variable `mark` to help decide when to return.
2. Shrink the range aggressively: every iteration must decrease the range.

Description:

> Given a list of sorted values, find the position of some given value.
> If there are multiple values in the list, return the **smallest** index.

```haskell
import Data.Vector
import Data.Maybe

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
  where
    go (p,r) mark
      | p > r = mark
      | otherwise = case f m of
          EQ -> go (p,m-1) (Just m)
          LT -> go (m+1,r) mark
          GT -> go (p,m-1) mark
      where m = (p+r) `div` 2

-- special case for vector
binSearchInVec :: (Ord a) => Vector a -> a -> (Int,Int) -> Maybe Int
binSearchInVec vec val = binSearch (\m -> (vec!m) `compare` val)

main = print $ fromJust $ binSearchInVec (fromList [1,2,2,3,4]) 2 (0,4)
```

`mark` stores the index of the most recent `EQ` found. We can prove the above program returns the smallest index from the following invariant:

> Suppose the initial search range is [first, last] and the current search range is [p, r]. Then every x with first <= x < p satisfies f x == LT, and mark = Nothing || Just min{x | r < x <= last, f x == EQ}. When the loop terminates with p > r, these two facts combine into mark = Nothing || Just min{x | first <= x <= last, f x == EQ}, i.e., the smallest matching index.

Exercise: how would you find the largest index?

## Extended case 2: find the turning point

Description:

> At a given point x, we break a strictly sorted list with length >= 3 into two non-empty parts a and b, then append a to the end of b to get another list c. Now, the question is: given c, find the index of the smallest element in c (which is the head of the original sorted list).

Again, we implement it with the help of the `mark` variable.

```hs
import Data.Vector

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
  where
    go (p,r) mark
      | p > r = mark
      | otherwise = case f m of
          EQ -> Nothing
          LT -> go (p, m-1) (Just m)
          GT -> go (m+1, r) mark
      where m = (p+r) `div` 2

-- special case for vector: compare every element with the first one.
-- Since both parts of the split are non-empty, the turning point can
-- never be at index p itself, so we search (p+1, r); then f m == EQ
-- can only mean a duplicate element.
binSearchInVec :: (Ord a) => Vector a -> (Int,Int) -> Maybe Int
binSearchInVec vec (p,r) = binSearch (\m -> (vec!m) `compare` (vec!p)) (p+1, r)

main = case binSearchInVec (fromList [2,3,1]) (0,2) of
  Nothing -> print "the list is not strictly increasing"
  Just i -> print i
```

- When to return the result?
  * When p > r, we have considered all candidates, so we return `mark`.
  * When f m == EQ, we have vec!m == vec!p with m /= p (the search starts at p+1), so the list contains a duplicate, which contradicts the assumption that it is strictly sorted with size >= 3. So we return Nothing.
- When to go to the upper/lower range? An easy question: if vec!m < vec!p, the turning point is at m or before it; otherwise it is after m.
- How to change the range? As I have always advocated, change the range aggressively to avoid infinite recursion.

## Extended case 3

POJ has an interesting [problem][poj1064]. My Haskell solution:

```hs
-- binary search for the largest value in [p,r] satisfying the predicate
binSearch :: (Int -> Bool) -> (Int,Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
  where
    go (p,r) mark
      | p > r = mark
      | otherwise = if f m
          then go (m+1,r) (Just m)
          else go (p,m-1) mark
      where m = (p+r) `div` 2

-- Note I didn't follow the output format requested by the problem.
cabalLength :: [Double] -> Int -> Double
cabalLength cables num_pieces =
  case binSearch validLength (1, 10000000) of
    Nothing -> 0
    Just v  -> fromIntegral v / 100
  where
    -- work in integer centimeters; round (not floor) avoids
    -- floating-point surprises such as 8.02 * 100 == 801.9999...
    cabal_ints = map (round . (*100)) cables
    validLength len = sum (map (`div` len) cabal_ints) >= num_pieces

main = print $ cabalLength [8.02,7.43,4.57,5.39] 11
```

# Tree traversal

Tree traversal is a good example of recursive program design since it is intuitive. But the non-recursive version is not easy to understand, especially when implemented in an imperative programming language like C. In this chapter, I give a tail-recursive implementation (not in continuation-passing style) of tree traversal in Haskell, which is easy to re-implement in C. First, the recursive implementation, which will later be used by QuickCheck to test the tail-recursive version.

```haskell
data BinTree a = EmptyTree | Branch a (BinTree a) (BinTree a) deriving (Show)

preOrder :: BinTree a -> [a]
preOrder EmptyTree = []
preOrder (Branch a left right) = [a] ++ preOrder left ++ preOrder right

inOrder :: BinTree a -> [a]
inOrder EmptyTree = []
inOrder (Branch a left right) = inOrder left ++ [a] ++ inOrder right

postOrder :: BinTree a -> [a]
postOrder EmptyTree = []
postOrder (Branch a left right) = postOrder left ++ postOrder right ++ [a]

sampleTree :: BinTree Int
sampleTree =
  Branch 4
    (Branch 2 (Branch 1 EmptyTree EmptyTree) (Branch 3 EmptyTree EmptyTree))
    (Branch 6 (Branch 5 EmptyTree EmptyTree) (Branch 7 EmptyTree EmptyTree))

main =
  putStr ("preorder: "  ++ (show $ preOrder sampleTree)  ++ "\n" ++
          "inorder: "   ++ (show $ inOrder sampleTree)   ++ "\n" ++
          "postorder: " ++ (show $ postOrder sampleTree) ++ "\n")
```

Tail-recursive version:

```hs
data BinTree a = EmptyTree | Branch a (BinTree a) (BinTree a) deriving (Show)

preOrderT :: BinTree a -> [a]
preOrderT bt = go [bt] []
  where
    go [] xs = reverse xs
    go (EmptyTree:ts) xs = go ts xs
    go (Branch v left right:ts) xs = go (left:right:ts) (v:xs)

inOrderT :: BinTree a -> [a]
inOrderT bt = go [bt] [] []
  where
    go [] [] xs = reverse xs
    go (EmptyTree:ts) [] xs = go ts [] xs
    go (EmptyTree:ts) (v:left_acc) xs = go ts left_acc (v:xs)
    go (Branch v left right:ts) left_acc xs = go (left:right:ts) (v:left_acc) xs

-- tail recursive post order traversal
postOrderT :: BinTree a -> [a]
postOrderT bt = go [bt] []
  where
    go [] xs = xs
    go (EmptyTree:ts) xs = go ts xs
    go (Branch v left right:ts) xs = go (right:left:ts) (v:xs)

sampleTree :: BinTree Int
sampleTree =
  Branch 4
    (Branch 2 (Branch 1 EmptyTree EmptyTree) (Branch 3 EmptyTree EmptyTree))
    (Branch 6 (Branch 5 EmptyTree EmptyTree) (Branch 7 EmptyTree EmptyTree))

main =
  putStr ("preorder: "  ++ (show $ preOrderT sampleTree)  ++ "\n" ++
          "inorder: "   ++ (show $ inOrderT sampleTree)   ++ "\n" ++
          "postorder: " ++ (show $ postOrderT sampleTree) ++ "\n")
```

The key point of tree traversal lies in the visit order. For pre-order traversal we visit v -> left -> right; for in-order, left -> v -> right; for post-order, left -> right -> v. The recursive version clearly shows these relationships, which is why it is intuitive. Can we make the tail-recursive version just as clear? We know that a stack can be used to implement function calls, so we use a stack (represented here as a list) to store the context that encodes the visit order.

For preOrderT's local function `go trees xs`:

* trees: the list of trees to visit. Suppose trees = (t:ts), i.e., t is to the left of ts.
We maintain the order that t is always visited before ts.
* xs: values of nodes that have already been visited. Suppose xs = (v:vs). We maintain that v was visited after vs (new values are pushed onto the front), so when the program terminates we return xs in reverse order. Also, as xs keeps the values of visited nodes, the nodes whose values are in xs are visited before the nodes in trees.

Therefore:

- go [] xs : there are no trees left to visit, so we return xs in reverse order.
- go (EmptyTree:ts) xs : the top of the stack is an empty tree. We have nothing to record for it, so we continue with ts.
- go (Branch v left right:ts) xs : by our definition, the visit order for a branch is v -> left -> right. Therefore we put v into the visited values xs and continue with left:right:ts, which means left is visited before right, which in turn is visited before ts.

Everything is clear!

postOrderT looks similar to preOrderT. Note that the reverse of the post-order visit order is v -> right -> left. So we only need to adapt the pre-order algorithm so that right is visited before left; the reverse of that result is the post-order result. Clear!

For inOrderT's local function `go trees xs_acc xs`:

- trees and xs have the same meaning as in preOrderT.
- xs_acc: keeps values of nodes that we **will** visit. It is built up during the traversal and records, for each partially processed branch, the value whose left subtree is still being visited. Suppose xs_acc = (x:acc); then x will be visited before acc. Also, for trees = (t:ts) and xs_acc = (v:acc), at the entry of go we maintain the order t -> v -> (ts joined with acc), where "joined" means ts and acc are interleaved but each keeps its own internal order. In short: t is visited before v, and v before all other pending nodes.

Therefore:

- go [] [] xs : nothing left to visit; return xs in reverse order.
- go (EmptyTree:ts) [] xs : this case only happens when ts = []. As we have nothing to record for EmptyTree, we continue with ts.
- go (EmptyTree:ts) (v:left_acc) xs : remember that we maintain the order t -> v -> rest. So we first visit EmptyTree (nothing to do), then visit v by adding it to xs.
- go (Branch v left right:ts) left_acc xs : by the definition of in-order traversal we must ensure left -> v -> right. We therefore step to go (left:right:ts) (v:left_acc) xs, which maintains exactly the order t -> v -> rest.

Clear!

The QuickCheck properties that help verify our implementation:

```hs
-- Note: the following code can't run on its own; it assumes the BinTree
-- definition and the traversal functions above.
import Test.QuickCheck

randomTree :: [a] -> Gen (BinTree a)
randomTree [] = return EmptyTree
randomTree (x:xs) = do
  -- k ranges over 0..length xs; (length xs - 1) would give an
  -- empty range when xs == []
  k <- choose (0, length xs)
  let (ls,rs) = splitAt k xs
  left <- randomTree ls
  right <- randomTree rs
  return $ Branch x left right

instance Arbitrary a => Arbitrary (BinTree a) where
  arbitrary = do
    n <- choose (0, 10000)
    lst <- vector n
    randomTree lst

prop_preOrder bt = preOrder bt == preOrderT bt
  where types = [bt :: BinTree Int]

prop_inOrder bt = inOrder bt == inOrderT bt
  where types = [bt :: BinTree Int]

prop_postOrder bt = postOrder bt == postOrderT bt
  where types = [bt :: BinTree Int]
```

Based on the tail-recursive implementation, it is easy to implement an iterative version in C/C++:

```cpp
#include <iostream>
#include <vector>
#include <deque>

using namespace std;

class BinTree {
public:
    BinTree() { v = 0; left = right = NULL; }
    BinTree(int hv) { v = hv; left = right = NULL; }
    int v;
    BinTree *left;
    BinTree *right;
};

void preOrder(BinTree *rt, deque<int> &xs) {
    vector<BinTree*> stack;
    stack.push_back(rt);
    while (!stack.empty()) {
        BinTree *cur = stack.back();
        stack.pop_back();
        if (cur != NULL) {
            xs.push_back(cur->v);
            stack.push_back(cur->right);
            stack.push_back(cur->left);
        }
    }
}

void inOrder(BinTree *rt, deque<int> &xs) {
    vector<BinTree*> stack;
    stack.push_back(rt);
    vector<int> acc_xs;
    while (!stack.empty()) {
        BinTree *cur = stack.back();
        stack.pop_back();
        if (cur == NULL && acc_xs.empty()) {
            // Nothing to do
        } else if (cur == NULL) {
            int x = acc_xs.back();
            acc_xs.pop_back();
            xs.push_back(x);
        } else {
            acc_xs.push_back(cur->v);
            stack.push_back(cur->right);
            stack.push_back(cur->left);
        }
    }
}

void postOrder(BinTree *rt, deque<int> &xs) {
    vector<BinTree*> stack;
    stack.push_back(rt);
    while (!stack.empty()) {
        BinTree *cur = stack.back();
        stack.pop_back();
        if (cur != NULL) {
            xs.push_front(cur->v);
            stack.push_back(cur->left);
            stack.push_back(cur->right);
        }
    }
}

int main() {
    BinTree n1(1);
    BinTree n2(2);
    BinTree n3(3);
    n2.left = &n1;
    n2.right = &n3;
    cout << "pre order:" << endl;
    {
        deque<int> res;
        preOrder(&n2, res);
        for (int i = 0; i < (int)res.size(); i++) { cout << res[i] << endl; }
    }
    cout << "in order: " << endl;
    {
        deque<int> res;
        inOrder(&n2, res);
        for (int i = 0; i < (int)res.size(); i++) { cout << res[i] << endl; }
    }
    cout << "post order: " << endl;
    {
        deque<int> res;
        postOrder(&n2, res);
        for (int i = 0; i < (int)res.size(); i++) { cout << res[i] << endl; }
    }
    return 0;
}
```

Apparently, the Haskell version looks much clearer than the C version.

## Extended case 1: Binary Tree Iterator

> Description: Construct a binary search tree iterator such that getting an element from the iterator runs in average O(1) time and O(h) memory.

It's trivial to implement this in Haskell since Haskell has native support for lazy evaluation; the recursive inOrder above already fulfills the requirement. What about implementing this in C/C++? Actually, what we want is an in-order tree traversal.
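To see why lazy evaluation makes the Haskell side trivial, here is a minimal, self-contained sketch (it repeats the `BinTree` and `inOrder` definitions from above; the helper `next` is a name introduced here, not part of the original code):

```haskell
data BinTree a = EmptyTree | Branch a (BinTree a) (BinTree a)

inOrder :: BinTree a -> [a]
inOrder EmptyTree = []
inOrder (Branch a left right) = inOrder left ++ [a] ++ inOrder right

-- The lazy list itself is the iterator: uncons plays the role of
-- "next", and pattern matching on the list plays the role of "hasNext".
next :: [a] -> Maybe (a, [a])
next []     = Nothing
next (x:xs) = Just (x, xs)

main :: IO ()
main = do
  let t = Branch 2 (Branch 1 EmptyTree EmptyTree)
                   (Branch 3 EmptyTree EmptyTree)
  -- forcing one element only walks as much of the tree as needed
  case next (inOrder t) of
    Nothing        -> putStrLn "empty"
    Just (x, rest) -> print (x, rest)   -- prints (1,[2,3])
```

Because the list is produced on demand, only the path to the next element is evaluated at each step, so for a balanced tree the live memory stays proportional to the height h.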
```cpp
#include <iostream>
#include <vector>
#include <deque>

using namespace std;

struct TreeNode {
    int val;
    TreeNode *left;
    TreeNode *right;
    TreeNode(int x) : val(x), left(NULL), right(NULL) {}
};

class BSTIterator {
public:
    TreeNode *rt;
    vector<TreeNode*> trees;
    deque<int> xs;
    vector<int> xs_acc;

    BSTIterator(TreeNode *root) {
        rt = root;
        trees.push_back(root);
    }

    /** @return whether we have a next smallest number */
    bool hasNext() {
        if (trees.size() == 1 && trees.back() == NULL
            && xs.empty() && xs_acc.empty())
            return false;
        return true;
    }

    /** @return the next smallest number */
    int next() {
        inOrder();
        int v = xs.front();
        xs.pop_front();
        return v;
    }

    // run the in-order traversal just far enough to produce one value
    void inOrder() {
        while (!trees.empty() && xs.empty()) {
            TreeNode *cur = trees.back();
            trees.pop_back();
            if (cur == NULL && xs_acc.empty()) {
                // pass
            } else if (cur == NULL) {
                xs.push_back(xs_acc.back());
                xs_acc.pop_back();
            } else {
                xs_acc.push_back(cur->val);
                trees.push_back(cur->right);
                trees.push_back(cur->left);
            }
        }
    }
};

int main() {
    TreeNode n1(1);
    TreeNode n2(2);
    TreeNode n3(3);
    n2.left = &n1;
    n2.right = &n3;
    BSTIterator i = BSTIterator(&n2);
    while (i.hasNext()) {
        cout << i.next() << endl;
    }
    return 0;
}
```

Here, I changed the in-order tree traversal according to the requirement: the loop stops as soon as xs holds a produced value, and xs buffers the values that have been visited but not yet handed out. A corner case for our algorithm is the terminating condition (when hasNext returns false). This is exactly the state in which go (EmptyTree:ts) [] xs with ts = [] would be evaluated: the stack holds a single NULL and both buffers are empty.

# Backtracking

TODO...

# Finger Tree

TODO...

# Lexer and Parser

TODO...

[poj1064]: http://poj.org/problem?id=1064
[fpide]: https://www.fpcomplete.com/business/developer-tools/