I have started this post since reading "Pearls of Functional Algorithm Design" by Richard Bird. It contains my random thoughts about programming puzzles(some from that book). Most of the algorithms are implemented in Haskell. Most of the code in Haskell can run in fpcomplete's online IDE.

Binary Search, the easy way

Basic case

Description:

Given a list of sorted values, find the position of some given value.
If there are multiple values in the list, return any one of them

The basic case of binary search is very easy.



import Data.Vector
import Data.Maybe

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r) | p > r = Nothing
                  | otherwise = case f m of
                                    EQ -> Just m
                                    LT -> binSearch f (m+1,r)
                                    GT -> binSearch f (p,m-1)
                                  where m = (p+r) `div` 2

-- special case for vector
binSearchInVec :: (Ord a) => Vector a -> a -> (Int,Int) -> Maybe Int
binSearchInVec vec val = binSearch (\ m -> vec!m `compare` val)



main = print $ fromJust $ binSearchInVec (fromList [1,2,3,4]) 2 (0,3)

Extended case1: find the smallest/largest index

The base case is easy since the terminating condition is clear. But for some extended cases, it's harder to get it right.
The most important questions one should ask when implementing binary search is

When to return the result
When to go to upper/lower range
How to change the range

If we can't find a good solution for above questions, we will get trapped in "infinite loops", "out of bound error", etc.

Some claims that after changing the range from [p,r] to (p-1,r+1) or [p,r+1) or (p-1, r] will make it easier. However, I think a much better
method is:

Add a variable mark to help decide when to return.
Change the range aggressively. For every loop, we should decrease the range.

Description:

Given a list of sorted values, find the position of some given value.
If there are multiple values in the list, return the smallest index.

import Data.Vector
import Data.Maybe

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
    where go (p,r) mark | p > r = mark
                        | otherwise = case f m of
                                        EQ -> go (p,m-1) (Just m)
                                        LT -> go (m+1,r) mark
                                        GT -> go (p,m-1) mark

-- special case for vector
binSearchInVec :: (Ord a) => Vector a -> a -> (Int,Int) -> Maybe Int
binSearchInVec vec val = binSearch (\ m -> vec!m `compare` val)



main = print $ fromJust $ binSearchInVec (fromList [1,2,2,3,4]) 2 (0,4)

mark stores the most recent EQ value found. We can prove the above program will surely get the smallest index based on the following
inductive rule:

Suppose initial search range is [first, last], current search range is [p,r], then, mark = Nothing || Just min{x | first<=x<p || r < x <= last, f x == EQ }

Proof:

Initially, $ p = first , r = last $, mark = Nothing.
For $ [p,r] $, suppose for $ x \in [first,p) \cup (r, last] $ there exists $ f x = EQ $, then mark = Just x, otherwise mark = Nothing.
When p > r, mark = Nothing || Just min {first<=x<=last, f x == EQ}.

Exercise: How to find the largest index?

Extended case 2: find turning point.

Description:

At a given point: x, we break a strictly sorted list with length >= 3 into two parts: a and b, then append a to the end of b and get another list c. Now, the question is, given c, find the index of the smallest element in c(which is the head of original sorted list

Again, we will implement it with the help of mark variable.


import Data.Vector
import Data.Maybe

binSearch :: (Int -> Ordering) -> (Int, Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
    where go (p,r) mark | p > r = mark
                        | otherwise = case f m of
                                        EQ -> Nothing
                                        LT -> go (p, m-1) (Just m)
                                        GT -> go (m+1, r) mark
                                       where m = (p+r) `div` 2

-- special case for vector
binSearchInVec :: (Ord a) => Vector a -> (Int,Int) -> Maybe Int
binSearchInVec vec (p,r) = binSearch (\ m -> (vec!m) `compare` (vec!p)) (p,r)



main = case binSearchInVec (fromList [2,3,1]) (0,2) of
            Nothing -> print "the list is not strictly increasing"
            Just i -> print i

When to return the result?
- When p>r, we have considered all elements, so we should return the mark.
- When f m == EQ, we know that either m == first or there exists $ i \in [first, last] $ and $ f i = f last $, both of which contradict with our question assumption (list should be strictly sorted and have a size >= 3). So we return Nothing.
When to go to upper/lower range?

easy question

How to change the range?

As I have always advocated, change the range aggressively to avoid infinite recursion.

Extended case 3

Poj has an interesting problem. My haskell implementation is as follows:

My solution:


-- lower bound binary search.
binSearch :: (Int -> Bool) -> (Int,Int) -> Maybe Int
binSearch f (p,r) = go (p,r) Nothing
  where go (p,r) mark | p > r = mark
                      | otherwise = if f m then go (m+1,r) (Just m) else go (p,m-1) mark
                                      where m = (p+r) `div` 2

-- Note I didn't follow the format requested by the problem. 
cabalLength :: [Double] -> Int -> Double
cabalLength cables num_pieces = case binSearch validLength (1, 10000000) of
                                     Nothing -> fromIntegral 0
                                     Just v -> (fromIntegral v) / 100
    where cabal_ints = map (floor . (*100)) cables
          validLength len = (foldr (+) 0 (map (`div` len) cabal_ints)) >= num_pieces

main = print $ cabalLength [8.02,7.43,4.57,5.39] 11

Tree traversal

Tree traversal is good example for recursive program design since it's intuitive. But the non-recursive version is not easy to understand, especially if it's implemented in some imperative programming languages like C. In this chapter, I will give the tail-recursive implementation(not in continuation passing style) of tree traversal in Haskell, which is easy to reimplement in C.

First, I will provide the recursive implementation, which will be used by quick check to test the tail-recursive version.

data BinTree a = EmptyTree
               | Branch a (BinTree a) (BinTree a) deriving (Show)

preOrder :: BinTree a -> [a]
preOrder (EmptyTree) = []
preOrder (Branch a left right) = [a] ++ preOrder left ++ preOrder right

inOrder :: BinTree a -> [a]
inOrder (EmptyTree) = []
inOrder (Branch a left right) = inOrder left ++ [a] ++ inOrder right

postOrder :: BinTree a -> [a]
postOrder (EmptyTree) = []
postOrder (Branch a left right) = postOrder left ++ postOrder right ++ [a]

data BinTree a = EmptyTree
               | Branch a (BinTree a) (BinTree a) deriving (Show)

preOrder :: BinTree a -> [a]
preOrder (EmptyTree) = []
preOrder (Branch a left right) = [a] ++ preOrder left ++ preOrder right

inOrder :: BinTree a -> [a]
inOrder (EmptyTree) = []
inOrder (Branch a left right) = inOrder left ++ [a] ++ inOrder right

postOrder :: BinTree a -> [a]
postOrder (EmptyTree) = []
postOrder (Branch a left right) = postOrder left ++ postOrder right ++ [a]

sampleTree :: BinTree Int
sampleTree = Branch 4 (Branch 2 (Branch 1 EmptyTree EmptyTree) (Branch 3 EmptyTree EmptyTree)) (Branch 6 (Branch 5 EmptyTree EmptyTree) (Branch 7 EmptyTree EmptyTree))

main = putStr (
         "preorder: " ++ (show $ preOrder sampleTree) ++ "\n" ++
         "inorder: " ++ (show $ inOrder sampleTree) ++ "\n" ++
         "postorder: " ++ (show $ postOrder sampleTree) ++ "\n")

Tail-recursive version:

data BinTree a = EmptyTree
               | Branch a (BinTree a) (BinTree a) deriving (Show)


preOrderT :: BinTree a -> [a]
preOrderT bt = go [bt] []
  where go [] xs = reverse xs
        go (EmptyTree:ts) xs = go ts xs
        go (Branch v left right:ts) xs = go (left:right:ts) (v:xs)

inOrderT :: BinTree a -> [a]
inOrderT bt = go [bt] [] []
  where go [] [] xs = reverse xs
        go (EmptyTree:ts) [] xs = go ts [] xs
        go (EmptyTree:ts) (v:left_acc) xs = go ts left_acc (v:xs)
        go (Branch v left right:ts) left_acc xs = go (left:right:ts) (v:left_acc) xs

-- tail recursive post order traversal
postOrderT :: BinTree a -> [a]
postOrderT bt = go [bt] []
  where go [] xs = xs
        go (EmptyTree:ts) xs = go ts xs
        go (Branch v left right:ts) xs = go (right:left:ts) (v:xs)

sampleTree :: BinTree Int
sampleTree = Branch 4 (Branch 2 (Branch 1 EmptyTree EmptyTree) (Branch 3 EmptyTree EmptyTree)) (Branch 6 (Branch 5 EmptyTree EmptyTree) (Branch 7 EmptyTree EmptyTree))

main = putStr (
         "preorder: " ++ (show $ preOrderT sampleTree) ++ "\n" ++
         "inorder: " ++ (show $ inOrderT sampleTree) ++ "\n" ++
         "postorder: " ++ (show $ postOrderT sampleTree) ++ "\n")

The key point of tree traversal lies in visit order. For pre-order traversal, we should visit in the order of v -> left -> right. For in-order, it's left -> v -> right. For post-order, it should be left -> right -> v. The recursive version clearly shows the relationship. That's also why it's intuitive.

For the tail-recursive version, can we make it that clear? We know that stack can be used to implement function call. So we just need to use stack to store the context for visit order. Here, we use list to represent stack.

For preOrderT's local function go trees xs:
* trees: list of trees to visit. Suppose trees = (t:ts), i.e, t is in the left of ts. We maintain the order that t should always be visited before ts.
* xs: values of nodes that has already been visited. Suppose xs = (v:vs). We maintain that v was visited before vs, i.e., when the program terminated, we should return xs in reverse order. Also, as xs keeps values of nodes visited, we know nodes of values in xs are visited before nodes in trees.

Therfore:

go [] xs : we have no trees to visit, we return xs in reverse order.
go (EmptyTree:ts) xs : currently on the top of the stack is an empty tree. We have nothing to record for it, so we continue with ts.
go (Branch v left right:ts) xs : according to our definition, the visit order for a branch is v -> left -> right. Therefore, we put v in the visited nodes xs and continue the process with left:right:ts, which means left is visited before right, which is also in front ofts`. Everything is Clear!

postOrderT looks similar to preOrderT. Note that, the reverse visit order of post-order traversal is v->right->left. So we only need to adapt the pre-order algorithm so that right is visited before left, then the reverse of the result of pre-order should be the result of post-order. Clear!

For inOrderT, go trees xs_acc xs:

trees and xs has the same meaning as preOrderT
xs_acc: keeps values of nodes that we will visit. xs_acc is constructed during in-order traversal, it records the visiting context of node for the recursive definition. Suppose xs_acc = (x:acc). Then x should be visited before acc. Also, for trees = (t:ts), at the entry of function go, we should maintain t -> v -> (ts join acc). Here, join means keep the order t->v->... for ts and acc. To simplify, we need to make sure that t is visited before v while v should be visited before all other nodes.

Therefore:

go [] [] xs : nothing to visit, return xs in reverse order
go (EmptyTree:ts) [] xs : this case happens when ts = []. As we have nothing to record for EmptyTree, we go with ts.
go (EmptyTree:ts) (v:left_acc) xs : remember that we need to maintain order t->v->rest. Therefore, we first visit EmptyTree(nothing to do), then visit v by adding it to xs.
go (Branch v left right:ts) left_acc xs: according to definition of in-order traversal, we need to make sure left->v->right. Therefore, we transfer it to go (left:right:ts) (v:left_acc) xs, which perfectly maintains our order t -> v -> rest. Clear!

The quick check property to help verify our implementation:

-- Note the following code can't be run independently. 
randomTree :: [a] -> Gen (BinTree a)
randomTree [] = return EmptyTree
randomTree (x:xs) = do
  k <- choose (0, (length xs)-1)
  let (ls,rs) = splitAt k xs
  left <- randomTree ls
  right <- randomTree rs
  return $ Branch x left right

instance Arbitrary a => Arbitrary (BinTree a) where
  arbitrary = do
    n <- choose (0, 10000)
    lst <- vector n
    randomTree lst

prop_postOrder bt = postOrder bt == postOrderT bt
  where types = [bt :: BinTree Int]

prop_preOrder bt = preOrder bt == preOrderT bt
  where types = [bt :: BinTree Int]

prop_inOrder bt = inOrder bt == inOrderT bt
  where types = [bt :: BinTree Int]

Based on the tail-recursive implementation, it's easy to implement an iterative version in C/C++:

#include <iostream>
#include <vector>
#include <deque>

using namespace std;

class BinTree {
  public:
    BinTree() { v = 0; left = right = NULL; }
    BinTree(int hv) { v = hv; left = right = NULL; }
    int v;
    BinTree *left;
    BinTree *right;
};


void preOrder(BinTree *rt, deque<int> &xs) {
  vector<BinTree *> stack;
  stack.push_back(rt);
  while (!stack.empty()) {
    BinTree *cur = stack.back();
    stack.pop_back();
    if (cur != NULL) {
      xs.push_back(cur -> v);
      stack.push_back(cur->right);
      stack.push_back(cur->left);
    }
  }
}

void inOrder(BinTree *rt, deque<int> &xs) {
  vector<BinTree *> stack;
  stack.push_back(rt);
  vector<int> acc_xs;

  while (!stack.empty()) {
    BinTree *cur = stack.back();
    stack.pop_back();
    if (cur == NULL && acc_xs.empty()) {
      // Nothing to do
    } else if (cur == NULL) {
      int x = acc_xs.back();
      acc_xs.pop_back();
      xs.push_back(x);
    } else {
      acc_xs.push_back(cur->v);
      stack.push_back(cur->right);
      stack.push_back(cur->left);
    }
  }
}


void postOrder(BinTree *rt, deque<int> &xs) {
  vector<BinTree *> stack;
  stack.push_back(rt);

  while (!stack.empty()) {
    BinTree *cur = stack.back();
    stack.pop_back();
    if (cur != NULL) {
      xs.push_front(cur->v);
      stack.push_back(cur->left);
      stack.push_back(cur->right);
    }
  }
}


int main() {
  BinTree n1(1);
  BinTree n2(2);
  BinTree n3(3);
  n2.left = &n1;
  n2.right = &n3;

  cout << "pre order:" << endl;
  {
    deque<int> res;
    preOrder(&n2, res);
    for (int i = 0; i < res.size(); i++) {
      cout << res[i] << endl;
    }
  }

  cout << "in order: " << endl;
  {
    deque<int> res;
    inOrder(&n2, res);
    for (int i = 0; i < res.size(); i++) {
      cout << res[i] << endl;
    }
  }

  cout << "post order: " << endl;
  {
    deque<int> res;
    postOrder(&n2, res);
    for (int i = 0; i < res.size(); i++) {
      cout << res[i] << endl;
    }
  }
  return 0;
}

Apparently, Haskell version looks much clearer than C version.

Extende case1 : Binary Tree Iterator

Description: Construct a binary search tree iterator such that getting an element from the iterator should run in average O(1) time and O(h) memory.

It's trivial to implement this in Haskell since Haskell has native support for lazy evaluation. Actually, the above algorithm already fulfills the requirement.

What about implementing this in C/C++?

Actually, what we want is in order tree traversal.


#include <cstdio>
#include <vector>
#include <deque>
#include <iostream>
using namespace std;

struct TreeNode {
  int val;
  TreeNode *left;
  TreeNode *right;
  TreeNode (int x) : val(x), left(NULL), right(NULL) {}
};

class BSTIterator {
  public:
    TreeNode *rt;
    vector<TreeNode*> trees;
    deque<int> xs;
    vector<int> xs_acc;
    BSTIterator(TreeNode *root) {
      rt = root;
      trees.push_back(root);
    }

    /** @return whether we have a next smallest number */
    bool hasNext() {
      if (trees.size() == 1 && trees[0] == NULL && xs.empty() && xs_acc.empty()) return false;
      return true;
    }

    /** @return the next smallest number */
    int next() {
      //preOrder();
      inOrder();
      int v = xs.front();
      xs.pop_front();
      return v;
    }
    void inOrder() {
      while (!trees.empty() && xs.empty()) {
        TreeNode *cur = trees.back();
        trees.pop_back();
        if (cur == NULL && xs_acc.empty()) {
          // pass
        } else if (cur == NULL) {
          xs.push_back(xs_acc.back());
          xs_acc.pop_back();
        } else {
          xs_acc.push_back(cur->val);
          trees.push_back(cur->right);
          trees.push_back(cur->left);
        }
      }
    }
};


int main() {
  TreeNode n1(1);
  TreeNode n2(2);
  TreeNode n3(3);
  n2.left = &n1;
  n2.right = &n3;
  BSTIterator i = BSTIterator(&n2);
  while (i.hasNext()) {
    cout << i.next() << endl;
  }
  return 0;
}

Here, I changed the in-order tree traversal according to the requirement. xs can be used to keep the visited elements. A corner case for our algorithm is the terminating condition(when hasNext return false). We know this happens when go (EmptyTree:ts) [] xs is evaluated.

Backtracking

TODO...

Finger Tree

TODO...

Lexer and Parser

TODO...