Analysis. B-tree vs. Power-of-n Array Expansion ----------------------------------------------- How does linear insertion to a B-tree compare to power-of-n expansion of an array? Specifically, how much space overhead is incurred, how many allocations, and how much memory copying must be performed? How much space overhead does a B-tree incur? Assuming a branching factor b, a B-tree requires approximately 1/(b-1) interior node for each data node[1]. If there are n data nodes, then n + (n/(b-1)) = (n(b-1) / (b-1)) + (n/(b-1)) = (n(b-1) + n) / (b-1) = nb / (b-1) nodes will exist in the B-tree. In other words, there is a node overhead of b / (b-1) Assume each entry is a machine word. What's the real overhead? In my implementation, each node requires b+2 words. Therefore, if there are n nodes in the tree, we'll require n * (b+2) * (b / (b-1)) = nb(b+2) / (b-1) machine words for the tree. Since each data node holds b entries, we'll have n*b data entries. In other words, (nb(b+2) / (b-1)) / (nb) = nb(b+2) / nb(b-1) = (b+2) / (b-1) words of tree storage per entry. If b is 16, that gives us an upper bound of (16+2)/(16-1) = 18/15 = 1.2 words of storage required for each entry [2]. Now let's say we want to stick to our guns and hold a similar upper bound with power-of-n array expansion. That means we'd need to compare to a b=16 B-tree as I've implemented it to a 1.2 array growth factor. The array will require log(1.2)n allocations and copies to store n elements. The B-tree will require n/15 allocations, and no copies. The number of operations performed for various n is shown below: Array B-tree n=1000 37 67 n=10000 50 667 n=100000 63 6667 [TODO! Finish] [1] With a branching factor of b, to support b^m leaf nodes, you need b^(m-1) + b^(m-2) + ... + b^0 interior nodes. It's easy to illustrate (but I can't prove because I can't remember any of my calculus, dammit) that: lim ( b^(m-1) + b^(m-2) + ... + b^0 ) / b^m m->inf is: 1 / b-1 Try, for example, b=10 and m=4 1000 + 100 + 10 + 1 / 10000 = 1111 / 10000 ~= 1 / 9 [2] This is an upper bound, and assumes that the B-tree is full. I believe that the lower bound in this case is very close to the upper bound when 1) n is large, and 2) the tree is filled sequentially from left to right. At worst, you'll waste some portion of the rightmost nodes at each level; or (b * log(b)n) words. Warrants further fleshing out.