Unit VI: Trees
Data Structures and Algorithms — Complete Chapter Notes (Java)
Duration: 7 Hours
Table of Contents
- Lesson Plan (7 Hours)
- 6.1 Introduction
- 6.2 Tree Terminology
- 6.3 Binary Tree
- Strictly Binary Tree
- Complete Binary Tree
- Extended Binary Tree
- 6.4 Binary Tree Representation
- Array Representation
- Linked List Representation
- 6.5 Create a Binary Tree
- 6.6 Traversal of a Binary Tree
- Preorder, Inorder, Postorder (+ Level order)
- 6.7 Binary Search Tree (BST)
- Insertion, Search, Deletion
- 6.8 Tree Height, Level and Depth
- 6.9 Balanced Tree: AVL Balanced Trees
- 6.10 The Huffman Algorithm
- 6.11 B-Tree
- Quick Revision & Exam Pointers
6.1 Introduction
So far we have studied linear data structures — arrays, stacks, queues, linked lists — where elements are arranged one after another in a single sequence. In a linear structure each element has exactly one predecessor and one successor (except the ends).
A tree is a non-linear, hierarchical data structure. Instead of a straight line, data branches out like an upside-down real tree: the root is at the top and leaves at the bottom.
(root)
/ | \
o o o <- branches
/ \ |
o o o <- leaves
Why do we need trees?
- Hierarchy is natural. File systems (folders within folders), organization charts, family trees, HTML/XML documents, and the Java class hierarchy are all trees.
- Fast searching. A balanced Binary Search Tree searches
nitems inO(log n)time, far better thanO(n)of a linked list. - Efficient sorting & priority handling. Heaps (a kind of tree) power priority queues and Heap Sort.
- Compression. Huffman coding (Section 6.10) uses a tree to compress data.
- Databases & file systems. B-Trees and B+ Trees (Section 6.11) index huge datasets on disk.
Formal definition
A tree is a finite set T of one or more nodes such that:
- There is one specially designated node called the root.
- The remaining nodes are partitioned into
n ≥ 0disjoint setsT₁, T₂, …, Tₙ, where eachTᵢis itself a tree. These are the subtrees of the root.
This definition is recursive — a tree is made of smaller trees. That is why almost every tree algorithm is naturally recursive.
6.2 Tree Terminology
Consider this tree. We will refer to it throughout the section:
A <- Level 0 (root)
/ | \
B C D <- Level 1
/ \ \
E F G <- Level 2
/
H <- Level 3
| Term | Definition | Example (above) |
|---|---|---|
| Node | A basic unit holding data + links to children | A, B, C … H |
| Root | The topmost node, has no parent | A |
| Edge | A link/line connecting a parent to a child | A–B, F–H |
| Parent | A node having one or more children | B is parent of E, F |
| Child | A node directly below another | E, F are children of B |
| Siblings | Nodes that share the same parent | B, C, D are siblings |
| Leaf (terminal/external) | A node with no children | E, H, C, G |
| Internal (non-terminal) node | A node with at least one child | A, B, D, F |
| Degree of a node | Number of children of that node | degree(A)=3, degree(B)=2, degree(E)=0 |
| Degree of a tree | Maximum degree among all its nodes | 3 (because of A) |
| Path | A sequence of nodes connected by edges | A → B → F → H |
| Ancestor | Any node on the path from root to that node | A, B, F are ancestors of H |
| Descendant | Any node in the subtree below a node | E, F, H are descendants of B |
| Level | Distance (in edges) from the root; root is level 0 | H is at level 3 |
| Depth of a node | Number of edges from the root down to the node | depth(H)=3 |
| Height of a node | Number of edges on the longest path from the node down to a leaf | height(B)=2 (B→F→H) |
| Height of the tree | Height of the root | 3 |
| Subtree | A node together with all its descendants | B,E,F,H form a subtree |
| Forest | A set of zero or more disjoint trees | remove A → {B-subtree, C, D-subtree} |
Depth vs Height (common exam trap):
- Depth is measured downward from the root to a node.
- Height is measured upward from the deepest leaf to a node.
- For the whole tree, height = depth of the deepest node.
6.3 Binary Tree
A Binary Tree is a tree in which every node has at most two children, referred to as the left child and the right child.
So the degree of any node is 0, 1, or 2.
A
/ \
B C
/ \ \
D E F
Here A has a left child B and right child C, and so on.
Properties of a binary tree
For a binary tree:
- Maximum number of nodes at level
L=2^L(level starts at 0). - Maximum number of nodes in a tree of height
h=2^(h+1) − 1. - Minimum height of a binary tree with
nnodes =⌊log₂ n⌋. - A binary tree with
nnodes has exactlyn − 1edges. - If a binary tree has
n₀leaf nodes andn₂nodes of degree 2, thenn₀ = n₂ + 1.
Java node definition (used everywhere below):
// A single node of a binary tree
class Node {
int data; // the value stored
Node left; // reference to left child (null if none)
Node right; // reference to right child (null if none)
Node(int data) {
this.data = data;
this.left = null;
this.right = null;
}
}
▪ Strictly Binary Tree (Full / Proper Binary Tree)
A binary tree is strictly binary if every node has either 0 or 2 children — no node has exactly one child.
A
/ \
B C
/ \
D E
- A has 2 children, B has 2 children, C, D, E have 0 children. ✔ Strictly binary.
A
/ \
B C
/
D
- B has only one child (D). ✘ NOT strictly binary.
Property: A strictly binary tree with n leaf nodes always has 2n − 1 total nodes.
▪ Complete Binary Tree
A binary tree is complete if all levels are completely filled except possibly the last level, and the last level has all its nodes as far left as possible.
A Level 0: full
/ \
B C Level 1: full
/ \ /
D E F Level 2: filled left-to-right, no gaps
This is complete: the last level fills left first, with no holes.
A
/ \
B C
/ \ \
D E F <- gap before F (C has no left child)
This is NOT complete — there is a missing left child of C while a right child exists.
Why it matters: Complete binary trees can be stored compactly in an array with no wasted slots (this is exactly how a binary heap works). For a node at index i:
- left child =
2i + 1 - right child =
2i + 2 - parent =
(i − 1) / 2
Perfect Binary Tree (bonus term): all internal nodes have 2 children and all leaves are at the same level. A perfect tree of height
hhas exactly2^(h+1) − 1nodes. Every perfect tree is also complete and strictly binary.
▪ Extended Binary Tree (2-Tree)
When we convert an ordinary binary tree into a strictly binary tree by adding special "external" nodes (usually drawn as squares ▢) wherever a child is missing, we get an extended binary tree.
- Original nodes become internal nodes (circles ○).
- Added dummy nodes become external nodes (squares ▢) — they have no children.
Before extension:
A
/ \
B C
\
D
After extension (▢ = added external node):
A
/ \
B C
/ \ / \
▢ D ▢ ▢
/ \
▢ ▢
Properties of extended binary tree:
- Every internal node has exactly 2 children → it is strictly binary.
- If there are
ninternal nodes, there are exactlyn + 1external nodes. - Used to compute path length: Internal Path Length (I) and External Path Length (E) are related by
E = I + 2nwheren= number of internal nodes. This is central to analyzing Huffman trees and search costs.
6.4 Binary Tree Representation
A binary tree can be stored in memory in two ways: array (sequential) representation and linked list representation.
▪ Array Representation of a Binary Tree
We number the nodes level by level, left to right, starting the root at index 1 (using index 0 is also possible — formulas shift slightly). Then:
- For a node at index
i:- Left child is at index
2i - Right child is at index
2i + 1 - Parent is at index
i / 2(integer division)
- Left child is at index
Example tree:
A(1)
/ \
B(2) C(3)
/ \ \
D(4) E(5) G(7)
Array (1-indexed):
Index : 0 1 2 3 4 5 6 7
Value : [ - A B C D E - G ]
- Index 6 is empty because C has no left child — the slot is still reserved.
Java — array-based binary tree:
public class ArrayBinaryTree {
private char[] tree; // index 0 is unused
private int maxSize;
public ArrayBinaryTree(int height) {
// a tree of height h needs 2^(h+1) - 1 nodes; +1 because we skip index 0
maxSize = (int) Math.pow(2, height + 1);
tree = new char[maxSize];
}
public void setRoot(char value) { tree[1] = value; }
public void setLeft(char value, int parentIndex) { tree[2 * parentIndex] = value; }
public void setRight(char value, int parentIndex) { tree[2 * parentIndex + 1] = value; }
public char getLeft(int i) { return tree[2 * i]; }
public char getRight(int i) { return tree[2 * i + 1]; }
public char getParent(int i) { return tree[i / 2]; }
public void printInorder(int i) {
if (i >= maxSize || tree[i] == '\u0000') return; // empty slot
printInorder(2 * i); // left
System.out.print(tree[i] + " "); // node
printInorder(2 * i + 1); // right
}
public static void main(String[] args) {
ArrayBinaryTree t = new ArrayBinaryTree(2);
t.setRoot('A');
t.setLeft('B', 1); t.setRight('C', 1);
t.setLeft('D', 2); t.setRight('E', 2);
System.out.print("Inorder: ");
t.printInorder(1); // Output: D B E A C
}
}
Advantages of array representation
- No pointers needed; simple index math.
- Direct random access to parent/child via formulas.
- Perfect for complete trees and heaps (no wasted space).
Disadvantages
- Huge memory waste for skewed / sparse trees (a right-skewed tree of
nnodes needs an array of size2ⁿ!). - Fixed size — hard to grow; insertion/deletion of arbitrary nodes is costly.
▪ Linked List Representation of a Binary Tree
Each node is an object holding data plus two references (left and right). A missing child is simply null. The tree is reached through a single root reference.
┌──────┬──────┬──────┐
root ->│ left │ data │ right │
└──┬───┴──────┴───┬───┘
v v
(left child) (right child)
For the tree
A
/ \
B C
/
D
the memory picture is:
root → [•|A|•]
/ \
[•|B|•] [∅|C|∅]
/ \
[∅|D|∅] ∅
(• = a reference, ∅ = null)
Java — linked representation:
class Node {
char data;
Node left, right;
Node(char data) { this.data = data; }
}
public class LinkedBinaryTree {
Node root;
public static void main(String[] args) {
LinkedBinaryTree t = new LinkedBinaryTree();
t.root = new Node('A');
t.root.left = new Node('B');
t.root.right = new Node('C');
t.root.left.left = new Node('D');
// C and D's children stay null automatically
}
}
Advantages of linked representation
- No wasted memory for missing children (just
null). - Easy to grow; insertion/deletion only rewires a few references.
- Natural fit for recursion.
Disadvantages
- Extra memory for two reference fields per node.
- No direct access to the parent (unless we also store a parent pointer).
- To reach a node you must traverse from the root.
| Feature | Array Representation | Linked Representation |
|---|---|---|
| Memory for sparse trees | Wasteful | Efficient |
| Access parent/child | O(1) via index math | Need pointers / traversal |
| Insertion/Deletion | Costly (shifting) | Cheap (rewire pointers) |
| Best for | Complete trees, heaps | General / dynamic trees |
| Size | Fixed | Dynamic |
6.5 Create a Binary Tree
"Creating" a binary tree means allocating nodes and wiring up the left/right links. We show two common ways.
Method 1 — Manual construction
class Node {
int data;
Node left, right;
Node(int data) { this.data = data; }
}
public class CreateTree {
public static void main(String[] args) {
/* Build this tree:
1
/ \
2 3
/ \
4 5
*/
Node root = new Node(1);
root.left = new Node(2);
root.right = new Node(3);
root.left.left = new Node(4);
root.left.right= new Node(5);
System.out.println("Tree created. Root = " + root.data);
}
}
Method 2 — Build by inserting into a queue (level-order build)
This is how you build a tree from a sequence such as 1 2 3 4 5 6 so that it becomes a complete binary tree (each new node fills the next left-to-right slot).
import java.util.*;
class Node {
int data; Node left, right;
Node(int d){ data = d; }
}
public class BuildLevelOrder {
static Node insertLevelOrder(int[] arr) {
if (arr.length == 0) return null;
Node root = new Node(arr[0]);
Queue<Node> q = new LinkedList<>();
q.add(root);
int i = 1;
while (i < arr.length) {
Node cur = q.poll();
if (i < arr.length) { // attach left child
cur.left = new Node(arr[i++]);
q.add(cur.left);
}
if (i < arr.length) { // attach right child
cur.right = new Node(arr[i++]);
q.add(cur.right);
}
}
return root;
}
public static void main(String[] args) {
int[] arr = {1, 2, 3, 4, 5, 6};
Node root = insertLevelOrder(arr);
// Produces:
// 1
// / \
// 2 3
// / \ /
// 4 5 6
System.out.println("Built. Root = " + root.data);
}
}
Algorithm (level-order build):
1. If array empty, return null.
2. Create root from arr[0]; enqueue root; set i = 1.
3. While i < length:
cur = dequeue()
if i < length: cur.left = node(arr[i]); enqueue(cur.left); i++
if i < length: cur.right = node(arr[i]); enqueue(cur.right); i++
4. Return root.
6.6 Traversal of a Binary Tree
Traversal = visiting every node of the tree exactly once in some order. For binary trees there are three classic depth-first orders plus one breadth-first order.
The three depth-first orders differ only in WHEN we process the current node (Root) relative to its left (L) and right (R) subtrees:
| Traversal | Order | Mnemonic |
|---|---|---|
| Preorder | Root → Left → Right | process root first |
| Inorder | Left → Root → Right | root in the middle |
| Postorder | Left → Right → Root | root last |
We use this sample tree for all traces:
A
/ \
B C
/ \ \
D E F
/
G
▪ Preorder Traversal (Root, Left, Right)
Algorithm:
preorder(node):
if node == null: return
visit(node) // 1. process root
preorder(node.left) // 2. go left
preorder(node.right) // 3. go right
Java:
void preorder(Node n) {
if (n == null) return;
System.out.print(n.data + " "); // Root
preorder(n.left); // Left
preorder(n.right); // Right
}
Dry run / tracing (call stack shown):
preorder(A) -> print A | go left
preorder(B) -> print B | go left
preorder(D) -> print D | left null | right null (return)
preorder(E) -> print E | left null | right null (return)
(back to B done) go right of A
preorder(C) -> print C | left null | go right
preorder(F) -> print F | go left
preorder(G) -> print G | (return)
Output: A B D E C F G
Use: Preorder produces a copy of the tree / prefix expression of an expression tree; it visits a node before its descendants, useful for making a copy or serializing a tree.
▪ Inorder Traversal (Left, Root, Right)
Algorithm:
inorder(node):
if node == null: return
inorder(node.left) // 1. left subtree
visit(node) // 2. process root
inorder(node.right) // 3. right subtree
Java:
void inorder(Node n) {
if (n == null) return;
inorder(n.left); // Left
System.out.print(n.data + " "); // Root
inorder(n.right); // Right
}
Tracing on the sample tree:
inorder(A)
inorder(B)
inorder(D) -> left null, print D, right null
print B
inorder(E) -> left null, print E, right null
print A
inorder(C)
left null
print C
inorder(F)
inorder(G) -> print G
print F
Output: D B E A C G F
Use (very important): Inorder traversal of a Binary Search Tree visits nodes in sorted ascending order. This is the single most exam-relevant fact about traversals.
▪ Postorder Traversal (Left, Right, Root)
Algorithm:
postorder(node):
if node == null: return
postorder(node.left) // 1. left subtree
postorder(node.right) // 2. right subtree
visit(node) // 3. process root last
Java:
void postorder(Node n) {
if (n == null) return;
postorder(n.left); // Left
postorder(n.right); // Right
System.out.print(n.data + " "); // Root
}
Tracing on the sample tree:
postorder(A)
postorder(B)
postorder(D) -> print D
postorder(E) -> print E
print B
postorder(C)
postorder(F)
postorder(G) -> print G
print F
print C
print A
Output: D E B G F C A
Use: Postorder is used to delete/free a tree (children freed before parent) and to evaluate postfix expressions from an expression tree.
▪ Level-order Traversal (Breadth-First) — bonus
Visit nodes level by level using a queue.
import java.util.*;
void levelOrder(Node root) {
if (root == null) return;
Queue<Node> q = new LinkedList<>();
q.add(root);
while (!q.isEmpty()) {
Node cur = q.poll();
System.out.print(cur.data + " ");
if (cur.left != null) q.add(cur.left);
if (cur.right != null) q.add(cur.right);
}
}
Output for sample tree: A B C D E F G
Summary of all four on the sample tree
| Traversal | Result |
|---|---|
| Preorder | A B D E C F G |
| Inorder | D B E A C G F |
| Postorder | D E B G F C A |
| Level-order | A B C D E F G |
Iterative inorder (using an explicit stack)
Recursion uses the call stack internally. We can do it manually:
import java.util.*;
void inorderIterative(Node root) {
Stack<Node> stack = new Stack<>();
Node cur = root;
while (cur != null || !stack.isEmpty()) {
while (cur != null) { // go as far left as possible
stack.push(cur);
cur = cur.left;
}
cur = stack.pop(); // process
System.out.print(cur.data + " ");
cur = cur.right; // then go right
}
}
Reconstructing a tree from traversals (classic exam question)
- Inorder + Preorder uniquely determines a tree.
- Inorder + Postorder uniquely determines a tree.
- Preorder + Postorder alone do NOT (ambiguous), unless the tree is full.
Example: Given Inorder = D B E A C and Preorder = A B D E C:
- Preorder's first element
Ais the root. - In inorder, everything left of
A(D B E) is the left subtree, right ofA(C) is the right subtree. - Recurse: next preorder element
Bis root of left subtree; in inorderDis left of B,Eis right of B. - Result:
A
/ \
B C
/ \
D E
6.7 Binary Search Tree (BST)
A Binary Search Tree is a binary tree that satisfies the BST property at every node:
Left subtree values < node value < Right subtree values (and both subtrees are themselves BSTs, with no duplicate keys in the basic version).
Example BST:
50
/ \
30 70
/ \ / \
20 40 60 80
- Everything left of 50 (30,20,40) is < 50; everything right (70,60,80) is > 50. ✔
- This holds recursively at 30, 70, etc.
Key consequence: An inorder traversal of a BST yields sorted order: 20 30 40 50 60 70 80.
Why BST? Searching is like binary search on an array, but the structure is dynamic:
- Average case search/insert/delete: O(log n) (balanced).
- Worst case (skewed tree, e.g. inserting sorted data): O(n) — this is why AVL trees in 6.9 exist.
Java node + class skeleton:
class Node {
int key;
Node left, right;
Node(int key) { this.key = key; }
}
public class BST {
Node root;
}
▪ Insertion
Idea: Start at the root. If the new key is smaller, go left; if larger, go right. When you reach a null link, place the new node there.
Algorithm:
insert(node, key):
if node == null: return new Node(key) // found the spot
if key < node.key: node.left = insert(node.left, key)
else if key > node.key: node.right = insert(node.right, key)
// key == node.key -> duplicate, ignore (or handle as you wish)
return node
Java:
Node insert(Node node, int key) {
if (node == null) return new Node(key);
if (key < node.key) node.left = insert(node.left, key);
else if (key > node.key) node.right = insert(node.right, key);
return node; // unchanged node pointer goes back up
}
void insert(int key) { root = insert(root, key); }
Tracing — insert sequence 50, 30, 70, 20, 40, 60:
Insert 50: tree empty -> 50 becomes root
50
Insert 30: 30 < 50 -> go left (null) -> place
50
/
30
Insert 70: 70 > 50 -> go right (null) -> place
50
/ \
30 70
Insert 20: 20 < 50 left -> 20 < 30 left (null) -> place
50
/ \
30 70
/
20
Insert 40: 40 < 50 left -> 40 > 30 right (null) -> place
50
/ \
30 70
/ \
20 40
Insert 60: 60 > 50 right -> 60 < 70 left (null) -> place
50
/ \
30 70
/ \ /
20 40 60
▪ Search
Algorithm:
search(node, key):
if node == null: return false // not found
if key == node.key: return true // found
if key < node.key: return search(node.left, key)
else: return search(node.right, key)
Java (recursive + iterative):
boolean search(Node node, int key) {
if (node == null) return false;
if (key == node.key) return true;
return key < node.key ? search(node.left, key)
: search(node.right, key);
}
boolean searchIterative(int key) {
Node cur = root;
while (cur != null) {
if (key == cur.key) return true;
cur = (key < cur.key) ? cur.left : cur.right;
}
return false;
}
Tracing — search 40 in the tree above:
40 vs 50 -> 40 < 50 -> go left to 30
40 vs 30 -> 40 > 30 -> go right to 40
40 vs 40 -> equal -> FOUND (3 comparisons)
Tracing — search 45:
45 vs 50 -> left to 30
45 vs 30 -> right to 40
45 vs 40 -> right to null -> NOT FOUND
▪ Deletion
Deletion has three cases, in increasing difficulty.
Case 1 — Node is a leaf (no children): simply remove it (set parent link to null).
Delete 20:
30 30
/ --> \
20 40 40
Case 2 — Node has one child: replace the node with its single child.
Delete 30 (it has one child 40 after 20 removed):
50 50
/ --> /
30 40
\
40
Case 3 — Node has two children: This is the tricky one.
- Find the node's inorder successor = smallest value in its right subtree (go right once, then keep going left). (Equivalently, use the inorder predecessor = largest in left subtree.)
- Copy the successor's value into the node.
- Delete the successor from the right subtree (it has at most one child, so it reduces to Case 1 or 2).
Delete 50 (two children) from:
50 60
/ \ / \
30 70 ===> 30 70
/ \ / \ / \ \
20 40 60 80 20 40 80
Inorder successor of 50 = 60 (smallest in right subtree).
Copy 60 into root, then delete original 60 (a leaf).
Algorithm:
delete(node, key):
if node == null: return null
if key < node.key: node.left = delete(node.left, key)
else if key > node.key: node.right = delete(node.right, key)
else: // found the node
if node.left == null: return node.right // Case 1/2
if node.right == null: return node.left // Case 2
// Case 3: two children
successor = minValue(node.right)
node.key = successor
node.right = delete(node.right, successor)
return node
minValue(node):
while node.left != null: node = node.left
return node.key
Java:
Node delete(Node node, int key) {
if (node == null) return null;
if (key < node.key) {
node.left = delete(node.left, key);
} else if (key > node.key) {
node.right = delete(node.right, key);
} else {
// node to be deleted found
if (node.left == null) return node.right; // 0 or 1 (right) child
if (node.right == null) return node.left; // 1 (left) child
// two children: get inorder successor (smallest in right subtree)
int successor = minValue(node.right);
node.key = successor; // copy value
node.right = delete(node.right, successor); // delete successor
}
return node;
}
int minValue(Node node) {
int min = node.key;
while (node.left != null) { node = node.left; min = node.key; }
return min;
}
void delete(int key) { root = delete(root, key); }
Complete runnable BST program
public class BSTDemo {
static class Node {
int key; Node left, right;
Node(int k){ key = k; }
}
Node root;
void insert(int key){ root = insert(root, key); }
Node insert(Node n, int key){
if (n == null) return new Node(key);
if (key < n.key) n.left = insert(n.left, key);
else if (key > n.key) n.right = insert(n.right, key);
return n;
}
boolean search(int key){
Node c = root;
while (c != null){
if (key == c.key) return true;
c = key < c.key ? c.left : c.right;
}
return false;
}
void delete(int key){ root = delete(root, key); }
Node delete(Node n, int key){
if (n == null) return null;
if (key < n.key) n.left = delete(n.left, key);
else if (key > n.key) n.right = delete(n.right, key);
else {
if (n.left == null) return n.right;
if (n.right == null) return n.left;
int s = minValue(n.right);
n.key = s;
n.right = delete(n.right, s);
}
return n;
}
int minValue(Node n){ int m = n.key; while(n.left!=null){n=n.left; m=n.key;} return m; }
void inorder(Node n){ if(n==null)return; inorder(n.left); System.out.print(n.key+" "); inorder(n.right); }
public static void main(String[] args){
BSTDemo t = new BSTDemo();
int[] data = {50,30,70,20,40,60,80};
for (int x : data) t.insert(x);
System.out.print("Inorder (sorted): "); t.inorder(t.root); System.out.println();
System.out.println("Search 40: " + t.search(40)); // true
System.out.println("Search 45: " + t.search(45)); // false
t.delete(50); // delete root (two children)
System.out.print("After deleting 50: "); t.inorder(t.root); System.out.println();
}
}
/* Expected output:
Inorder (sorted): 20 30 40 50 60 70 80
Search 40: true
Search 45: false
After deleting 50: 20 30 40 60 70 80
*/
| Operation | Average | Worst (skewed) |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
6.8 Tree Height, Level and Depth
These three measures are often confused. Use this tree:
A <- Level 0 / Depth 0
/ \
B C <- Level 1 / Depth 1
/ \
D E <- Level 2 / Depth 2
/
F <- Level 3 / Depth 3
- Level of a node = number of edges from the root to that node. Root is level 0.
- level(A)=0, level(B)=1, level(D)=2, level(F)=3.
- Depth of a node = same as level — edges from the root down to the node.
- depth(F)=3.
- Height of a node = number of edges on the longest downward path from the node to a leaf.
- height(F)=0 (it is a leaf), height(B)=2 (B→E→F), height(A)=3.
- Height of the tree = height of the root = 3 here.
- Depth of the tree = max depth of any node = 3 (same number; just measured from opposite ends).
Memory hook: Depth counts down from the top; Height counts up from the bottom. For a single empty tree height is usually defined as −1, and a single node has height 0.
Java — height (recursive):
int height(Node n) {
if (n == null) return -1; // empty tree: -1 (some texts use 0)
int lh = height(n.left);
int rh = height(n.right);
return 1 + Math.max(lh, rh);
}
Tracing height(A):
height(A) = 1 + max(height(B), height(C))
height(B) = 1 + max(height(D), height(E))
height(D) = 1 + max(-1,-1) = 0
height(E) = 1 + max(height(F), -1)
height(F) = 1 + max(-1,-1) = 0
height(E) = 1 + max(0,-1) = 1
height(B) = 1 + max(0,1) = 2
height(C) = 1 + max(-1,-1)= 0
height(A) = 1 + max(2,0) = 3 ✔
Java — depth of a target node (edges from root):
int depth(Node n, int key, int d) {
if (n == null) return -1;
if (n.key == key) return d;
int left = depth(n.left, key, d + 1);
if (left != -1) return left;
return depth(n.right, key, d + 1);
}
// call: depth(root, key, 0)
Useful relations for n nodes:
- Maximum nodes at level
L=2^L. - A tree of height
hhas at most2^(h+1) − 1nodes. - Minimum possible height for
nnodes =⌈log₂(n+1)⌉ − 1(i.e. a balanced tree); maximum =n − 1(a skewed tree).
6.9 Balanced Tree: AVL Balanced Trees
The problem with plain BSTs
If we insert sorted data 10, 20, 30, 40, 50 into a BST, it degenerates into a right-skewed list:
10
\
20
\
30
\
40
\
50
Search now costs O(n) — no better than a linked list. We need to keep the tree balanced.
What is an AVL Tree?
An AVL tree (named after inventors Adelson-Velsky and Landis, 1962) is a self-balancing BST. For every node it maintains the Balance Factor (BF) invariant:
BF(node) = height(left subtree) − height(right subtree), and BF ∈ {−1, 0, +1}.
If after an insertion or deletion any node's BF becomes +2 or −2, the tree is rebalanced by rotations.
Balanced (BF shown) Unbalanced (BF = +2 at 30)
20(0) 30(+2)
/ \ /
10(0) 30(0) 20(+1)
/
10(0)
The four rotation cases
There are exactly four imbalance patterns, named by the path from the unbalanced node to the newly inserted node.
1. Left-Left (LL) → single Right rotation
Inserted into the left subtree of the left child. BF of node = +2, BF of its left child = +1.
z(+2) y(0)
/ / \
y(+1) Right Rotate x z
/ ----------->
x
Right rotation about z: y becomes the new root, z becomes y's right child, and y's old right subtree (T2) becomes z's left child.
Node rightRotate(Node z) {
Node y = z.left;
Node T2 = y.right;
y.right = z; // rotate
z.left = T2; // hand over the orphaned subtree
updateHeight(z); // update z first (now lower)
updateHeight(y);
return y; // new subtree root
}
2. Right-Right (RR) → single Left rotation
Inserted into the right subtree of the right child. Mirror of LL.
z(-2) y(0)
\ / \
y(-1) Left Rotate z x
\ ---------->
x
Node leftRotate(Node z) {
Node y = z.right;
Node T2 = y.left;
y.left = z;
z.right = T2;
updateHeight(z);
updateHeight(y);
return y;
}
3. Left-Right (LR) → Left rotate child, then Right rotate node
Inserted into the right subtree of the left child.
z(+2) z(+2) x(0)
/ / / \
y(-1) L(y) x R(z) y z
\ --> / -->
x y
First left-rotate y, converting it to an LL case, then right-rotate z.
4. Right-Left (RL) → Right rotate child, then Left rotate node
Mirror of LR: first right-rotate the right child, then left-rotate the node.
Choosing the rotation (decision rule)
After inserting key, walk back up; at the first node with |BF| = 2:
if BF = +2 and key < node.left.key -> LL (rightRotate)
if BF = +2 and key > node.left.key -> LR (leftRotate left child, then rightRotate)
if BF = -2 and key > node.right.key -> RR (leftRotate)
if BF = -2 and key < node.right.key -> RL (rightRotate right child, then leftRotate)
Complete AVL insertion program (Java)
public class AVLTree {
static class Node {
int key, height;
Node left, right;
Node(int k){ key = k; height = 1; } // new node height = 1 (leaf)
}
Node root;
int height(Node n){ return n == null ? 0 : n.height; }
int max(int a,int b){ return a>b?a:b; }
void updateHeight(Node n){ n.height = 1 + max(height(n.left), height(n.right)); }
int balanceFactor(Node n){ return n == null ? 0 : height(n.left) - height(n.right); }
Node rightRotate(Node z){
Node y = z.left, T2 = y.right;
y.right = z; z.left = T2;
updateHeight(z); updateHeight(y);
return y;
}
Node leftRotate(Node z){
Node y = z.right, T2 = y.left;
y.left = z; z.right = T2;
updateHeight(z); updateHeight(y);
return y;
}
Node insert(Node node, int key){
// 1. normal BST insert
if (node == null) return new Node(key);
if (key < node.key) node.left = insert(node.left, key);
else if (key > node.key) node.right = insert(node.right, key);
else return node; // no duplicates
// 2. update height of this ancestor
updateHeight(node);
// 3. get balance factor and rebalance
int bf = balanceFactor(node);
if (bf > 1 && key < node.left.key) return rightRotate(node); // LL
if (bf < -1 && key > node.right.key) return leftRotate(node); // RR
if (bf > 1 && key > node.left.key){ // LR
node.left = leftRotate(node.left);
return rightRotate(node);
}
if (bf < -1 && key < node.right.key){ // RL
node.right = rightRotate(node.right);
return leftRotate(node);
}
return node; // already balanced
}
void insert(int key){ root = insert(root, key); }
void preorder(Node n){ if(n==null)return; System.out.print(n.key+" "); preorder(n.left); preorder(n.right); }
public static void main(String[] args){
AVLTree t = new AVLTree();
int[] keys = {10, 20, 30, 40, 50, 25};
for (int k : keys) t.insert(k);
System.out.print("Preorder of balanced AVL: ");
t.preorder(t.root); // 30 20 10 25 40 50
}
}
Full tracing — insert 10, 20, 30, 40, 50, 25
Insert 10, 20: balanced so far.
10
\
20
Insert 30: node 10 now has BF = −2 (RR case). Left-rotate about 10:
10(-2) 20
\ RR / \
20 --> 10 30
\
30
Insert 40: goes right of 30. Tree still balanced.
20
/ \
10 30
\
40
Insert 50: node 30 gets BF = −2 (RR). Left-rotate about 30:
20 20
/ \ / \
10 30(-2) RR 10 40
\ --> / \
40 30 50
\
50
Insert 25: goes left of 30. Now node 20 has BF = −2 and 25 < 40 (its right child path goes left first) → RL case.
20(-2)
/ \
10 40(+1)
/ \
30 50
/
25
Step A — right-rotate the right child (40):
20(-2)
/ \
10 30
\
40
/ \
25* 50 (* 25 attaches under 40's left after rotation arranging)
Step B — left-rotate about 20 → final balanced tree:
30
/ \
20 40
/ \ \
10 25 50
Preorder = 30 20 10 25 40 50. ✔ matches program output. Every node now has BF in {−1,0,+1}.
AVL properties & performance
- Height of an AVL tree with
nnodes is always O(log n) (≈ 1.44 log₂ n in the worst case). - Search, insert, delete are all guaranteed O(log n).
- Each insertion needs at most one single or double rotation; deletion may need rotations at multiple levels up to the root.
- More rigidly balanced than Red-Black trees → faster lookups, but slightly more rotations on insert/delete.
6.10 The Huffman Algorithm
Huffman coding is a greedy, lossless data-compression technique invented by David Huffman (1952). It assigns shorter binary codes to frequently occurring characters and longer codes to rare ones, minimizing the total number of bits.
It produces a prefix code: no code is a prefix of another, so the encoded stream can be decoded unambiguously without separators.
The greedy idea
- Count the frequency of each character.
- Treat each character as a leaf node carrying its frequency.
- Repeatedly take the two nodes with the smallest frequencies, merge them into a new internal node whose frequency is their sum, and put it back.
- Continue until a single tree (the Huffman tree) remains.
- Assign 0 to every left edge and 1 to every right edge; the code of a character is the path from root to its leaf.
Algorithm (pseudocode)
buildHuffman(chars[], freq[]):
create a min-priority-queue PQ of leaf nodes (key = frequency)
while PQ has more than one node:
left = PQ.extractMin()
right = PQ.extractMin()
merged = new Node('$', left.freq + right.freq) // internal node
merged.left = left
merged.right = right
PQ.insert(merged)
root = PQ.extractMin()
generateCodes(root, "") // left += "0", right += "1"
Worked example — string "BCAADDDCCACACAC"
Step 1 — frequencies:
A : 5
B : 1
C : 6
D : 3
Total characters = 15
Step 2 — build the tree (always merge two smallest):
Initial leaves (sorted): B(1) D(3) A(5) C(6)
Merge B(1)+D(3) = N1(4):
N1(4)
/ \
B(1) D(3)
Queue now: N1(4) A(5) C(6)
Merge N1(4)+A(5) = N2(9):
N2(9)
/ \
N1(4) A(5)
/ \
B(1) D(3)
Queue now: C(6) N2(9)
Merge C(6)+N2(9) = ROOT(15):
ROOT(15)
/ \
C(6) N2(9)
/ \
N1(4) A(5)
/ \
B(1) D(3)
Step 3 — assign codes (left=0, right=1):
(15)
0 / \ 1
C(6) (9)
0 / \ 1
(4) A(5)
0 / \ 1
B(1) D(3)
| Char | Freq | Code | Bits |
|---|---|---|---|
| C | 6 | 0 |
1 |
| A | 5 | 11 |
2 |
| B | 1 | 100 |
3 |
| D | 3 | 101 |
3 |
Step 4 — total cost:
Bits = 6*1 + 5*2 + 1*3 + 3*3 = 6 + 10 + 3 + 9 = 28 bits.
Fixed-length code (2 bits/char for 4 symbols) = 15 * 2 = 30 bits...
With 4 symbols actually need ceil(log2 4)=2 bits -> 30 bits.
ASCII (8 bits/char) = 15 * 8 = 120 bits.
Huffman = 28 bits => big saving over ASCII.
Decoding "100" → walk from root: 1→right(9), 0→left(4), 0→left → B. Prefix property guarantees a unique decode.
Java implementation
import java.util.*;
public class HuffmanCoding {
static class Node {
char ch; int freq;
Node left, right;
Node(char ch, int freq){ this.ch = ch; this.freq = freq; }
Node(int freq, Node l, Node r){ this.ch='$'; this.freq=freq; left=l; right=r; }
}
static void generateCodes(Node node, String code, Map<Character,String> map){
if (node == null) return;
if (node.left == null && node.right == null) { // leaf
map.put(node.ch, code.isEmpty() ? "0" : code);
return;
}
generateCodes(node.left, code + "0", map);
generateCodes(node.right, code + "1", map);
}
public static void main(String[] args){
String text = "BCAADDDCCACACAC";
// 1. count frequencies
Map<Character,Integer> freq = new HashMap<>();
for (char c : text.toCharArray())
freq.merge(c, 1, Integer::sum);
// 2. min-heap by frequency
PriorityQueue<Node> pq = new PriorityQueue<>((a,b) -> a.freq - b.freq);
for (var e : freq.entrySet())
pq.add(new Node(e.getKey(), e.getValue()));
// 3. build tree
while (pq.size() > 1){
Node left = pq.poll();
Node right = pq.poll();
pq.add(new Node(left.freq + right.freq, left, right));
}
Node root = pq.poll();
// 4. generate + print codes
Map<Character,String> codes = new HashMap<>();
generateCodes(root, "", codes);
System.out.println("Char | Freq | Code");
for (var e : freq.entrySet())
System.out.println(" " + e.getKey() + " | " + e.getValue() + " | " + codes.get(e.getKey()));
// 5. encoded length
int bits = 0;
for (char c : text.toCharArray()) bits += codes.get(c).length();
System.out.println("Encoded length = " + bits + " bits (ASCII would be " + text.length()*8 + ")");
}
}
Complexity: O(n log n) where n is the number of distinct characters (each heap operation is log n).
Applications: ZIP/GZIP, JPEG, MP3, PNG, and as the entropy-coding stage of many compression formats.
6.11 B-Tree
A B-Tree is a self-balancing search tree generalised so that a node can hold many keys and have many children. It is designed for systems that read and write large blocks of data — databases and file systems — where minimizing disk accesses matters more than CPU comparisons.
While a BST/AVL node has 1 key and ≤2 children, a B-Tree node packs many keys, so the tree stays very short (shallow) even for millions of records → fewer disk reads.
Properties of a B-Tree of order m
A B-Tree of order m (max children = m) satisfies:
- Every node has at most
mchildren and at mostm − 1keys. - Every non-root internal node has at least
⌈m/2⌉children (i.e. at least⌈m/2⌉ − 1keys). - The root has at least 1 key (at least 2 children unless it is a leaf).
- Keys inside a node are kept in sorted order.
- A node with
kkeys has exactlyk + 1children. - All leaf nodes are at the same level → the tree is always height-balanced.
Keys act as separators: child i holds all keys between key[i−1] and key[i].
Order-3 (2-3 tree) B-Tree, keys sorted, leaves on one level:
[ 17 | 35 ]
/ | \
[ 8 12 ] [ 22 30 ] [ 40 52 ]
- Keys < 17 go to the first child; between 17 and 35 to the second; > 35 to the third.
Insertion (with node splitting)
Insertion always happens at a leaf. If the leaf overflows (gets m keys), it splits: the median key moves up to the parent, and the node divides into two.
Algorithm:
insert(key):
find the leaf where key belongs (descend like a BST using ranges)
insert key into that leaf in sorted position
if leaf now has m keys (overflow):
split: median key goes up to parent; left/right halves become two nodes
if parent overflows too, split recursively up (root may split -> tree grows taller)
Worked example — build an order-3 B-Tree by inserting 10, 20, 5, 6, 12, 30, 7, 17
Order 3 ⇒ max 2 keys per node; a 3rd key forces a split (median goes up).
Insert 10: [10]
Insert 20: [10 20]
Insert 5: -> [5 10 20] overflow! median=10 goes up, split:
[10]
/ \
[5] [20]
Insert 6: 6 < 10 -> left leaf -> [5 6]
[10]
/ \
[5 6] [20]
Insert 12: 12 between 10,? -> right leaf -> [12 20]
[10]
/ \
[5 6] [12 20]
Insert 30: right leaf -> [12 20 30] overflow! median=20 up:
[10 20]
/ | \
[5 6] [12] [30]
Insert 7: 7 between 5,6? -> 7>6 -> [5 6 7] overflow! median=6 up to root:
[6 10 20] overflow root! (3 keys, order 3)
/ | | \
[5] [7] [12] [30]
Root has 3 keys -> split, median=10 becomes new root:
[10]
/ \
[6] [20]
/ \ / \
[5] [7] [12] [30]
Insert 17: 17 between 10,20 -> child [12] -> [12 17]
[10]
/ \
[6] [20]
/ \ / \
[5] [7] [12 17] [30]
Notice all leaves remain on the same level at every step — the hallmark of a B-Tree.
Searching a B-Tree
search(node, key):
i = 0
while i < node.numKeys and key > node.keys[i]: i++
if i < node.numKeys and key == node.keys[i]: return found
if node is leaf: return not found
return search(node.child[i], key) // descend into correct child
Within a node, a linear or binary scan finds the slot; then we descend into exactly one child. Height is O(log_m n), so searches need very few disk reads.
Deletion (overview)
Deletion is the mirror of insertion and must avoid underflow (a node dropping below ⌈m/2⌉ − 1 keys):
- From a leaf with enough keys: just remove it.
- Underflow: borrow a key from a sibling (rotation through the parent) if the sibling has spare keys; otherwise merge with a sibling and pull a separator key down from the parent (may cascade upward and shrink the tree's height).
- From an internal node: replace the key with its inorder predecessor/successor (from a leaf), then fix any resulting leaf underflow.
B-Tree vs Binary Search Tree
| Aspect | BST / AVL | B-Tree |
|---|---|---|
| Keys per node | 1 | up to m−1 (many) |
| Children per node | ≤ 2 | up to m |
| Height for n keys | O(log₂ n) | O(log_m n) — much shorter |
| Designed for | In-memory | Disk / database blocks |
| Balance | AVL via rotations | Always balanced via split/merge |
B+ Tree (note)
A common variant: in a B+ Tree all actual data records live in the leaves, internal nodes hold only keys for routing, and the leaves are linked together in a list for fast range scans. This is what most relational-database indexes actually use.
Quick Revision & Exam Pointers
Must-remember formulas
- Max nodes at level
L=2^L; max nodes in height-htree =2^(h+1) − 1. - Strictly binary with
nleaves →2n − 1total nodes. - Extended tree:
ninternal nodes →n + 1external nodes;E = I + 2n. - Leaf/degree relation:
n₀ = n₂ + 1. - Array rep (root at 1): left =
2i, right =2i+1, parent =i/2.
Traversals on A(B(D,E),C(,F(G)))
- Pre =
A B D E C F G, In =D B E A C G F, Post =D E B G F C A. - Inorder of a BST = sorted order (most-asked fact).
BST deletion cases
- Leaf → remove. One child → replace with child. Two children → replace with inorder successor (min of right subtree), then delete that successor.
AVL
- Balance Factor = h(left) − h(right) ∈ {−1, 0, +1}.
- LL → right rotate; RR → left rotate; LR → left then right; RL → right then left.
Huffman
- Greedy: repeatedly merge two smallest frequencies; left = 0, right = 1; prefix code; minimizes total bits.
B-Tree
- Order
m: ≤mchildren, ≤m−1keys, ≥⌈m/2⌉children (non-root); all leaves at the same level; insert splits (median up), delete borrows/merges.
Complexity cheat-sheet
| Structure | Search | Insert | Delete |
|---|---|---|---|
| BST (avg) | O(log n) | O(log n) | O(log n) |
| BST (worst, skewed) | O(n) | O(n) | O(n) |
| AVL | O(log n) | O(log n) | O(log n) |
| B-Tree | O(log_m n) | O(log_m n) | O(log_m n) |
End of Unit VI — Trees.