This is the lecture note of CS61B - Lecture 14.
In this lecture, we are going to focus on a data structure called Disjoint Sets or Union Find(并查集). We will see how to design it by solving the "Dynamic Connectivity" problem, and see how our underlying data structures can affect asymptotic runtime (using our formal Big-Theta notation) and code complexity.
The Disjoint Sets data structure has two operations:
- connect(x, y): Connects x and y.
- isConnected(x, y): Returns true if x and y are connected. Connections can be transitive, i.e. they don’t need to be direct.
Our goal is to implement the above specific interface.
Naive Approach
Better Approach
Quick Find
Our next step is how to do track set membership in Java, like:
1 | { 0, 1, 2, 4 }, {3, 5}, {6} |
Approach 1
Approach 2
A better approach is using array of the underlying data structures.
1 | public class QuickFindDS implements DisjointSets { |
Though connect
method will still be costly, this approach has good performance on isConnected
method.
Quick Union
In Quick Union, we will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.
See the following example to know why this approach is good at connect
.
By the way, we set root(5)'s value equal to root(2) instead of setting root(5)'s value equal to 2 is because, the latter one will cause a taller tree.
However, this approach still has performance issues, that is compared to QuickFind, we have to climb up a tree. If the tree is too tall, finding root(x) will be expensive.
1 | public class QuickUnionDS implements DisjointSets { |
Weighted Quick Union
Let's see its performance.
We used the number of items in a tree to decide upon the root.
You might wondering why not use the height of the tree? The reason is worst case performance for HeightedQuickUnionDS is asymptotically the same! Both are Θ(log(N)). And resulting code is more complicated with no performance gain.
Last Improvement: Path Compression
See an example,
We could set that items like 15, 11, 5, and 1 etc., their parents are 0. This change won't influent the truth which set each item belongs to.