Introducing Mark and Sweep
The JVM knows exactly what objects and arrays it has allocated. They'll be stored in
some sort of internal data structure, which we will refer to as the allocation table .
The JVM can also figure out which local variables in each stack frame refer to which
objects and arrays in the heap. Finally, by following references held by objects and
arrays in the heap, the JVM can trace through and find all objects and arrays are still
referred to, no matter how indirectly.
Thus, the runtime is able to determine when an allocated object is no longer
referred to by any other active object or variable. When the interpreter finds such an
object, it knows it can safely reclaim the object's memory and does so. Note that the
garbage collector can also detect and reclaim cycles of objects that refer to each
other, but are not referenced by any other active objects.
We define a reachable object to be an object that can be reached by starting from
some local variable in one of the methods in the stack trace of some application
thread, and following references until we reach the object. Objects of this type are
also said to be live . 1
There are a couple of other possibilities of where the chain of
references can start apart from local variables. The general
name for the root of a reference chain leading to a reachable
object is a GC root .
With these simple definitions, let's look at a simple method for performing garbage
collection based on these principles.
The Basic Mark and Sweep Algorithm
The usual (and simplest) algorithm for the collection process is called mark and
sweep . This occurs in three phases:
1. Iterate through the allocation table, marking each object as dead .
2. Starting from the local variables that point into the heap, follow all references
from all objects we reach. Every time we reach an object or array we haven't
seen yet, mark it as live . Keep going until we've fully explored all references we
can reach from the local variables.
3. Sweep across the allocation table again. For each object not marked as live,
reclaim the memory in heap and place it back on the free memory list. Remove
the object from the allocation table.
1 The process whereby we exhaustively explore from the GC roots produces what is known as the
transitive closure of live objects—a term that is borrowed from the abstract mathematics of graph