Serializing and deserializing a Sub instance produces a corrupt copy. Why? Looking at the program will not tell you, because the real source of the problem lies elsewhere. It is caused by the readObject method of the class HashSet. Under certain circumstances, this method can indirectly invoke an overridden method on an uninitialized object. In order to populate the hash set that is being deserialized, HashSet.readObject calls HashMap.put, which in turn calls hashCode on each key. Because a whole object graph is being deserialized at once, there is no guarantee that each key has been completely initialized when its hashCode method is invoked. In practice, this is rarely an issue, but occasionally it causes utter chaos. The bug is tickled by certain cycles in the object graph that is being deserialized.

To make this more concrete, let us look at what happens when we deserialize the Sub instance in the program. First, the serialization system deserializes the Super fields of the Sub instance. The only such field is set, which contains a reference to a HashSet. Internally, each HashSet instance contains a reference to a HashMap, whose keys are the hash set's elements. The HashSet class has a readObject method that creates an empty HashMap and inserts a key-value mapping for each element in the set, using the map's put method. This method calls hashCode on the key to determine its bucket. In our program, the sole key in the hash map is the Sub instance whose set field is currently being deserialized. The subclass field of this instance, id, has yet to be initialized, so it contains 0, the initial value assigned to all int fields. Unfortunately, the hashCode method in Sub returns this value instead of 666, which will eventually be stored in this field.

Because hashCode returns the wrong value, the entry for the key-value mapping is placed in the wrong bucket. By the time the id field is initialized to 666, it is too late. Changing the value of this field once the Sub instance is in the HashMap corrupts it, which corrupts the HashSet, which corrupts the Sub instance. The program detects this corruption and throws an appropriate error.

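The failure described above can be reproduced with a small program. The following is a minimal sketch, not the puzzle's original listing: the class names Super and Sub, the fields set and id, and the value 666 come from the text, while SerialKillerSketch, deepCopy, and invariantHolds are names assumed here for illustration. It also assumes a JDK whose HashSet.readObject inserts each element with put as it is read, as the text describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

class SerialKillerSketch {
    static class Super implements Serializable {
        private static final long serialVersionUID = 1L;
        // Superclass fields are deserialized before any field declared in Sub.
        final Set<Super> set = new HashSet<>();
    }

    static final class Sub extends Super {
        private static final long serialVersionUID = 1L;
        private int id;

        Sub(int id) {
            this.id = id;
            set.add(this); // establishes the invariant: the set contains this instance
        }

        boolean invariantHolds() {
            return set.contains(this);
        }

        @Override public int hashCode() {
            return id; // depends on a field that is restored after the set
        }

        @Override public boolean equals(Object o) {
            return o instanceof Sub && id == ((Sub) o).id;
        }
    }

    // Copies obj by serializing it to a byte array and reading it back.
    static Object deepCopy(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sub original = new Sub(666);
        Sub copy = (Sub) deepCopy(original);
        System.out.println("original invariant holds: " + original.invariantHolds());
        System.out.println("copy invariant holds:     " + copy.invariantHolds());
    }
}
```

Under the stated assumption, the original should report true and the copy false: the copy's set stored the instance while hashCode still returned 0, but the later lookup hashes it as 666.
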
This program illustrates that the serialization system as a whole, which includes the readObject method of HashMap, violates the rule that you must not invoke an overridable method of a class from its constructor or pseudoconstructor [EJ Item 15]. The (default) readObject method of the class Super invokes the (explicit) readObject method of the class HashSet, which invokes the put method on its internal HashMap, which invokes the hashCode method on the Sub instance that is currently in the process of creation. Now we are in big trouble: The hashCode method that Super inherits from Object is overridden in Sub, and this overridden method executes before the initialization of the Sub field, on which it depends.

This failure is nearly identical in nature to the one in Puzzle 51. The only real difference is that in this puzzle, the readObject pseudoconstructor is at fault instead of the constructor. The readObject methods of HashMap and Hashtable are similarly affected.

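For comparison, the constructor form of the same mistake can be shown in a few lines. This is an illustrative sketch only, not the code from Puzzle 51: the classes Base and Derived and the field observed are hypothetical names chosen here.

```java
class ConstructorLeakSketch {
    static class Base {
        final int observed;

        Base() {
            // Invokes an overridable method before any subclass field is assigned.
            observed = hashCode();
        }
    }

    static final class Derived extends Base {
        private final int id;

        Derived(int id) {
            super();      // Base() runs first and calls the hashCode() override below
            this.id = id; // assigned only after Base() has returned
        }

        @Override public int hashCode() {
            return id;
        }

        @Override public boolean equals(Object o) {
            return o instanceof Derived && id == ((Derived) o).id;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived(666);
        System.out.println(d.observed);   // 0: the value hashCode() saw during construction
        System.out.println(d.hashCode()); // 666: the value after construction
    }
}
```

The superclass constructor plays the role that HashSet.readObject plays during deserialization: it runs the subclass's hashCode while id still holds its default value.
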
For platform implementers, it may be possible to fix this problem in HashSet, HashMap, and Hashtable at a slight performance penalty. The strategy, as it applies to HashSet, is to rewrite the readObject method to store the set's elements in an array instead of putting them in the hash set at deserialization time. Then, on the first invocation of a public method on the deserialized hash set, the elements in the array would be inserted into the set before executing the method.

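The deferred-population strategy can be sketched outside the platform classes. The code below is an assumption-laden illustration, not a patch to HashSet: DeferredSet, its pending array, and populate are hypothetical names, only three public methods are shown, and thread safety and stream validation are ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

class DeferredSetSketch {
    /** Hypothetical set that postpones hashing deserialized elements until first use. */
    static final class DeferredSet<E> implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Set<E> elements = new HashSet<>();
        // Non-null only between readObject and the first public method call.
        private transient Object[] pending;

        public boolean add(E e) { populate(); return elements.add(e); }
        public boolean contains(Object o) { populate(); return elements.contains(o); }
        public int size() { populate(); return elements.size(); }

        // Check performed on entry to every public method.
        @SuppressWarnings("unchecked")
        private void populate() {
            if (pending == null) return;
            Object[] toInsert = pending;
            pending = null;
            for (Object o : toInsert) elements.add((E) o);
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            populate();
            out.defaultWriteObject();
            out.writeInt(elements.size());
            for (E e : elements) out.writeObject(e);
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int n = in.readInt();
            Object[] read = new Object[n];
            for (int i = 0; i < n; i++) read[i] = in.readObject(); // no hashCode call yet
            elements = new HashSet<>(); // field initializers do not run on deserialization
            pending = read;
        }
    }

    static class Super implements Serializable {
        private static final long serialVersionUID = 1L;
        final DeferredSet<Super> set = new DeferredSet<>();
    }

    static final class Sub extends Super {
        private static final long serialVersionUID = 1L;
        private int id;
        Sub(int id) { this.id = id; set.add(this); }
        boolean invariantHolds() { return set.contains(this); }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof Sub && id == ((Sub) o).id;
        }
    }

    // Copies obj by serializing it to a byte array and reading it back.
    static Object deepCopy(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sub copy = (Sub) deepCopy(new Sub(666));
        System.out.println("copy invariant holds: " + copy.invariantHolds());
    }
}
```

Because the element is hashed only when contains is first called, after the id field has been restored, the copy in this sketch should satisfy its invariant. The populate call at the top of each public method is the per-call check involved in this approach.
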
The cost of this approach is that it requires checking whether to populate the hash set on entry to each of its public methods. Because HashSet, HashMap, and Hashtable are all performance-critical, this approach seems undesirable. It is unfortunate that all users would have to pay the cost,