Serializing and deserializing a Sub instance produces a corrupt copy. Why? Looking at the program will not tell you, because the real source of the problem lies elsewhere. It is caused by the readObject method of the class HashSet. Under certain circumstances, this method can indirectly invoke an overridden method on an uninitialized object. In order to populate the hash set that is being deserialized, HashSet.readObject calls HashMap.put, which in turn calls hashCode on each key. Because a whole object graph is being deserialized at once, there is no guarantee that each key has been completely initialized when its hashCode method is invoked. In practice, this is rarely an issue, but occasionally it causes utter chaos. The bug is tickled by certain cycles in the object graph that is being deserialized.
To make this more concrete, let us look at what happens when we deserialize the Sub instance in the program. First, the serialization system deserializes the Super fields of the Sub instance. The only such field is set, which contains a reference to a HashSet. Internally, each HashSet instance contains a reference to a HashMap, whose keys are the hash set's elements. The HashSet class has a readObject method that creates an empty HashMap and inserts a key-value mapping for each element in the set, using the map's put method. This method calls hashCode on the key to determine its bucket. In our program, the sole key in the hash map is the Sub instance whose set field is currently being deserialized. The subclass field of this instance, id, has yet to be initialized, so it contains 0, the initial value assigned to all int fields. Unfortunately, the hashCode method in Sub returns this value instead of 666, which will eventually be stored in this field.
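The program under discussion is not reproduced in this excerpt. As a rough aid, here is a hypothetical reconstruction assembled only from the details given above: a Serializable Super class whose only field, set, refers to a HashSet, and a Sub subclass whose hashCode returns its int field id. The method names invariantHolds and deepCopy, and the choice to copy through an in-memory byte stream, are assumptions made for illustration.

```java
import java.io.*;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reconstruction: only Super, Sub, set, id, and 666 come from the text.
class Super implements Serializable {
    final Set<Super> set = new HashSet<>();
}

final class Sub extends Super {
    private int id;

    Sub(int id) {
        this.id = id;
        set.add(this);               // invariant: every Sub is in its own set
    }

    // Hypothetical check standing in for the program's corruption detection.
    boolean invariantHolds() {
        return set.contains(this);
    }

    @Override public int hashCode() {
        return id;                   // depends on a field of the subclass
    }

    @Override public boolean equals(Object o) {
        return (o instanceof Sub) && id == ((Sub) o).id;
    }
}

class SerialKiller {
    // Copies an object by serializing it to a byte array and reading it back.
    static Object deepCopy(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Sub sub = new Sub(666);
        System.out.println("original: " + sub.invariantHolds());
        Sub copy = (Sub) deepCopy(sub);
        System.out.println("copy:     " + copy.invariantHolds());
    }
}
```

If the mechanism described above is at work, main should report that the invariant holds for the original but not for the copy, because the copy's set filed the instance under the hash code 0.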
Because hashCode returns the wrong value, the entry for the key-value mapping is placed in the wrong bucket. By the time the id field is initialized to 666, it is too late. Changing the value of this field once the Sub instance is in the HashMap corrupts the map, which corrupts the HashSet, which in turn corrupts the Sub instance that refers to it. The program detects this corruption and throws an appropriate error.
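The damage done by a hash code that changes after insertion does not depend on serialization at all. This minimal sketch, using a hypothetical MutableKey class, inserts a key while its hash code is 0 and then changes it to 666, mimicking what happens to the Sub instance's id field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key whose hash code depends on a mutable field.
final class MutableKey {
    int value;

    MutableKey(int value) { this.value = value; }

    @Override public int hashCode() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).value == value;
    }
}

class WrongBucketDemo {
    // Inserts the key under hash 0, then changes its hash to 666.
    static Set<MutableKey> insertThenMutate(MutableKey key) {
        Set<MutableKey> set = new HashSet<>();
        key.value = 0;
        set.add(key);       // stored under the hash code 0
        key.value = 666;    // hash code changes while the key is in the set
        return set;
    }

    static boolean foundAfterMutation() {
        MutableKey key = new MutableKey(0);
        return insertThenMutate(key).contains(key);  // lookup uses hash 666
    }

    static int sizeAfterReAdd() {
        MutableKey key = new MutableKey(0);
        Set<MutableKey> set = insertThenMutate(key);
        set.add(key);       // not found under 666, so the same object is added again
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("found after mutation: " + foundAfterMutation());
        System.out.println("size after re-adding: " + sizeAfterReAdd());
    }
}
```

After the mutation, contains misses the key, and adding the same object again succeeds, leaving the set with two entries for a single object.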
This program illustrates that the serialization system as a whole, which includes the readObject method of HashMap, violates the rule that you must not invoke an overridable method of a class from its constructor or pseudoconstructor [EJ Item 15]. The (default) readObject method of the class Super invokes the (explicit) readObject method of the class HashSet, which invokes the put method on its internal HashMap, which invokes the hashCode method on the Sub instance that is currently in the process of creation. Now we are in big trouble: The hashCode method that Super inherits from Object is overridden in Sub, and this overridden method executes before the initialization of Sub's id field, on which it depends.
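One way to observe the call chain just described is to capture a stack trace inside hashCode the first time it sees the uninitialized id. The classes TracedSuper, TracedSub, and CallChainDemo and the static field traceAtZeroId are hypothetical instrumentation for this sketch. The frame names checked, java.util.HashSet.readObject and java.util.HashMap.put, reflect OpenJDK's implementation and are an assumption about the platform in use.

```java
import java.io.*;
import java.util.HashSet;
import java.util.Set;

class TracedSuper implements Serializable {
    final Set<TracedSuper> set = new HashSet<>();
}

final class TracedSub extends TracedSuper {
    // Hypothetical instrumentation: stack captured when hashCode first sees id == 0.
    static StackTraceElement[] traceAtZeroId;

    private int id;

    TracedSub(int id) {
        this.id = id;
        set.add(this);
    }

    @Override public int hashCode() {
        if (id == 0 && traceAtZeroId == null) {
            traceAtZeroId = new Throwable().getStackTrace();
        }
        return id;
    }

    @Override public boolean equals(Object o) {
        return o instanceof TracedSub && id == ((TracedSub) o).id;
    }
}

class CallChainDemo {
    // Round-trips an object through an in-memory serialization stream.
    static Object deepCopy(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the captured stack has a frame for className.methodName.
    static boolean onStack(String className, String methodName) {
        if (TracedSub.traceAtZeroId == null) return false;
        for (StackTraceElement frame : TracedSub.traceAtZeroId) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        deepCopy(new TracedSub(666));
        if (TracedSub.traceAtZeroId != null) {
            for (StackTraceElement frame : TracedSub.traceAtZeroId) {
                System.out.println(frame);
            }
        }
    }
}
```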
This failure is nearly identical in nature to the one in Puzzle 51. The only real difference is that in this puzzle, the readObject pseudoconstructor is at fault instead of the constructor. The readObject methods of HashMap and Hashtable are similarly affected.
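For comparison, here is the constructor form of the same mistake, written as a minimal sketch with hypothetical Base and Derived classes rather than the code from Puzzle 51: Base's constructor calls an overridable method, and Derived's override reads a field that Derived's constructor has not yet assigned.

```java
// Hypothetical classes illustrating an overridable call from a constructor.
class Base {
    Base() {
        recordId();              // dispatches to Derived's override before Derived's fields are set
    }

    void recordId() { }
}

final class Derived extends Base {
    private final int id;
    // Deliberately no initializer: one would run after super() and overwrite the recorded value.
    int idSeenByConstructor;

    Derived(int id) {
        super();
        this.id = id;
    }

    @Override void recordId() {
        idSeenByConstructor = id;    // id still holds its default value, 0, at this point
    }

    int id() { return id; }

    public static void main(String[] args) {
        Derived d = new Derived(666);
        System.out.println("seen during construction: " + d.idSeenByConstructor);
        System.out.println("final value:              " + d.id());
    }
}
```

As with the deserialization case, the override runs while the subclass field still holds its default value.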
For platform implementers, it may be possible to fix this problem in HashSet, HashMap, and Hashtable at a slight performance penalty. The strategy, as it applies to HashSet, is to rewrite the readObject method to store the set's elements in an array instead of putting them in the hash set at deserialization time. Then, on the first invocation of a public method on the deserialized hash set, the elements in the array would be inserted into the set before executing the method.
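The deferred-population strategy can be sketched outside the platform as a wrapper set. LazyHashSet is a hypothetical class, not a JDK type and not the change a platform implementer would actually make: its readObject only reads elements into an array, and every public method first calls populated(), which is the per-call check whose cost is weighed below. LazySuper, LazySub, and LazyDemo are hypothetical stand-ins for the puzzle's classes.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch: a set that defers hashing its elements until first use after deserialization.
final class LazyHashSet<E> extends AbstractSet<E> implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient HashSet<E> delegate = new HashSet<>();
    private transient Object[] pending;       // elements read by readObject, not yet hashed

    // The extra check that every public method pays for.
    private HashSet<E> populated() {
        if (pending != null) {
            Object[] elements = pending;
            pending = null;
            for (Object o : elements) {
                @SuppressWarnings("unchecked")
                E e = (E) o;
                delegate.add(e);              // hashCode runs now, after the object graph is complete
            }
        }
        return delegate;
    }

    @Override public int size() { return populated().size(); }
    @Override public boolean contains(Object o) { return populated().contains(o); }
    @Override public boolean add(E e) { return populated().add(e); }
    @Override public boolean remove(Object o) { return populated().remove(o); }
    @Override public Iterator<E> iterator() { return populated().iterator(); }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        HashSet<E> set = populated();
        out.writeInt(set.size());
        for (E e : set) out.writeObject(e);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int size = in.readInt();
        if (size < 0) throw new InvalidObjectException("negative size: " + size);
        Object[] elements = new Object[size];
        for (int i = 0; i < size; i++) elements[i] = in.readObject();   // no hashCode calls here
        delegate = new HashSet<>();           // field initializers do not run on deserialization
        pending = elements;
    }
}

class LazySuper implements Serializable {
    final Set<LazySuper> set = new LazyHashSet<>();
}

final class LazySub extends LazySuper {
    private int id;

    LazySub(int id) { this.id = id; set.add(this); }

    boolean invariantHolds() { return set.contains(this); }

    @Override public int hashCode() { return id; }

    @Override public boolean equals(Object o) { return o instanceof LazySub && id == ((LazySub) o).id; }
}

class LazyDemo {
    static Object deepCopy(Object obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) { out.writeObject(obj); }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LazySub copy = (LazySub) deepCopy(new LazySub(666));
        System.out.println("copy: " + copy.invariantHolds());
    }
}
```

No hashCode call happens until the copy's invariantHolds method runs, by which time id is 666, so the copy should keep its invariant.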
The cost of this approach is that it requires checking whether to populate the hash set on entry to each of its public methods. Because HashSet, HashMap, and Hashtable are all performance-critical, this approach seems undesirable. It is unfortunate that all users would have to pay the cost,
 
 