Now we have the entire character encoding loaded into the variable named buffer.
First, we check whether the first character is a pound sign (#). If it is, then this is a
numeric (ASCII) encoding: the digits after the pound sign give the character code directly,
as in &#65; for the letter A. We parse the number immediately following the pound sign and
return it as the decoded character.
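    String b = buffer.toString().trim().toLowerCase();
    // startsWith avoids an exception when the buffer is empty
    if (b.startsWith("#"))
    {
        try
        {
            // parse the digits after the pound sign as a character code
            return (char) (Integer.parseInt(b.substring(1)));
        } catch (NumberFormatException e)
        {
            return '&';
        }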
If the number is invalid and a NumberFormatException is thrown, we return an
ampersand (&). Again, since this is an error, returning an ampersand is the best
we can do when decoding the character.
If the encoding is not numeric (it does not start with a pound sign), we look up the name
in charMap, which was set up earlier. This gives us the character code for that name. For
example, the string "quot" is mapped to code 34, which is the ASCII code for a double
quotation mark.
    } else
    {
        // named encoding: look the name up in charMap
        if (charMap.containsKey(b))
            return charMap.get(b);
        else
            return '&'; // unknown name
    }
} else
    return ch; // no encoded character was present
Finally, if the very first if-statement failed, we simply return the character that was
read, because it did not begin an encoded character.
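The decoding logic described above can be pulled together into one small, self-contained sketch. It is only an illustration: the class name EntityDecoderSketch, the decode method, and the four-entry CHAR_MAP are assumptions made for this example and stand in for the charMap that was set up earlier. The method takes the text of the encoding (what the listing above holds in buffer) as a plain String.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class EntityDecoderSketch {
    // Hypothetical stand-in for the charMap built earlier in the text.
    private static final Map<String, Character> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put("quot", '"');
        CHAR_MAP.put("amp", '&');
        CHAR_MAP.put("lt", '<');
        CHAR_MAP.put("gt", '>');
    }

    // Decodes the body of a character encoding, for example "#65" or "quot".
    // Returns '&' whenever the body cannot be decoded.
    static char decode(String body) {
        String b = body.trim().toLowerCase(Locale.ROOT);
        if (b.startsWith("#")) {
            try {
                // Numeric form: the digits are the character code itself.
                return (char) Integer.parseInt(b.substring(1));
            } catch (NumberFormatException e) {
                return '&';
            }
        }
        // Named form: look the name up in the map.
        Character mapped = CHAR_MAP.get(b);
        return mapped != null ? mapped : '&';
    }

    public static void main(String[] args) {
        System.out.println(decode("#65"));
        System.out.println(decode("QUOT"));
        System.out.println(decode("bogus"));
    }
}
```

Note that, like the listing above, this sketch only accepts decimal numbers after the pound sign. A hexadecimal form such as #x22 fails to parse and falls back to an ampersand.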
Reading Characters
The HTML parse class contains a function named read, which is called to read the next
character from an HTML file. The function returns zero if an HTML tag is encountered.
It also decodes any special HTML characters.
The function begins by looking for a less-than sign, which signals the beginning of an
HTML tag. If a less-than sign is found, the parseTag method is called and zero is
returned. The tag that parseTag parsed can then be retrieved by calling the getTag
function.
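To make the calling convention concrete, here is a minimal sketch of a reader that follows the behavior just described. It is not the actual HTML parse class: the name MiniHtmlReader is hypothetical, the input is a String rather than a file, the tag is kept as raw text, returning -1 at the end of input is an assumption, and the decoding of special characters is left out to keep the example short.

```java
class MiniHtmlReader {
    private final String html;
    private int pos = 0;
    private String tag = ""; // simplified: raw text between '<' and '>'

    MiniHtmlReader(String html) {
        this.html = html;
    }

    // Returns the next character, 0 if a tag was consumed instead,
    // or -1 at the end of input (an assumption made for this sketch).
    int read() {
        if (pos >= html.length()) {
            return -1;
        }
        char ch = html.charAt(pos);
        if (ch == '<') {
            parseTag();
            return 0;
        }
        pos++;
        return ch;
    }

    // Consumes everything up to the closing '>' and remembers it as the tag.
    private void parseTag() {
        int end = html.indexOf('>', pos);
        if (end < 0) {
            end = html.length(); // unterminated tag: take the rest of the input
        }
        tag = html.substring(pos + 1, end);
        pos = Math.min(end + 1, html.length());
    }

    // Returns the tag parsed by the most recent read() call that returned 0.
    String getTag() {
        return tag;
    }

    public static void main(String[] args) {
        MiniHtmlReader reader = new MiniHtmlReader("<b>Hi</b>");
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == 0) {
                System.out.println("tag: " + reader.getTag());
            } else {
                text.append((char) c);
            }
        }
        System.out.println("text: " + text);
    }
}
```

The loop in main shows the intended usage: the caller keeps calling read, treats a zero as the signal to ask getTag for the tag, and treats any other value as ordinary text.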