• The lowercase letters a-z
• The digits 0-9
• The punctuation characters - _ . ! ~ * ' ( ) and ,
The characters : / & ? @ # ; $ + = and % may also be used, but only for their specified
purposes. If these characters occur as part of a path or query string, they and all other
characters should be encoded.
The encoding is very simple. Any characters that are not ASCII numerals, letters, or the
punctuation marks specified earlier are converted into bytes and each byte is written as
a percent sign followed by two hexadecimal digits. Spaces are a special case because
they're so common. Besides being encoded as %20, they can be encoded as a plus sign
(+). The plus sign itself is encoded as %2B. The / # = & and ? characters should be
encoded when they are used as part of a name, and not as a separator between parts of
the URL.
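The rule above can be sketched by hand. This is a minimal illustration, not a library implementation: the class and method names are hypothetical, it assumes UTF-8 for the character-to-byte step, and the set of punctuation left unencoded is one reading of the list above (with parentheses and the comma included as an assumption). It always writes spaces as %20 rather than as plus signs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class PercentEncodingSketch {
    // Punctuation this sketch leaves as-is (an assumption based on the list above).
    private static final String SAFE_PUNCTUATION = "-_.!~*'(),";

    static String percentEncode(String input) {
        StringBuilder out = new StringBuilder();
        // Convert the string to bytes first (UTF-8 assumed), then inspect each byte.
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;          // treat the byte as unsigned
            char c = (char) value;
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (asciiAlnum || SAFE_PUNCTUATION.indexOf(c) >= 0) {
                out.append(c);             // safe character: copy unchanged
            } else {
                // everything else: percent sign plus two hexadecimal digits
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("a b+c"));        // a%20b%2Bc
        System.out.println(percentEncode("name=\u00e9"));  // name%3D%C3%A9
    }
}
```

Note that the non-ASCII character é becomes two escapes because it occupies two bytes in UTF-8.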
The URL class does not encode or decode automatically. You can construct URL objects
that use illegal ASCII and non-ASCII characters and/or percent escapes. Such characters
and escapes are not automatically encoded or decoded when output by methods such
as getPath() and toExternalForm(). You are responsible for making sure all such
characters are properly encoded in the strings used to construct a URL object.
Luckily, Java provides URLEncoder and URLDecoder classes to encode and decode strings
in this format.
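A short sketch of the point that the URL class neither encodes nor decodes. The class name, helper names, and the www.example.com URLs are placeholders for illustration. On recent JDKs the URL(String) constructor is deprecated, but it still compiles and behaves as shown.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; the URLs are placeholders.
class UrlNoAutoEncoding {
    // Returns getPath() for the given URL string, unchanged by the URL class.
    static String pathOf(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns toExternalForm() for the given URL string.
    static String externalFormOf(String spec) {
        try {
            return new URL(spec).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The unencoded space is kept as-is; nothing is encoded for you.
        System.out.println(pathOf("http://www.example.com/hello world"));
        // An existing percent escape is kept as-is; nothing is decoded either.
        System.out.println(pathOf("http://www.example.com/hello%20world"));
        System.out.println(externalFormOf("http://www.example.com/hello world"));
    }
}
```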
URLEncoder
To URL encode a string, pass the string and the character set name to the
URLEncoder.encode() method. For example:
String encoded = URLEncoder.encode("This*string*has*asterisks", "UTF-8");
URLEncoder.encode() returns a copy of the input string with a few changes. Any
nonalphanumeric characters are converted into % sequences (except the space, underscore,
hyphen, period, and asterisk characters). It also encodes all non-ASCII characters. The
space is converted into a plus sign. This method is a little overaggressive; it also converts
tildes, single quotes, exclamation points, and parentheses to percent escapes, even
though they don't absolutely have to be. However, this change isn't forbidden by the
URL specification, so web browsers deal reasonably with these excessively encoded
URLs.
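The behavior described above can be checked with a few sample strings. This is a small sketch; the class name and the encodeUtf8 wrapper are hypothetical, and the wrapper exists only to handle the checked UnsupportedEncodingException that the two-argument encode() method declares.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class for illustration.
class UrlEncoderDemo {
    // Wraps URLEncoder.encode() with the UTF-8 character set.
    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces become plus signs.
        System.out.println(encodeUtf8("This string has spaces"));    // This+string+has+spaces
        // Asterisks, underscores, hyphens, and periods pass through unchanged.
        System.out.println(encodeUtf8("This*string*has*asterisks")); // This*string*has*asterisks
        System.out.println(encodeUtf8("a_b-c.d"));                   // a_b-c.d
        // Tilde, single quote, exclamation point, and parentheses are escaped.
        System.out.println(encodeUtf8("~'!()"));                     // %7E%27%21%28%29
        // The plus sign and equals sign are escaped.
        System.out.println(encodeUtf8("1+1=2"));                     // 1%2B1%3D2
    }
}
```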
Although this method allows you to specify the character set, the only character
set you should ever pick is UTF-8. UTF-8 is compatible with the IRI specification, the
URI class, modern web browsers, and more other software than any other encoding
you could choose.
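To see why the choice of character set matters, the following sketch encodes the same non-ASCII character with two character sets and then decodes with a mismatched one. The class and helper names are hypothetical, and ISO-8859-1 is used here only as a contrasting example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class for illustration.
class CharsetChoiceDemo {
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the character é
        // The same character yields different escapes under different character sets.
        System.out.println(encode(eAcute, "UTF-8"));       // %C3%A9
        System.out.println(encode(eAcute, "ISO-8859-1"));  // %E9
        // Decoding with the matching character set restores the original.
        System.out.println(decode("%C3%A9", "UTF-8").equals(eAcute));
        // Decoding with a different character set yields two wrong characters.
        System.out.println(decode("%C3%A9", "ISO-8859-1").length());
    }
}
```

Both sides of an exchange have to agree on the character set, which is one practical reason to settle on UTF-8 everywhere.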