Content-Type: application/xml; charset=iso-2022-jp
In this case, getContentType() returns the full value of the Content-type field, including the character encoding. You can use this to improve on Example 7-1 by using the encoding specified in the HTTP header to decode the document, or ISO-8859-1 (the HTTP default) if no such encoding is specified. If a nontext type is encountered, an exception is thrown. Example 7-2 demonstrates.
Example 7-2. Download a web page with the correct character set
import java.io.*;
import java.net.*;

public class EncodingAwareSourceViewer {

  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      try {
        // set default encoding
        String encoding = "ISO-8859-1";
        URL u = new URL(args[i]);
        URLConnection uc = u.openConnection();
        String contentType = uc.getContentType();
        // getContentType() returns null when the type is not known
        int encodingStart = (contentType == null)
            ? -1 : contentType.indexOf("charset=");
        if (encodingStart != -1) {
          // skip past the 8 characters of "charset="
          encoding = contentType.substring(encodingStart + 8);
        }
        InputStream in = new BufferedInputStream(uc.getInputStream());
        Reader r = new InputStreamReader(in, encoding);
        int c;
        while ((c = r.read()) != -1) {
          System.out.print((char) c);
        }
        r.close();
      } catch (MalformedURLException ex) {
        // report the argument that failed, not always the first one
        System.err.println(args[i] + " is not a parseable URL");
      } catch (UnsupportedEncodingException ex) {
        System.err.println(
            "Server sent an encoding Java does not support: " + ex.getMessage());
      } catch (IOException ex) {
        System.err.println(ex);
      }
    }
  }
}
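Example 7-2 takes everything after "charset=" as the encoding name, which works for a header such as the one shown earlier. A server may also quote the value or append further parameters after it. The following is a minimal sketch of a more tolerant parse, not part of the original example; the class name ContentTypeCharset and the method charsetOf() are hypothetical names chosen for illustration, and the sketch assumes that no parameter value contains a semicolon.

```java
import java.util.Locale;

public class ContentTypeCharset {

  // Hypothetical helper: returns the charset parameter of a Content-type
  // value, or the supplied fallback if the header is null or has no charset.
  // Assumes no parameter value contains a ';' character.
  static String charsetOf(String contentType, String fallback) {
    if (contentType == null) return fallback;
    String[] parts = contentType.split(";");
    // parts[0] is the media type itself; parameters start at index 1
    for (int i = 1; i < parts.length; i++) {
      String param = parts[i].trim();
      if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
        String value = param.substring("charset=".length()).trim();
        // strip optional surrounding double quotes
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
          value = value.substring(1, value.length() - 1);
        }
        return value.isEmpty() ? fallback : value;
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    String fallback = "ISO-8859-1";
    System.out.println(charsetOf("application/xml; charset=iso-2022-jp", fallback));
    System.out.println(charsetOf("text/html; charset=\"UTF-8\"; boundary=x", fallback));
    System.out.println(charsetOf("text/html", fallback));
    System.out.println(charsetOf(null, fallback));
  }
}
```

In Example 7-2, the result of such a helper could be passed to the InputStreamReader constructor in place of the substring() result; an unsupported name would still surface as an UnsupportedEncodingException.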
public int getContentLength()
The getContentLength() method tells you how many bytes there are in the content. If there is no Content-length header, getContentLength() returns -1. The method throws