Content-Type: application/xml; charset=iso-2022-jp
In this case, getContentType() returns the full value of the Content-Type field, including
the character encoding. You can use this to improve on Example 7-1 by decoding the
document with the encoding specified in the HTTP header, or with ISO-8859-1 (the HTTP
default) if no encoding is specified. The example does not inspect the media type itself,
so a nontext response is simply decoded as though it were text. Example 7-2 demonstrates.
Example 7-2. Download a web page with the correct character set
import java.io.*;
import java.net.*;

public class EncodingAwareSourceViewer {

  public static void main(String[] args) {
    for (int i = 0; i < args.length; i++) {
      try {
        // set default encoding
        String encoding = "ISO-8859-1";
        URL u = new URL(args[i]);
        URLConnection uc = u.openConnection();
        String contentType = uc.getContentType();
        // getContentType() returns null when the type is not known
        int encodingStart = (contentType == null) ? -1 : contentType.indexOf("charset=");
        if (encodingStart != -1) {
          // take everything after "charset=" as the encoding name
          encoding = contentType.substring(encodingStart + 8);
        }
        InputStream in = new BufferedInputStream(uc.getInputStream());
        Reader r = new InputStreamReader(in, encoding);
        int c;
        while ((c = r.read()) != -1) {
          System.out.print((char) c);
        }
        r.close();
      } catch (MalformedURLException ex) {
        // report the URL that actually failed, not always the first argument
        System.err.println(args[i] + " is not a parseable URL");
      } catch (UnsupportedEncodingException ex) {
        System.err.println(
            "Server sent an encoding Java does not support: " + ex.getMessage());
      } catch (IOException ex) {
        System.err.println(ex);
      }
    }
  }
}
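Example 7-2 takes everything after charset= as the encoding name. That works for a header such as the one shown above, but a server may quote the value or append further parameters after it. The following is a minimal sketch of a more tolerant lookup, not part of the original example: the class name ContentTypeCharset and its parse method are hypothetical, and the choice to fall back to ISO-8859-1 for an unrecognized charset (instead of reporting an error as Example 7-2 does) is an assumption made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Example 7-2 or java.net.
final class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or ISO-8859-1 when
    // the value is null, has no charset parameter, or names a charset that
    // this JVM does not recognize.
    static Charset parse(String contentType) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        // Assumption: no parameter value contains a semicolon inside quotes.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException ex) {
                // Covers both illegal and unsupported charset names.
                return fallback;
            }
        }
        return fallback;
    }
}
```

With such a helper, the reader in Example 7-2 could be built as new InputStreamReader(in, ContentTypeCharset.parse(uc.getContentType())), using the InputStreamReader constructor that accepts a Charset. The trade-off is that an unsupported encoding is then silently decoded as ISO-8859-1 instead of raising UnsupportedEncodingException.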
public int getContentLength()
The getContentLength() method tells you how many bytes there are in the content. If
there is no Content-Length header, getContentLength() returns -1. The method throws