or how to get that resource. In the physical world, it's the difference between the title “Harry Potter and The Deathly Hallows” and the library location “Room 312, Row 28, Shelf 7”. In Java, it's the difference between the java.net.URI class that only identifies resources and the java.net.URL class that can both identify and retrieve resources.
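The distinction can be seen by asking a URI to convert itself to a URL. The following is a minimal sketch, not a definitive test: the class name IdentifyVersusLocate and the method isRetrievable are hypothetical, and the ISBN in the URN is a made-up placeholder rather than a real book number. It assumes a standard JDK, which ships a protocol handler for http but none for urn.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class IdentifyVersusLocate {

    // Returns true if Java can turn the URI into a java.net.URL,
    // meaning a protocol handler exists that could retrieve it.
    // No network connection is made here.
    static boolean isRetrievable(URI uri) {
        try {
            URL url = uri.toURL();
            return url != null;
        } catch (MalformedURLException | IllegalArgumentException ex) {
            // Either no handler is installed for the scheme,
            // or the URI is not absolute.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder ISBN, used only for illustration.
        URI name = URI.create("urn:isbn:0000000000");
        URI location = URI.create("http://www.ibiblio.org/javafaq/javatutorial.html");

        System.out.println(name + " retrievable? " + isRetrievable(name));
        System.out.println(location + " retrievable? " + isRetrievable(location));
    }
}
```

On a standard JDK the URN should report false and the http address true, because only the latter names a protocol that java.net.URL knows how to use.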
The network location in a URL usually includes the protocol used to access a server (e.g., FTP, HTTP), the hostname or IP address of the server, and the path to the resource on that server. A typical URL looks like http://www.ibiblio.org/javafaq/javatutorial.html. This specifies that there is a file called javatutorial.html in a directory called javafaq on the server www.ibiblio.org, and that this file can be accessed via the HTTP protocol.
The syntax of a URL is:
protocol://userInfo@host:port/path?query#fragment
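To make the pieces of this syntax concrete, here is a small sketch that splits a URL into its named parts using the getter methods of java.net.URI. The class name UrlComponents, the method split, and the example.com address are assumptions chosen for illustration; only the getters themselves come from the JDK.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlComponents {

    // Splits a URL string into the parts named in the syntax above.
    // A missing part is reported as null (or -1 for the port).
    static Map<String, String> split(String spec) {
        URI u = URI.create(spec);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("protocol", u.getScheme());
        parts.put("userInfo", u.getUserInfo());
        parts.put("host", u.getHost());
        parts.put("port", String.valueOf(u.getPort()));
        parts.put("path", u.getPath());
        parts.put("query", u.getQuery());
        parts.put("fragment", u.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical URL that exercises every part of the syntax.
        String spec = "http://admin@www.example.com:8080/forum/index.php?topic=java#replies";
        split(spec).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```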
Here the protocol is another word for what was called the scheme of the URI. (Scheme is the word used in the URI RFC. Protocol is the word used in the Java documentation.) In a URL, the protocol part can be file, ftp, http, https, magnet, telnet, or various other strings (though not urn).
The host part of a URL is the name of the server that provides the resource you want. It can be a hostname such as www.oreilly.com or utopia.poly.edu or an IP address, such as 204.148.40.9 or 128.238.3.21.
The userInfo is optional login information for the server. If present, it contains a username and, rarely, a password.
The port number is also optional. It's not necessary if the service is running on its default port (port 80 for HTTP servers).
Together, the userInfo, host, and port constitute the authority.
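The optional port and the authority can both be inspected from code. The sketch below is one possible way to do it, under the assumption that the URL uses a protocol the JDK knows (http here). The class name AuthorityAndPort, its two methods, and the admin username are hypothetical. URL.getPort() returns -1 when no port is written in the URL, and URL.getDefaultPort() supplies the protocol's default.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class AuthorityAndPort {

    // Returns the port written in the URL, or the protocol's
    // default port when the URL does not specify one.
    static int effectivePort(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            int port = url.getPort(); // -1 means "not specified"
            return port != -1 ? port : url.getDefaultPort();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Unsupported protocol in " + spec, ex);
        }
    }

    // Returns userInfo, host, and port as a single authority string.
    static String authority(String spec) {
        return URI.create(spec).getAuthority();
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://www.oreilly.com/"));
        System.out.println(effectivePort("http://www.oreilly.com:8080/"));
        System.out.println(authority("http://admin@www.oreilly.com:8080/index.html"));
    }
}
```

For the first URL the effective port is the HTTP default of 80; for the second it is the explicit 8080. The authority of the third URL is admin@www.oreilly.com:8080.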
The path points to a particular resource on the specified server. It often looks like a filesystem path such as /forum/index.php. However, it may or may not actually map to a filesystem on the server. If it does map to a filesystem, the path is relative to the document root of the server, not necessarily to the root of the filesystem on the server. As a rule, servers that are open to the public do not show their entire filesystem to clients. Rather, they show only the contents of a specified directory. This directory is called the document root, and all paths and filenames are relative to it. Thus, on a Unix server, all files that are available to the public might be in /var/public/html, but to somebody connecting from a remote machine, this directory looks like the root of the filesystem.
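As an illustration of how a server might map a URL path onto its document root, consider the following sketch. It is an assumption about one reasonable approach, not a description of how any particular server works; the class name DocumentRoot and the method resolve are hypothetical. The check at the end rejects paths that use ".." to climb out of the document root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DocumentRoot {

    // Maps a URL path such as /forum/index.php to a file beneath
    // the document root. Throws if the result would fall outside it.
    static Path resolve(Path documentRoot, String urlPath) {
        Path root = documentRoot.normalize();
        // Drop leading slashes so the path is treated as relative to the root.
        String relative = urlPath.replaceFirst("^/+", "");
        Path candidate = root.resolve(relative).normalize();
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes document root: " + urlPath);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/var/public/html");
        // The client sees /forum/index.php; the server reads a file under the root.
        System.out.println(resolve(root, "/forum/index.php"));
    }
}
```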
The query string provides additional arguments for the server. It's commonly used only in http URLs, where it contains form data for input to programs running on the server.
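Form data in a query string is conventionally sent as name=value pairs joined by ampersands. The sketch below shows one way a program might read such pairs back out. The class name QueryParameters, the method parse, and the example.com address are hypothetical. It assumes Java 10 or later for URLDecoder.decode(String, Charset) and, for simplicity, keeps only the last value when a name is repeated.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParameters {

    // Splits the query string of a URL into decoded name/value pairs.
    // Returns an empty map when the URL has no query string.
    static Map<String, String> parse(String spec) {
        Map<String, String> params = new LinkedHashMap<>();
        // Use the raw (still encoded) query so an encoded '&' or '='
        // inside a value is not mistaken for a separator.
        String rawQuery = URI.create(spec).getRawQuery();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Decodes form encoding: '+' becomes a space, %XX becomes a byte.
    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical search URL with two form fields.
        System.out.println(parse("http://www.example.com/search?q=java+network&lang=en"));
    }
}
```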
Finally, the fragment references a particular part of the remote resource. If the remote resource is HTML, the fragment identifier names an anchor in the HTML document.