Validating an e‐mail address
Before working on a regular expression to match e‐mail addresses, you need to look at the types of
valid e‐mail addresses you can have. For example:
someone@mailserver.com
someone@mailserver.info
someone.something@mailserver.com
someone.something@subdomain.mailserver.com
someone@mailserver.co.uk
someone@subdomain.mailserver.co.uk
someone.something@mailserver.co.uk
someone@mailserver.org.uk
some.one@subdomain.mailserver.org.uk
Also, if you examine the SMTP RFC (http://www.ietf.org/rfc/rfc0821.txt), you can have the
following:
someone@123.113.209.32
"Paul Wilton"@somedomain.com
That's quite a list, and it contains many variations to cope with. It's best to start by breaking it
down. First, note that the last two forms are exceptionally rare, so the regular expression you'll
create does not provide for them.
Second, you need to break up the e‐mail address into separate parts. Let's look at the part after the
@ symbol first.
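As a minimal sketch of that split, assuming Java and a hypothetical helper class named AddressParts
(neither appears in the original text), an address can be cut at its last @ symbol so that the
domain can be examined separately from the part before it:

```java
// Hypothetical helper, not from the original text: splits an address
// into the part before the @ symbol and the domain after it.
final class AddressParts {
    final String local;   // text before the @
    final String domain;  // text after the @

    private AddressParts(String local, String domain) {
        this.local = local;
        this.domain = domain;
    }

    // Returns null when there is no @, or when either side would be empty.
    static AddressParts split(String address) {
        int at = address.lastIndexOf('@');
        if (at <= 0 || at == address.length() - 1) {
            return null;
        }
        return new AddressParts(address.substring(0, at), address.substring(at + 1));
    }

    public static void main(String[] args) {
        AddressParts parts = split("someone.something@subdomain.mailserver.co.uk");
        System.out.println(parts.local);   // someone.something
        System.out.println(parts.domain);  // subdomain.mailserver.co.uk
    }
}
```

Cutting at the last @ rather than the first is a design choice in this sketch. It does no
validation of either part; it only separates them.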
Validating a Domain Name
Everything has become more complicated since Unicode domain names have been allowed.
However, the e‐mail RFC still doesn't allow these, so let's stick with the traditional definition of
how a domain can be described using ASCII. A domain name consists of a dot‐separated list of
words, with the last word traditionally being between two and four characters long. It was often the
case that if a two‐letter country word was used, there would be at least two parts to the domain name
before it: a grouping domain (.co, .ac, and so on) and a specific domain name. However, with the
advent of the .tv names, this is no longer the case. You could make the expression very specific by
listing the allowed top‐level domains (TLDs), but that would make the regular expression very large,
and it would be more productive to perform a DNS lookup instead.
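The overall shape described above (a dot‐separated list of words ending in a short final word) can
be sketched as follows. This is only an illustration, assuming Java's java.util.regex package; the
class name DomainShape and the method looksLikeDomain are hypothetical, and restricting the final
word to letters is an assumption rather than something the text states. It checks neither a list
of TLDs nor DNS.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not from the original text.
final class DomainShape {
    // One or more "word." groups followed by a final word of two to four
    // letters. Restricting the final word to letters is an assumption;
    // the rules for the other words are deliberately loose here.
    private static final Pattern SHAPE =
            Pattern.compile("(?:[A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}");

    static boolean looksLikeDomain(String domain) {
        // matches() requires the whole string to fit the pattern.
        return SHAPE.matcher(domain).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDomain("mailserver.com"));
        System.out.println(looksLikeDomain("subdomain.mailserver.co.uk"));
        System.out.println(looksLikeDomain("mailserver"));
    }
}
```

Because the final word must be letters, an address such as someone@123.113.209.32 is rejected by
this sketch, which is consistent with the decision not to provide for that form.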
Each part of a domain name must follow certain rules. It can contain any letter, number,
or hyphen, but it must start with a letter. The exception is that, at any point in the domain
name, you can use a #, followed by a number, which represents the ASCII code for that letter,
 
 