10.2.2.2 The Use of Regular Expressions
Regular expressions can be defined as a set of symbols (characters) and syntactic elements that are
used to match patterns of text. For instance, a credit card number typically contains between 13 and
16 digits. A basic regular expression for a credit card pattern would be [0-9]{13,16}. This is a basic
pattern that may be used to search for credit card information in files on the operating system or may
be used by Web applications to filter user input. Regular expressions are extremely useful for
validating user input into the application. The process of input validation relies heavily on regular
expressions, as user input from an application can usually be categorized into particular types of
information patterns such as names, phone numbers, addresses, IP addresses, credit card numbers,
and so on. Patterns can be created to match these data types, and the input is accepted as
valid only if it matches the given pattern. For instance, an advanced regular expression for
a credit card would resemble something like this:
((4\d{3})|(5[1-5]\d{2})|(6011))[\s\-\.]*\d{4}[\s\-\.]*\d{4}[\s\-\.]*\d{4}|3[47]\d{13}$
This regular expression can match Visa, MasterCard, American Express, and Discover card patterns
and, for the 16-digit card types, also allows whitespace, '-', or '.' characters between the four-digit
groups. So if a user enters the Visa card number 4111111111111111, then the string is matched against
the regular expression; if it matches, the application treats the data as valid; otherwise, the
application should reject the input provided by the user and require reentry of the same
information.
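The accept-or-reject flow described above can be sketched in Java roughly as follows. This is a minimal illustration, not a production validator: the class name CardValidator and the method isValidCard are hypothetical, the sketch writes the American Express prefix as 3[47] so that a comma is not accepted as the second digit, and it checks only the shape of the number (no checksum is performed).

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class CardValidator {

    // The card pattern from the text, with the American Express prefix written as 3[47].
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern CARD = Pattern.compile(
        "((4\\d{3})|(5[1-5]\\d{2})|(6011))[\\s\\-\\.]*\\d{4}[\\s\\-\\.]*\\d{4}[\\s\\-\\.]*\\d{4}"
        + "|3[47]\\d{13}$");

    // Returns true only when the entire input fits the pattern.
    static boolean isValidCard(String input) {
        // matches() must consume the whole string, so extra leading or
        // trailing characters cause the input to be rejected.
        return input != null && CARD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidCard("4111111111111111"));        // true
        System.out.println(isValidCard("4111-1111 1111.1111"));     // true
        System.out.println(isValidCard("378282246310005"));         // true
        System.out.println(isValidCard("1234567890123456"));        // false
        System.out.println(isValidCard("4111111111111111; DROP"));  // false
    }
}
```

Using matches() rather than find() is a deliberate choice in this sketch: find() would accept any input that merely contains a card-like substring, which is suitable for searching files but not for validating a form field.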
Regular expressions are extremely beneficial because they help stop users from entering
malicious input into input fields to carry out XSS attacks or SQL injection attacks.
For instance, if a malicious user enters the input <script>alert('xss')</script> in the
username field of an application and there is no validation, then the input could be rendered as
HTML and the script would be executed. However, if the username field is validated against a
regular expression such as [a-zA-Z0-9]{4,20}, which accepts only uppercase or lowercase letters
and digits, with a minimum length of 4 characters and a maximum length of 20 characters, then the
application would reject the script and force the user to complete the input field with
appropriate input. Figure 10.6 shows how the application rejects any input that does not match
the regular expression defined for the username field.
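As a rough sketch of the username check just described, the pattern [a-zA-Z0-9]{4,20} could be applied in Java as shown here. The class name UsernameValidator and its methods are hypothetical and exist only for this example. The sketch also includes a deliberately loose variant to show why the whole input, not just a substring, has to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class UsernameValidator {

    // The username pattern from the text: 4 to 20 letters or digits.
    private static final Pattern USERNAME = Pattern.compile("[a-zA-Z0-9]{4,20}");

    // Strict check: the entire input must consist of 4 to 20 letters or digits.
    static boolean isValidUsername(String input) {
        return input != null && USERNAME.matcher(input).matches();
    }

    // Loose check, shown only as a pitfall: find() succeeds if ANY substring
    // fits the pattern, so the letters "script" inside a tag are enough.
    static boolean containsUsernameLikeText(String input) {
        return input != null && USERNAME.matcher(input).find();
    }

    public static void main(String[] args) {
        String attack = "<script>alert('xss')</script>";
        System.out.println(isValidUsername("alice01"));         // true
        System.out.println(isValidUsername("abc"));             // false (too short)
        System.out.println(isValidUsername(attack));            // false
        System.out.println(containsUsernameLikeText(attack));   // true (substring match)
    }
}
```

The last line of main is the reason the strict method uses matches(): an unanchored substring search would let the script payload through.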
10.2.2.3 Whitelist vs. Blacklist Validation
Input validation is a practice that, if wrongly implemented, can lull the organization into a
false sense of security. We have already explored the need for input validation and
some of the practices that may be employed for performing validations, like regular expressions.
Another important factor for consideration is the set of characters that should be allowed and
disallowed as input by users in the input fields of a Web application. There are two
implementation approaches to this practice: the blacklist validation technique and the whitelist
validation technique. A blacklist may be defined as a list or collection of entities that are
explicitly rejected. The blacklist validation technique is one where the developer blacklists
certain words and symbols that may be entered in an input field. For instance, the word script or
certain special characters like “<”, “>”, or “;” that are used extensively in XSS and SQL injection
attacks against the Web application may be rejected by the Web application. This is an approach
that explicitly rejects known bad characters.
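A blacklist check of the kind described above might look something like the following Java sketch. The class name BlacklistFilter, the method isRejected, and the exact contents of the blocked list are assumptions made for this example; the list simply reuses the word and characters mentioned in the text.

```java
import java.util.Locale;

// Hypothetical helper class, named here only for illustration.
final class BlacklistFilter {

    // Known-bad entries taken from the examples in the text.
    private static final String[] BLOCKED = { "script", "<", ">", ";" };

    // Returns true when the input contains any blacklisted entry.
    static boolean isRejected(String input) {
        if (input == null) {
            return true; // treat missing input as invalid in this sketch
        }
        // Lowercase first so that "SCRIPT" and "Script" are also caught.
        String lowered = input.toLowerCase(Locale.ROOT);
        for (String bad : BLOCKED) {
            if (lowered.contains(bad)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRejected("<script>alert('xss')</script>")); // true
        System.out.println(isRejected("alice01"));                       // false
        System.out.println(isRejected("O'Brien"));                       // false
    }
}
```

Note that the filter rejects only what is on its list: the single quote in the last call is not blacklisted here, so that input is accepted.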