Secure Coding Practices for Java Web Applications - Secure Java: For Web Application Development

10.2.2.2 The Use of Regular Expressions

Regular expressions can be defined as a set of symbols (characters) and syntactic elements that are used to match patterns of text. For instance, a credit card number typically contains between 13 and 16 digits. A basic regular expression for a credit card pattern would be [0-9]{13,16}. This is a basic pattern that may be used to search for credit card information in files on the operating system or may be used by Web applications to filter user input. Regular expressions are extremely useful for validating user input into the application. The process of input validation relies heavily on regular expressions, as user input from an application can usually be categorized into particular types of information patterns such as names, phone numbers, addresses, IP addresses, credit card numbers, and so on. Patterns can be created to match these data types, and the input is accepted as valid only if it matches the given pattern. For instance, an advanced regular expression for a credit card would resemble something like this:

^(((4\d{3})|(5[1-5]\d{2})|(6011))[\s\-\.]*\d{4}[\s\-\.]*\d{4}[\s\-\.]*\d{4}|3[47]\d{13})$

This regular expression can match Visa, MasterCard, American Express, and Discover card patterns and also takes into consideration any whitespace, '-', or '.' characters between the groups of digits. So if a user enters the Visa card number 4111111111111111, the string is matched against the regular expression; if it matches, the application treats the data as valid; otherwise, the application should ideally reject the input and require the user to reenter the information.
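
The matching step described above can be sketched in Java with the standard java.util.regex package. This is only an illustrative sketch: the class name CardNumberValidator and the method isValidCardNumber are hypothetical, and the pattern is written under two assumptions that should be checked against the pattern in use, namely that the whole alternation is grouped and anchored with ^ and $, and that the American Express prefix is written 3[47] so that a comma is not accepted.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class CardNumberValidator {

    // Assumed form of the card pattern: the alternation is grouped and
    // anchored, and the Amex prefix is 3[47] (no comma in the class).
    private static final Pattern CARD_PATTERN = Pattern.compile(
            "^(((4\\d{3})|(5[1-5]\\d{2})|(6011))"
            + "[\\s\\-.]*\\d{4}[\\s\\-.]*\\d{4}[\\s\\-.]*\\d{4}"
            + "|3[47]\\d{13})$");

    // Returns true only if the entire input matches the card pattern.
    static boolean isValidCardNumber(String input) {
        if (input == null) {
            return false;
        }
        // matches() requires the whole string to match, not a substring.
        return CARD_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Visa number from the text, with and without separators.
        System.out.println(isValidCardNumber("4111111111111111"));    // true
        System.out.println(isValidCardNumber("4111-1111-1111-1111")); // true
        // Input that is not a card number is rejected.
        System.out.println(isValidCardNumber("<script>alert('xss')</script>")); // false
    }
}
```

An application following the text would re-prompt the user whenever isValidCardNumber returns false rather than trying to repair the value.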

Regular expressions are extremely beneficial because they help stop users from entering malicious input into input fields to carry out XSS or SQL injection attacks. For instance, if a malicious user enters the input <script>alert('xss')</script> in the username field of an application and there is no validation, then the input could be processed as HTML and the script would be executed. However, if the entire username value is validated against a regular expression such as [a-zA-Z0-9]{4,20}, which accepts only uppercase or lowercase letters and digits, with a minimum length of 4 characters and a maximum length of 20 characters, then the application would force the user to complete the input field with appropriate input. Figure 10.6 shows how the application rejects any input that does not match the regular expression defined for the username field.
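
A minimal Java sketch of the username check follows. The class UsernameValidator and its method are hypothetical names introduced for illustration. The sketch uses Matcher.matches(), which requires the whole input to match. This choice matters: Matcher.find() would report a match for the script input as well, because the substring "script" on its own satisfies [a-zA-Z0-9]{4,20}.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative, not taken from the book.
final class UsernameValidator {

    // Pattern from the text: 4 to 20 letters or digits.
    private static final Pattern USERNAME_PATTERN =
            Pattern.compile("[a-zA-Z0-9]{4,20}");

    // Accepts the input only if the whole string matches the pattern.
    static boolean isValidUsername(String input) {
        return input != null && USERNAME_PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String attack = "<script>alert('xss')</script>";

        System.out.println(isValidUsername("alice01")); // true
        System.out.println(isValidUsername("abc"));     // false, too short
        System.out.println(isValidUsername(attack));    // false

        // find() looks for a matching substring, so it would wrongly
        // report a match on the attack string.
        System.out.println(USERNAME_PATTERN.matcher(attack).find()); // true
    }
}
```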

10.2.2.3 Whitelist vs. Blacklist Validation

Input validation is a practice that, if wrongly implemented, could go very wrong and lull the organization into a false sense of security. We have already explored the need for input validation and some of the practices that may be employed for performing validations, like regular expressions. Another important factor for consideration is the type of characters that should be allowed and disallowed as input by users in the input fields of a Web application. There are two implementation approaches to this practice: the blacklist validation technique and the whitelist validation technique. A blacklist may be defined as a list or collection of entities that are explicitly rejected. The blacklist validation technique is one where the developer blacklists certain words and symbols that may be entered in an input field. For instance, the word script or certain special characters like “<”, “>”, or “;” that are used extensively in XSS and SQL injection attacks against the Web application may be rejected by the Web application. This is an approach that explicitly rejects known bad characters.
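
The blacklist technique described above might be sketched in Java as follows. The class BlacklistValidator, its method name, and the case-insensitive comparison are assumptions made for illustration. The blacklist entries themselves are the examples given in the text.

```java
import java.util.Locale;

// Hypothetical helper illustrating the blacklist approach only.
final class BlacklistValidator {

    // Entries taken from the examples in the text.
    private static final String[] BLACKLIST = {"script", "<", ">", ";"};

    // Returns true if the input contains any blacklisted word or symbol.
    // Case-insensitive matching is an assumption of this sketch.
    static boolean containsBlacklisted(String input) {
        if (input == null) {
            return false;
        }
        String lowered = input.toLowerCase(Locale.ROOT);
        for (String entry : BLACKLIST) {
            if (lowered.contains(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsBlacklisted("<script>alert('xss')</script>")); // true
        System.out.println(containsBlacklisted("alice01"));                       // false
        // Harmless text can contain a listed word.
        System.out.println(containsBlacklisted("description"));                   // true
        // Input that avoids every listed entry is not flagged.
        System.out.println(containsBlacklisted("' OR '1'='1"));                   // false
    }
}
```

Because the check rejects only what is explicitly listed, any input that avoids the listed entries passes, while ordinary words such as "description" can be rejected by accident.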
