Tokenization can be simple or complex. Here, we demonstrate a simple tokenization
using the String class's split method. First, declare a string to hold the text to
be tokenized:
String text = "Mr. Smith went to 123 Washington avenue.";
The split method takes a regular expression that specifies where the text should be
split. In the next code sequence, its argument is the string literal "\\s+", which the
regular expression engine sees as \s+. This pattern uses one or more whitespace
characters as the delimiter:
String[] tokens = text.split("\\s+");
A for-each statement is used to display the resulting tokens:
for (String token : tokens) {
    System.out.println(token);
}
When executed, the output will appear as shown here:
Mr.
Smith
went
to
123
Washington
avenue.
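The steps above can be combined into one small program. The following is a minimal sketch, not code from the original text: the class name SimpleTokenizer and the helper method tokenize are hypothetical names chosen for illustration, and the call to trim() is an added assumption to guard against text that begins with whitespace.

```java
import java.util.Arrays;

public class SimpleTokenizer {

    // Hypothetical helper (not part of the original text):
    // splits the text on runs of one or more whitespace characters.
    public static String[] tokenize(String text) {
        // trim() strips leading and trailing whitespace so the first
        // token is not an empty string when the text starts with a space.
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "Mr. Smith went to 123 Washington avenue.";

        // Prints each of the seven tokens on its own line.
        for (String token : tokenize(text)) {
            System.out.println(token);
        }

        // Without trim(), leading whitespace produces an empty first token.
        String[] raw = "  Mr. Smith".split("\\s+");
        System.out.println(Arrays.toString(raw)); // prints [, Mr., Smith]
    }
}
```

Note that splitting on whitespace leaves punctuation attached to the neighboring word, which is why the last token is avenue. with its period.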
In Chapter 2, Finding Parts of Text, we will explore the tokenization process in depth.