You saw in Chapter 4 that you could tokenize a string using the split() method for a String object. As
I mentioned then, the split() method does this by applying a regular expression; in fact, the first
argument to the method is interpreted as a regular expression. This is because the expression
text.split(str, limit), where text is a String variable, is equivalent to the expression:
Pattern.compile(str).split(text, limit)
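To see this equivalence for yourself, here is a minimal sketch that runs both forms on the same input and compares the results. The class name SplitEquivalence and the helper methods viaString() and viaPattern() are hypothetical names chosen for this illustration; only String.split(), Pattern.compile(), and Pattern.split() come from the standard library.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEquivalence {
    // Tokenize using the String method directly
    static String[] viaString(String text, String regex, int limit) {
        return text.split(regex, limit);
    }

    // Tokenize by compiling the regular expression explicitly
    static String[] viaPattern(String text, String regex, int limit) {
        return Pattern.compile(regex).split(text, limit);
    }

    public static void main(String[] args) {
        String text = "To be or not to be, that is the question.";
        String regex = "[^\\w]+";   // One or more non-word characters

        // Try a zero, a positive, and a negative limit
        for (int limit : new int[] {0, 3, -1}) {
            String[] a = viaString(text, regex, limit);
            String[] b = viaPattern(text, regex, limit);
            System.out.println("limit " + limit + ": same tokens = "
                               + Arrays.equals(a, b));
        }
    }
}
```

Each line of output should report that the two arrays hold the same tokens. If you apply the same regular expression many times, compiling the Pattern object once and reusing it is likely to be the better choice, because it avoids repeating the compilation step on every call.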
This means that you can apply all of the power of regular expressions to the identification of delimiters
in the string. To demonstrate that this is the case, I will repeat the example from Chapter 4, but modify the
first argument to the split() method so only the words in the text are included in the set of tokens.
TRY IT OUT: Extracting the Words from a String
Here's the code for the modified version of the example:
public class StringTokenizing {
  public static void main(String[] args) {
    String text =
        "To be or not to be, that is the question.";  // String to segment
    String delimiters = "[^\\w]+";       // One or more non-word characters

    // Analyze the string
    String[] tokens = text.split(delimiters);

    // Output the tokens
    System.out.println("Number of tokens: " + tokens.length);
    for(String token : tokens) {
      System.out.println(token);
    }
  }
}
StringTokenizing.java
Now you should get the following output:
Number of tokens: 10
To
be
or
not
to
be
that
is
the