Lists and regular expressions
One technique is to use lists of "standard" entities along with regular expressions to identify the named entities. Named entities are sometimes referred to as proper nouns. The standard entities list could be a list of states, common names, months, or frequently referenced locations. Gazetteers, which are lists that contain geographical information used with maps, provide a source of location-related entities. However, maintaining such lists can be time-consuming. They can also be specific to a language and locale, and making changes to them can be tedious. We will demonstrate this approach in the "Using the ExactDictionaryChunker class" section later in this chapter.
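As a rough illustration of the list-based idea, here is a minimal sketch that looks up tokens in a small in-memory entity list. The class name `ListEntityFinder`, the sample entries, the naive letter-only tokenization, and the longest-match strategy are all hypothetical choices made for this example; this is not the `ExactDictionaryChunker` API. It requires Java 9 or later for `Map.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ListEntityFinder {
    // Hypothetical "standard" entity list mapping entity text to an entity type.
    // A real list (for example, a gazetteer) would normally be loaded from a file.
    private static final Map<String, String> ENTITIES = Map.of(
            "New York", "STATE",
            "Texas", "STATE",
            "January", "MONTH",
            "Paris", "LOCATION");

    // Number of tokens in the longest entry above ("New York").
    private static final int MAX_TOKENS = 2;

    // Returns every listed entity found in the text as "text/TYPE",
    // preferring the longest match at each token position.
    public static List<String> find(String text) {
        // Naive tokenization: split on anything that is not a letter.
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }

        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate phrase first, then shorter ones.
            for (int n = Math.min(MAX_TOKENS, tokens.size() - i); n >= 1; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                String type = ENTITIES.get(candidate); // case-sensitive lookup
                if (type != null) {
                    found.add(candidate + "/" + type);
                    matched = n;
                    break;
                }
            }
            // Skip past a match, or advance one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("In January we flew from Texas to New York."));
    }
}
```

Running `main` should print `[January/MONTH, Texas/STATE, New York/STATE]`. Because the lookup is case-sensitive, an entry such as "March" would not match the verb "march", but lowercase or variant spellings of genuine entities would be missed as well; keeping the list complete is part of the maintenance cost described above.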
Regular expressions can be useful in identifying entities. Their powerful syntax provides enough flexibility in many situations to accurately isolate the entity of interest. However, this flexibility can also make regular expressions difficult to understand and maintain. We will demonstrate several regular expression approaches in this chapter.
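To make the trade-off concrete, the following is a minimal sketch using the standard `java.util.regex` package to pull out two kinds of pattern-shaped entities. The class name `RegexEntityFinder` and both patterns are hypothetical and deliberately simple; they are not the specific expressions used later in the chapter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEntityFinder {
    // Hypothetical, deliberately simple patterns; real email and phone
    // formats vary far more than these handle.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

    static {
        // Addresses such as jane.doe@example.com
        PATTERNS.put("EMAIL",
                Pattern.compile("\\b[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+\\b"));
        // North American numbers such as 555-123-4567 or (555) 123-4567
        PATTERNS.put("PHONE",
                Pattern.compile("\\(?\\b\\d{3}\\)?[ -]?\\d{3}-\\d{4}\\b"));
    }

    // Returns matches as "text/TYPE", grouped by pattern in insertion order
    // and, within each pattern, in the order they appear in the text.
    public static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                found.add(m.group() + "/" + entry.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact jane.doe@example.com or call (555) 123-4567 or 555-987-6543.";
        find(text).forEach(System.out::println);
    }
}
```

Each pattern fits on one line, which shows the flexibility of the syntax, but the patterns are hard to read at a glance and have subtle gaps: the phone pattern, for instance, also accepts an unbalanced "(555 123-4567". Spotting and fixing cases like that is the maintenance burden the paragraph above refers to.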