to submit viable addresses to the geocoder. Here are some typical addresses that our regular
expression should match:
3509 N. Lee St.
2120-2128 E. Allegheny Ave.
7601 Crittenden St., #E-10
370 Tomlinson Place
2311 N. 33rd St.
6822-24 Old York Rd.
335 W. School House Lane
These are not addresses and should not be matched:
2,700 sq. ft. BRT# 124077100 Improvements: Residential Property
</b> C.P. June Term, 2009 No. 00575 &nbsp; &nbsp;
R has built-in functions that allow the use of Perl-type regular expressions. For more info on
regular expressions, see Mastering Regular Expressions (O'Reilly) and Regular Expression
Pocket Reference (O'Reilly).
With some minor deletions to clean up address idiosyncrasies, we should be able to correctly
identify street addresses from the mess of other data contained in properties.html. We'll use a
single regular expression pattern to do the cleanup. For clarity, we can break the pattern into
the familiar elements of an address (number, name, suffix):
> stNum<-"^[0-9]{2,5}(\\-[0-9]+)?"
> stName<-"([NSEW]\\. )?[0-9A-Z ]+"
> stSuf<-"(St|Ave|Place|Blvd|Drive|Lane|Ln|Rd)(\\.?)$"
> myStPat<-paste(stNum,stName,stSuf,sep=" ")
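Because the three pieces are joined with sep=" ", the finished pattern expects a single space between the number, the name, and the suffix. As a quick check (a minimal sketch, not part of the original session), printing the assembled pattern with cat() shows the expression as the regex engine receives it, with each doubled backslash reduced to a single one:

```r
# Pattern pieces exactly as defined in the text.
stNum  <- "^[0-9]{2,5}(\\-[0-9]+)?"
stName <- "([NSEW]\\. )?[0-9A-Z ]+"
stSuf  <- "(St|Ave|Place|Blvd|Drive|Lane|Ln|Rd)(\\.?)$"

# Join the pieces with one space between each element.
myStPat <- paste(stNum, stName, stSuf, sep = " ")

# cat() prints the raw characters, so "\\" appears as a single backslash.
cat(myStPat, "\n")
# ^[0-9]{2,5}(\-[0-9]+)? ([NSEW]\. )?[0-9A-Z ]+ (St|Ave|Place|Blvd|Drive|Lane|Ln|Rd)(\.?)$
```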
Note that each backslash in the pattern must itself be escaped with a second backslash, because
R string literals treat the backslash as an escape character. Let's test this pattern against our
examples using R's grep() function:
> grep(myStPat,"6822-24 Old York Rd.",perl=TRUE,value=FALSE,ignore.case=TRUE)
[1] 1
> grep(myStPat,"2,700 sq. ft. BRT# 124077100 Improvements: Residential Property",perl=TRUE,value=FALSE,ignore.case=TRUE)
integer(0)
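The same test can be run over every example at once by passing a character vector. The following is a minimal sketch rather than the original session: the vector name examples is an assumption introduced here, and grepl() is used in place of grep() so that each string gets its own TRUE or FALSE.

```r
# Pattern pieces exactly as defined in the text.
stNum  <- "^[0-9]{2,5}(\\-[0-9]+)?"
stName <- "([NSEW]\\. )?[0-9A-Z ]+"
stSuf  <- "(St|Ave|Place|Blvd|Drive|Lane|Ln|Rd)(\\.?)$"
myStPat <- paste(stNum, stName, stSuf, sep = " ")

# Example strings from the text: seven addresses, then two non-addresses.
# The name `examples` is hypothetical, chosen for this illustration.
examples <- c(
  "3509 N. Lee St.",
  "2120-2128 E. Allegheny Ave.",
  "7601 Crittenden St., #E-10",
  "370 Tomlinson Place",
  "2311 N. 33rd St.",
  "6822-24 Old York Rd.",
  "335 W. School House Lane",
  "2,700 sq. ft. BRT# 124077100 Improvements: Residential Property",
  "</b> C.P. June Term, 2009 No. 00575 &nbsp; &nbsp;"
)

# grepl() returns one logical value per element rather than matching indices.
matched <- grepl(myStPat, examples, perl = TRUE, ignore.case = TRUE)

# Show each string next to its match result.
print(data.frame(example = examples, matched = matched))
```

With the pattern as written, six of the seven addresses match and both non-addresses are rejected. "7601 Crittenden St., #E-10" does not match, because the unit designation after the comma sits between the street suffix and the end-of-string anchor ($).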
With value=FALSE, grep() returns the indices of the elements that match. Because we tested only
one string at a time, the result [1] 1 shows that our target address string matched, while
integer(0) shows that the non-address did not. We also have to strip out text that we don't want
in our addresses, such as extra punctuation (like quotes or commas) or sheriff's office
designations that follow street names: