lowercase (a-z), uppercase (A-Z), numeric (0-9), underscore, period, or
dash. Whatever is matched is replaced with nothing ("").
"".join(x …) starts with an empty string and then successively adds
a new character x, going through each character in the Unicode-normalized
line and keeping it only when the character is not of type Mn (non-spacing
mark). The effect is that an accented letter is reduced to its plain ASCII
base letter. This line is an example of finding a good solution on
stackoverflow.com. It is a much nicer solution to substitute accented
characters with a close ASCII character, rather than just dropping them
using a regular expression. Simply searching in a web browser for
programming questions (such as “python remove accents from string”) will
often return useful code snippets showing how other programmers have
solved similar issues. In this example, the web search returned the
following page:
http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string
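As a standalone illustration of the two steps described above (it is not the script itself), the following minimal sketch strips accents and then removes disallowed characters. The function names strip_accents and clean_name are hypothetical, and the sketch assumes that the NFKD normalization form is used, that the regular expression is a negated character class, and that accents are stripped before the regular expression is applied.

```python
import re
import unicodedata


def strip_accents(text):
    # Decompose each character (NFKD is an assumption here) so that an
    # accented letter becomes a base letter followed by a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character except non-spacing marks (category Mn).
    return "".join(x for x in decomposed if unicodedata.category(x) != "Mn")


def clean_name(text):
    # Assumed pattern: match anything that is NOT a-z, A-Z, 0-9,
    # underscore, period, or dash, and replace it with nothing ("").
    return re.sub(r"[^a-zA-Z0-9_.\-]", "", strip_accents(text))


print(strip_accents("Zo\u00eb"))   # the input is "Zoe" with a diaeresis on the e
print(clean_name("Zo\u00eb!"))
```

Stripping accents first means a name such as Zoë is kept as Zoe. If the regular expression ran first, the accented letter would simply be removed.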
Running this script across the sample e-mail data shown previously results
in the following output:
From, To, CC, Date, Size
"Joe", "Zoe", "Tim", 12/09/2014, 156kb
"Joe", "Ben", "Ann; Tim; Zoe", 11/09/2014, 2048kb
"Joe", "Tim", "Ben; Zoe", 11/09/2014, 805kb
"Joe", , "Ben", 11/01/2014, 22kb
Extracting a Set of Nodes from a Link Data Set
Sometimes a data set may consist of only a set of links. Network logs are one
example discussed earlier in Chapter 3, “Data—Collect, Clean, and Connect.”
Another example is the following data set of writers and their influences,
which was extracted from dbpedia.org:
subject          influence
Frank Herbert    Edgar Rice Burroughs
Frank Herbert    H. G. Wells
Frank Herbert    Jules Verne
J. G. Ballard    William S. Burroughs
J. G. Ballard    Alfred Jarry
...
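The section title suggests that the node list is derived from the two columns of the link data. A minimal sketch of that idea follows, using only the five sample rows shown above. The tab-separated layout, the LINKS_TSV constant, and the extract_nodes function are assumptions made for illustration, not the book's own code.

```python
import csv
import io

# The five sample links from the text; the tab separator is an assumption.
LINKS_TSV = (
    "subject\tinfluence\n"
    "Frank Herbert\tEdgar Rice Burroughs\n"
    "Frank Herbert\tH. G. Wells\n"
    "Frank Herbert\tJules Verne\n"
    "J. G. Ballard\tWilliam S. Burroughs\n"
    "J. G. Ballard\tAlfred Jarry\n"
)


def extract_nodes(link_rows, source_key="subject", target_key="influence"):
    # Every name that appears at either end of a link is a node;
    # a set removes the duplicates (Frank Herbert appears three times).
    nodes = set()
    for row in link_rows:
        nodes.add(row[source_key])
        nodes.add(row[target_key])
    return sorted(nodes)


rows = list(csv.DictReader(io.StringIO(LINKS_TSV), delimiter="\t"))
nodes = extract_nodes(rows)
for name in nodes:
    print(name)
```

For the five sample links this yields seven distinct nodes: the two subjects plus their five influences.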