same key pair file that you used to log in to the EC2 instance running the Data
Science Toolbox:
$ scp -i mykey.pem ~/Desktop/logs.csv \
> ubuntu@ec2-184-73-72-150.compute-1.amazonaws.com:data
Replace the hostname in the example (the part between @ and :) with the value you
see on the EC2 overview page in the AWS console.
Decompressing Files
If the original data set is very large or consists of many files, it may be
distributed as a (compressed) archive. Data sets that contain many repeated values
(such as the words in a text file or the keys in a JSON file) are especially well
suited for compression.
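To get a feel for why repeated values compress so well, here is a minimal sketch that compares gzip on two files of equal size. The file names and the JSON-like line are made up for illustration, and the exact byte counts will vary from system to system:

```shell
# Work in a throwaway directory (hypothetical file names)
tmpdir=$(mktemp -d)

# A file in which the same line is repeated 10,000 times
yes '{"status": "ok"}' | head -n 10000 > "$tmpdir/repeated.txt"
size=$(wc -c < "$tmpdir/repeated.txt")

# A file of the same size filled with random bytes
head -c "$size" /dev/urandom > "$tmpdir/random.bin"

# Compress both to standard output and count the resulting bytes
repeated_gz=$(gzip -c "$tmpdir/repeated.txt" | wc -c)
random_gz=$(gzip -c "$tmpdir/random.bin" | wc -c)

echo "original size:        $size bytes"
echo "repeated, compressed: $repeated_gz bytes"
echo "random, compressed:   $random_gz bytes"
```

The repeated file should shrink to a small fraction of its original size, whereas the random file should barely shrink at all.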
Common file extensions of compressed archives are: .tar.gz, .zip, and .rar. To
decompress these, you would use the command-line tools tar (Bailey, Eggert, &
Poznyakoff, 2014), unzip (Smith, 2009), and unrar (Asselstine, Scheurer, &
Winkelmann, 2014), respectively. There are a few more, though less common, file
extensions for which you would need yet other tools. For example, in order to
extract a file named logs.tar.gz, you would use:
$ cd ~/book/ch03
$ tar -xzvf data/logs.tar.gz
Indeed, tar is notorious for its many options. In this case, the four options x, z, v,
and f specify that tar should extract files from an archive, use gzip as the
decompression algorithm, be verbose, and use the file logs.tar.gz. In time you'll get
used to typing these four characters, but there's a more convenient way.
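Before moving on, it may help to see the four options spelled out. The sketch below assumes GNU tar, which accepts a long form of each short option. It builds a tiny throwaway archive first (the contents of logs.csv are made up) so that it can be run anywhere:

```shell
# Set up a scratch directory with a small archive to experiment on
workdir=$(mktemp -d)
cd "$workdir"
mkdir data
echo "a,b,c" > logs.csv
tar --create --gzip --file=data/logs.tar.gz logs.csv
rm logs.csv

# List the contents without extracting anything (t instead of x)
tar -tzf data/logs.tar.gz

# Long-form equivalent of: tar -xzvf data/logs.tar.gz
tar --extract --gzip --verbose --file=data/logs.tar.gz
```

Listing an archive with t before extracting it is a cheap way to check where its files will end up.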
Rather than remembering the different command-line tools and their options, there's
a handy script called unpack (Brisbin, 2013), which will decompress many different
formats. unpack looks at the extension of the file that you want to decompress, and
calls the appropriate command-line tool.
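The idea of dispatching on the file extension can be sketched with a Bash case statement. This is not the source of unpack. The function name extract_command is hypothetical, and it only prints the command it would run instead of running it:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of unpack): print the extraction command
# that matches the extension of the given file name.
extract_command() {
  case "$1" in
    *.tar.gz|*.tgz) echo "tar -xzvf $1" ;;
    *.zip)          echo "unzip $1" ;;
    *.rar)          echo "unrar x $1" ;;
    *)              echo "unknown format: $1" >&2; return 1 ;;
  esac
}

extract_command data/logs.tar.gz
```

Printing the command rather than executing it keeps the sketch safe to try on file names that do not exist.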
The unpack tool is part of the Data Science Toolbox. Remember that you can look up
how it can be installed in Appendix A. Example 3-1 shows the source of unpack.
Although Bash scripting is not the focus of this topic, it's still useful to take a moment
to figure out how it works.
Example 3-1. Decompress various file formats (unpack)
#!/usr/bin/env bash
# unpack: Extract common file formats
# Display usage if no parameters given