Dimensionality reduction is often regarded as part of the exploring step. It's
useful when there are too many features to plot. You could create a scatter plot
matrix, but each panel in it only shows two features at a time. Dimensionality
reduction is also useful as a preprocessing step for other machine-learning algorithms.
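To make the plotting problem concrete, the sketch below assumes Python with NumPy (nothing in this section prescribes either). It counts how many panels a scatter plot matrix needs for a given number of features, then uses PCA (computed with an SVD) to squeeze the same data into two columns that fit in a single scatter plot. The 12-feature data set is synthetic and purely hypothetical.

```python
import numpy as np

def scatter_matrix_panels(n_features: int) -> int:
    """Distinct feature pairs a scatter plot matrix has to show (one panel each)."""
    return n_features * (n_features - 1) // 2

def pca_2d(X: np.ndarray):
    """Project X onto its top two principal components via SVD.
    Returns the 2-D coordinates and the fraction of variance they retain."""
    Xc = X - X.mean(axis=0)                      # center every feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    retained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return Xc @ Vt[:2].T, retained

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))               # hypothetical hidden 2-D structure...
X = latent @ rng.normal(size=(2, 12)) + 0.05 * rng.normal(size=(200, 12))  # ...seen through 12 features
Z, retained = pca_2d(X)
print(scatter_matrix_panels(12))                 # 66
print(Z.shape)                                   # (200, 2)
```

Twelve features already need 66 pairwise panels, while the two PCA columns can be drawn in one plot; the `retained` value shows how much of the total variance survives the projection.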
Most dimensionality reduction algorithms are unsupervised. This means that they
don't use the labels of the data points to construct the lower-dimensional
mapping.
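The following sketch, again assuming Python with NumPy, illustrates what "unsupervised" means in practice: the function that learns the PCA mapping only receives the feature matrix. The labels (hypothetically, two wine types encoded as 0 and 1) are attached to the reduced coordinates afterwards, for example to color a scatter plot. The names `fit_pca` and `transform` are illustrative, not part of any library.

```python
import numpy as np

def fit_pca(X: np.ndarray, k: int = 2):
    """Learn a k-dimensional PCA mapping. Only the features are passed in."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                           # rows are orthonormal directions

def transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Map data points into the learned lower-dimensional space."""
    return (X - mean) @ components.T

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))                      # hypothetical feature matrix
y = rng.integers(0, 2, size=50)                   # hypothetical labels, e.g. two wine types

mean, comps = fit_pca(X)                          # y is never passed in
Z = transform(X, mean, comps)
labelled = np.column_stack([Z, y])                # labels re-attached only for display
```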
In this section, we'll look at two techniques: PCA, which stands for Principal
Component Analysis (Pearson, 1901), and t-SNE, which stands for t-distributed
Stochastic Neighbor Embedding (van der Maaten & Hinton, 2008).
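PCA is a linear projection; t-SNE instead tries to preserve neighborhoods. The sketch below is a minimal, exact t-SNE in Python with NumPy, following the outline in van der Maaten & Hinton (2008): Gaussian affinities whose widths are tuned by binary search to a target perplexity, Student-t similarities in the embedding, and gradient descent with momentum and early exaggeration. It is not Tapkee's implementation. The parameter values (perplexity 5, learning rate 10, 500 iterations, exaggeration factor 4) are illustrative assumptions, and the O(n²) cost limits it to small data sets.

```python
import numpy as np

def _row_affinities(d2, perplexity, tol=1e-5, max_iter=50):
    """Gaussian affinities for one point: binary-search the precision beta so that
    the distribution's perplexity (exp of its entropy) matches the target."""
    d2 = d2 - d2.min()                            # shift for stability; normalized p is unchanged
    target = np.log(perplexity)
    beta, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        w = np.exp(-beta * d2)
        s = w.sum()
        entropy = np.log(s) + beta * (d2 * w).sum() / s
        if abs(entropy - target) < tol:
            break
        if entropy > target:                      # too flat: sharpen (increase beta)
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
        else:                                     # too peaked: widen (decrease beta)
            hi = beta
            beta = (beta + lo) / 2.0
    return w / s

def tsne(X, n_components=2, perplexity=5.0, n_iter=500, lr=10.0, seed=0):
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i                # a point is not its own neighbor
        P[i, others] = _row_affinities(D2[i, others], perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)  # symmetric joint probabilities
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        sqY = (Y ** 2).sum(axis=1)
        num = 1.0 / (1.0 + sqY[:, None] + sqY[None, :] - 2.0 * Y @ Y.T)  # Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        exaggeration = 4.0 if it < 100 else 1.0   # early exaggeration phase
        PQ = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Y  # gradient of KL(P || Q)
        momentum = 0.5 if it < 100 else 0.8
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
    return Y

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)),     # hypothetical cluster A
               rng.normal(10.0, 1.0, (20, 5))])   # hypothetical cluster B
Y = tsne(X)
print(Y.shape)                                    # (40, 2)
```

Production implementations add refinements such as adaptive gains and tree-based approximations of the repulsive forces; those are omitted here to keep the core idea visible.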
Introducing Tapkee
Tapkee is a C++ template library for dimensionality reduction (Lisitsyn, Widmer, &
Garcia, 2013). The library contains implementations of many dimensionality
reduction algorithms, including:
• Locally Linear Embedding
• Isomap
• Multidimensional scaling
• PCA
• t-SNE
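Two of the listed algorithms are closely related: Isomap builds a nearest-neighbor graph, replaces Euclidean distances with shortest-path (geodesic) distances through that graph, and then applies classical multidimensional scaling. The sketch below, assuming Python with NumPy, shows both steps; it is an illustration of the general technique, not Tapkee's code. It assumes the neighbor graph is connected (otherwise some geodesic distances stay infinite), and the semicircle data is a hypothetical example of a curved one-dimensional manifold.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed points given pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]            # keep the k largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def isomap(X: np.ndarray, n_neighbors: int = 5, k: int = 2) -> np.ndarray:
    """Isomap sketch: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nn = np.argsort(D[i])[1:n_neighbors + 1]  # nearest neighbors, skipping i itself
        G[i, nn] = D[i, nn]
        G[nn, i] = D[i, nn]                       # keep the graph symmetric
    for m in range(n):                            # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return classical_mds(G, k)

theta = np.linspace(0.0, np.pi, 50)
arc = np.column_stack([np.cos(theta), np.sin(theta)])  # hypothetical curved 1-D data
emb = isomap(arc, n_neighbors=5, k=1)             # "unrolls" the arc onto a line
```

Because the geodesic distances along the arc are close to arc length, the one-dimensional embedding orders the points by their angle, which plain classical MDS on Euclidean distances would not capture as faithfully for strongly curved data.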
Tapkee's website contains more information about these algorithms. Although Tapkee
is mainly a library that can be included in other applications, it also offers a
command-line tool. We'll use this to perform dimensionality reduction on our wine
data set.
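A command-line dimensionality reduction tool fits naturally into a pipeline: numeric data goes in on standard input and low-dimensional coordinates come out on standard output. The hypothetical Python script below (call it `reduce.py`) sketches that pattern with PCA. It is not Tapkee and does not mimic its options; the header-line convention and the output column names `x1`, `x2` are assumptions made for illustration.

```python
import csv
import io
import sys

import numpy as np

def reduce_csv(text: str, k: int = 2) -> str:
    """Read a numeric CSV with a header line; return a CSV of the first k PCA coordinates."""
    rows = list(csv.reader(io.StringIO(text)))
    X = np.array(rows[1:], dtype=float)           # skip the header line
    Xc = X - X.mean(axis=0)                       # center every column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([f"x{i + 1}" for i in range(k)])  # hypothetical column names
    writer.writerows(np.round(Z, 6).tolist())
    return out.getvalue()

if __name__ == "__main__":
    sys.stdout.write(reduce_csv(sys.stdin.read()))
```

Used as `$ python reduce.py < features.csv > coordinates.csv` (file names hypothetical), it behaves like any other filter in a command-line pipeline. For Tapkee's actual flags and input format, consult its own documentation.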
Installing Tapkee
If you aren't running the Data Science Toolbox, you'll need to download and compile
Tapkee yourself. First make sure that you have CMake installed. On Ubuntu, you
simply run:
$ sudo apt-get install cmake
Consult Tapkee's website for instructions on other operating systems. Then execute
the following commands to download the source and compile it:
$ curl -sL https://github.com/lisitsyn/tapkee/archive/master.tar.gz > \
> tapkee-master.tar.gz
$ tar -xzf tapkee-master.tar.gz
$ cd tapkee-master
$ mkdir build && cd build