p(8).trim.toDouble,
p(9).trim.toDouble,
p(10).trim.toInt,
p(11).trim.toInt,
p(12).trim.toInt,
p(13).trim.toInt
))
That's quite a complicated statement, but if I break it down, it will seem simpler. The textFile method loads the
CSV file as a text file. The first map call splits each line of the text file into columns at the commas. The second map call maps
the columns from the previous step onto the fields of the vehicle class that was just defined. So, the vehicle RDD
contains the comma-separated data from the file, held as one vehicle object per line.
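The split-then-map steps described above can be tried without Spark. The following is a minimal sketch that uses a plain Scala sequence in place of the RDD. The VehicleLite class, its five fields, the column order, and the sample line are all hypothetical and chosen only for illustration; the vehicle class used in this example has more columns, and a real run would read the lines with textFile.

```scala
object ParseSketch {
  // Hypothetical, cut-down record; the real vehicle class has more columns.
  case class VehicleLite(year: Int, manufacturer: String, model: String,
                         vclass: String, engine: Double)

  // Split one CSV line at the commas, then trim and convert each field.
  // Assumes no quoted fields that themselves contain commas.
  def parseLine(line: String): VehicleLite = {
    val p = line.split(",")
    VehicleLite(p(0).trim.toInt, p(1).trim, p(2).trim, p(3).trim, p(4).trim.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample line standing in for the contents of the CSV file.
    val lines = Seq("2014, ASTON MARTIN, DB9, MINICOMPACT, 5.9")
    lines.map(parseLine).foreach(println)
  }
}
```

A simple split on commas is enough only when no field contains an embedded comma; a file with quoted fields would need a proper CSV parser.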
I register the vehicle RDD as a table called “vehicle” so that SQL can be executed against the table:
scala> vehicle.registerAsTable("vehicle")
At this point, the data has been imported into the RDD and the RDD has been registered as a table, so I am
ready to execute some SQL against the table. I want to select details of Aston Martin cars from the data. The following
statement creates a schema RDD called “aston” that contains the data from the SELECT statement:
scala> val aston = sql("SELECT year, manufacturer, model, vclass, engine FROM vehicle WHERE manufacturer = 'ASTON MARTIN'")
The SELECT statement takes the year, manufacturer, model, class, and engine size columns from the vehicle table.
It filters the data, keeping only those rows where the manufacturer column has the value 'ASTON MARTIN'.
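To make the roles of the WHERE clause and the column list concrete, here is a rough equivalent written against a plain Scala sequence rather than a Spark table. It is a sketch only: the Car class, its cylinders field, and the sample rows are hypothetical, and Spark SQL does not necessarily execute the query this way.

```scala
object SelectSketch {
  // Hypothetical record; cylinders stands in for the columns the query does not select.
  case class Car(year: Int, manufacturer: String, model: String,
                 vclass: String, engine: Double, cylinders: Int)

  // The WHERE clause becomes a filter; the SELECT column list becomes a map
  // that keeps only the five requested columns.
  def selectAston(cars: Seq[Car]): Seq[(Int, String, String, String, Double)] =
    cars
      .filter(_.manufacturer == "ASTON MARTIN")
      .map(c => (c.year, c.manufacturer, c.model, c.vclass, c.engine))

  def main(args: Array[String]): Unit = {
    // Made-up rows for illustration only.
    val cars = Seq(
      Car(2014, "ASTON MARTIN", "DB9", "MINICOMPACT", 5.9, 12),
      Car(2014, "OTHER MAKER", "SAMPLE", "COMPACT", 2.0, 4))
    selectAston(cars).foreach(println)
  }
}
```

The comparison is an exact, case-sensitive string match, which mirrors the equality test in the SQL.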
To print the resulting aston schema RDD, I map each of its rows to a string. One line is printed for each row that is
matched from the table. The five columns selected in the SQL are read by position, as t(0) to t(4), and embedded in the
results string:
scala> aston.map( t => "year: " + t(0) + " manufacturer " + t(1) + " model " + t(2) + " class " +
t(3) + " engine " + t(4) ).collect().foreach(println)
That statement prints the following data:
year: 2014 manufacturer ASTON MARTIN model DB9 class MINICOMPACT engine 5.9
year: 2014 manufacturer ASTON MARTIN model RAPIDE class SUBCOMPACT engine 5.9
year: 2014 manufacturer ASTON MARTIN model V8 VANTAGE class TWO-SEATER engine 4.7
year: 2014 manufacturer ASTON MARTIN model V8 VANTAGE class TWO-SEATER engine 4.7
year: 2014 manufacturer ASTON MARTIN model V8 VANTAGE S class TWO-SEATER engine 4.7
year: 2014 manufacturer ASTON MARTIN model V8 VANTAGE S class TWO-SEATER engine 4.7
year: 2014 manufacturer ASTON MARTIN model VANQUISH class MINICOMPACT engine 5.9
Thus, the results show seven matching Aston Martin data rows with their model and class details.
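The row-to-string mapping can also be tried on its own. The sketch below applies the same concatenation to two of the rows printed above, held in a plain Scala sequence rather than a schema RDD. The object and method names are hypothetical, and the column types (an integer year and a floating-point engine size) are assumptions.

```scala
object FormatSketch {
  // A row is modeled as Seq[Any] so that t(0), t(1), ... index its columns
  // by position, as in the map call on the schema RDD.
  def describe(t: Seq[Any]): String =
    "year: " + t(0) + " manufacturer " + t(1) + " model " + t(2) +
      " class " + t(3) + " engine " + t(4)

  def main(args: Array[String]): Unit = {
    // Two of the rows from the printed results; types are assumed.
    val rows = Seq(
      Seq[Any](2014, "ASTON MARTIN", "DB9", "MINICOMPACT", 5.9),
      Seq[Any](2014, "ASTON MARTIN", "RAPIDE", "SUBCOMPACT", 5.9))
    rows.map(describe).foreach(println)
  }
}
```

Positional access ties the output to the column order in the SELECT list, so reordering the columns in the SQL changes which value lands under which label.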
From an analytics point of view, Spark SQL gives analysts SQL-based access to Spark data held in memory. In
processing terms, this in-memory approach is generally much faster than traditional MapReduce processing, which
writes intermediate results to disk. Also, people with a background in
relational databases will be comfortable using SQL to interrogate their data.
 