Spark SQL/HiveQL type          Scala type   Java type   Python
STRUCT<COL1: COL1_TYPE, ...>   Row          Row         Row
The last type, structures, is simply represented as other Rows in Spark SQL. All of
these types can also be nested within each other; for example, you can have arrays of
structs, or maps that contain structs.
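To make the nesting concrete, here is a minimal pure-Python sketch. It uses collections.namedtuple as a stand-in for Row, which is an assumption made so the example runs without Spark; the Address and User types, their field names, and the sample values are all hypothetical.

```python
from collections import namedtuple

# Hypothetical struct types, modeled with namedtuple as a stand-in for Row.
Address = namedtuple("Address", ["city", "zip"])
User = namedtuple("User", ["name", "addresses", "contacts"])

user = User(
    name="holden",
    # An array of structs: a list whose elements are struct-like records.
    addresses=[Address("San Francisco", "94105"), Address("Seattle", "98101")],
    # A map that contains structs: string keys mapped to struct-like records.
    contacts={"home": Address("San Francisco", "94105")},
)

# Walk into the nested values the same way at every level.
cities = [address.city for address in user.addresses]
home_zip = user.contacts["home"].zip
```

Each nesting level is reached with ordinary indexing or field access, which is the same shape of access you would expect on nested rows.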
Working with Row objects
Row objects represent records inside SchemaRDDs, and are simply fixed-length arrays
of fields. In Scala/Java, Row objects have a number of getter functions to obtain the
value of each field given its index. The standard getter, get (or apply in Scala), takes a
column number and returns an Object type (or Any in Scala) that we are responsible
for casting to the correct type. For Boolean, Byte, Double, Float, Int, Long, Short,
and String, there is a getType() method (where Type is the name of the type), which
returns that type. For example, getString(0) would return field 0 as a string, as you
can see in Examples 9-12 and 9-13.
Example 9-12. Accessing the text column (also first column) in the topTweets
SchemaRDD in Scala
val topTweetText = topTweets.map(row => row.getString(0))
Example 9-13. Accessing the text column (also first column) in the topTweets
SchemaRDD in Java
JavaRDD<String> topTweetText = topTweets.toJavaRDD().map(new Function<Row, String>() {
  public String call(Row row) {
    return row.getString(0);
  }});
In Python, Row objects are a bit different since we don't have explicit typing. We just
access the ith element using row[i]. In addition, Python Rows support named access
to their fields, of the form row.column_name, as you can see in Example 9-14. If you
are uncertain of what the column names are, we illustrate printing the schema in
“JSON” on page 172.
Example 9-14. Accessing the text column in the topTweets SchemaRDD in Python
topTweetText = topTweets.map(lambda row: row.text)
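The two Python access styles can be tried without a cluster. The following is a minimal sketch that uses collections.namedtuple as a stand-in for a Spark Row; that substitution is an assumption for illustration, and the TweetRow type, the retweetCount field, and the sample values are hypothetical rather than taken from the topTweets data.

```python
from collections import namedtuple

# Stand-in for a Row: a fixed-length record with named fields.
# "text" is the first column, matching the examples above;
# "retweetCount" is a hypothetical second column.
TweetRow = namedtuple("TweetRow", ["text", "retweetCount"])

rows = [
    TweetRow("Adventures with coffee", 10),
    TweetRow("Learning Spark", 7),
]

# Positional access: row[i] returns the ith field.
texts_by_index = [row[0] for row in rows]

# Named access: row.column_name returns the field with that name.
texts_by_name = [row.text for row in rows]
```

Both lists hold the same values; named access is usually easier to read and keeps working if the column order changes.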
 