elements by using the . for each level of nesting (e.g., toplevel.nextlevel). You can access array elements in SQL by specifying the index with [element], as shown in Example 9-27.
Example 9-27. SQL query nested and array elements
select hashtagEntities[0].text from tweets LIMIT 1;
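To make the path syntax concrete, here is a plain-Python analogy of how a path like the one in Example 9-27 walks a nested record. It is only a sketch: `resolve_path` and the sample `tweet` record are hypothetical names invented for illustration, and this is not how Spark SQL evaluates the query internally.

```python
import re

def resolve_path(record, path):
    """Resolve a path such as 'hashtagEntities[0].text' against nested dicts and lists."""
    value = record
    # Each token is either a field name or a bracketed integer index;
    # the dots between tokens are simply skipped by the pattern.
    for name, index in re.findall(r"([A-Za-z_]\w*)|\[(\d+)\]", path):
        value = value[name] if name else value[int(index)]
    return value

# Hypothetical tweet record, shaped only for illustration.
tweet = {
    "hashtagEntities": [{"text": "spark"}, {"text": "sql"}],
    "user": {"name": "holden"},
}

first_hashtag = resolve_path(tweet, "hashtagEntities[0].text")
user_name = resolve_path(tweet, "user.name")
```

Each `.` descends one level of nesting and each `[n]` selects one array element, which mirrors the two rules described above.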
From RDDs
In addition to loading data, we can also create a SchemaRDD from an RDD. In Scala,
RDDs with case classes are implicitly converted into SchemaRDDs.
For Python we create an RDD of Row objects and then call inferSchema(), as shown in Example 9-28.
Example 9-28. Creating a SchemaRDD using Row and named tuple in Python
happyPeopleRDD = sc.parallelize([Row(name="holden", favouriteBeverage="coffee")])
happyPeopleSchemaRDD = hiveCtx.inferSchema(happyPeopleRDD)
happyPeopleSchemaRDD.registerTempTable("happy_people")
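The idea behind schema inference can be illustrated without Spark: column names and types are derived from the data itself rather than declared up front. The sketch below uses a standard-library named tuple as a stand-in for a Row. The `Person` type and the `infer_schema` helper are hypothetical, and Spark's own inference is more involved than this.

```python
from collections import namedtuple

# Hypothetical stand-in for a Row; the field names mirror Example 9-28.
Person = namedtuple("Person", ["name", "favouriteBeverage"])

def infer_schema(rows):
    """Return (field name, Python type name) pairs taken from the first row."""
    first = rows[0]
    return [(field, type(getattr(first, field)).__name__) for field in first._fields]

people = [Person(name="holden", favouriteBeverage="coffee")]
schema = infer_schema(people)
```

Here `schema` holds one pair per field, which is roughly the information a table needs before it can be queried by column name.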
With Scala, our old friend implicit conversions handles the inference of the schema for us (Example 9-29).
Example 9-29. Creating a SchemaRDD from case class in Scala
case class HappyPerson(handle: String, favouriteBeverage: String)
...
// Create a person and turn it into a Schema RDD
val happyPeopleRDD = sc.parallelize(List(HappyPerson("holden", "coffee")))
// Note: there is an implicit conversion
// that is equivalent to sqlCtx.createSchemaRDD(happyPeopleRDD)
happyPeopleRDD.registerTempTable("happy_people")
With Java, we can turn an RDD consisting of a serializable class with public getters and setters into a schema RDD by calling applySchema(), as Example 9-30 shows.
Example 9-30. Creating a SchemaRDD from a JavaBean in Java
class HappyPerson implements Serializable {
  private String name;
  private String favouriteBeverage;
  public HappyPerson() {}
  public HappyPerson(String n, String b) {
    name = n; favouriteBeverage = b;
  }
  public String getName() { return name; }