Next, create a second users table, users_2, and load data from the users table into it. During
loading, use an external script, occupation_mapper.py, to map each integer occupation code to
its corresponding string value, so that users_2 stores the string rather than the code. The
code for this data transformation is as follows:
hive> CREATE TABLE users_2(
> userid INT,
> gender STRING,
> age INT,
> occupation STRING,
> zipcode STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '#'
> STORED AS TEXTFILE;
OK
Time taken: 0.359 seconds
hive> add FILE /Users/tshanky/workspace/hadoop_workspace/hive_workspace/occupation_mapper.py;
hive> INSERT OVERWRITE TABLE users_2
> SELECT
> TRANSFORM (userid, gender, age, occupation, zipcode)
> USING 'python occupation_mapper.py'
> AS (userid, gender, age, occupation_str, zipcode)
> FROM users;
Available for download on Wrox.com: hive_movielens.txt
The occupation_mapper.py script is as follows:
occupation_dict = { 0: "other or not specified",
    1: "academic/educator",
    2: "artist",
    3: "clerical/admin",
    4: "college/grad student",
    5: "customer service",
    6: "doctor/health care",
    7: "executive/managerial",
    8: "farmer",
    9: "homemaker",
    10: "K-12 student",
    11: "lawyer",
    12: "programmer",
    13: "retired",
    14: "sales/marketing",
    15: "scientist",
    16: "self-employed",
    17: "technician/engineer",
    18: "tradesman/craftsman",
    19: "unemployed",
    20: "writer"
}
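The dictionary above covers only the lookup table. For context, Hive's TRANSFORM clause by default passes each input row to the script on standard input as tab-separated fields and reads tab-separated fields back from standard output. A minimal sketch of the row-handling side, assuming that default format, might look like the following. The names map_row and transform, the abbreviated OCCUPATIONS table, the sample row, and the "unknown" fallback label are illustrative assumptions and not the book's script.

```python
import io
import sys

# Abbreviated stand-in for occupation_dict; the full table has codes 0-20.
OCCUPATIONS = {0: "other or not specified", 10: "K-12 student", 12: "programmer"}


def map_row(line, occupations=OCCUPATIONS):
    """Swap the occupation code (fourth tab-separated field) for its label."""
    userid, gender, age, occupation, zipcode = line.rstrip("\n").split("\t")
    # Assumption: codes missing from the table fall back to "unknown".
    label = occupations.get(int(occupation), "unknown")
    return "\t".join([userid, gender, age, label, zipcode])


def transform(rows_in, rows_out):
    """Read tab-separated rows, write the mapped rows, one per line."""
    for line in rows_in:
        if line.strip():  # skip blank lines
            rows_out.write(map_row(line) + "\n")


# Local demonstration with a made-up sample row standing in for Hive's input.
sample_in = io.StringIO("1\tF\t1\t10\t48067\n")
sample_out = io.StringIO()
transform(sample_in, sample_out)
sys.stdout.write(sample_out.getvalue())
```

In a script run through TRANSFORM, the final call would instead be transform(sys.stdin, sys.stdout), so that Hive supplies the rows and collects the output.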