Statistical Inference, Exploratory Data Analysis, and the Data Science Process - Doing Data Science

Each one represents one (simulated) day's worth of ads shown and clicks recorded on the New York Times home page in May 2012. Each row represents a single user. There are five columns: age, gender (0=female, 1=male), number of impressions, number of clicks, and logged-in.

You'll be using R to handle these data. It's a programming language designed specifically for data analysis, and it's pretty intuitive to start using. You can download it here. Once you have it installed, you can load a single file into R with this command:

data1 <- read.csv(url("http://stat.columbia.edu/~rachel/datasets/nyt1.csv"))
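Before starting on the exercise, it helps to check what was actually loaded. The sketch below is only an illustration: it reads a tiny made-up sample from an inline string instead of the real file, and the column names (Age, Gender, Impressions, Clicks, Signed_In) are assumptions about the file's header rather than something stated here. The same four inspection calls can be applied to data1.

```r
# Tiny made-up sample standing in for one day's file. The column names
# are assumed, not confirmed against the real CSV header.
sample_csv <- "Age,Gender,Impressions,Clicks,Signed_In
36,0,3,0,1
73,1,3,0,1
0,0,5,1,0
21,1,4,2,1"

# read.csv() accepts inline text through its `text` argument.
sample_day <- read.csv(text = sample_csv)

dim(sample_day)      # number of rows (users) and number of columns
str(sample_day)      # column names and types
head(sample_day)     # first few rows
summary(sample_day)  # per-column range, mean, and quartiles
```

If the real header uses different names, names(data1) will show them, and the rest of the analysis should use those instead.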

Once you have the data loaded, it's time for some EDA:

1. Create a new variable, age_group, that categorizes users as "<18", "18-24", "25-34", "35-44", "45-54", "55-64", and "65+".

2. For a single day:

• Plot the distributions of the number of impressions and click-through rate (CTR = # clicks / # impressions) for these seven age categories.

• Define a new variable to segment or categorize users based on their click behavior.

• Explore the data and make visual and quantitative comparisons across user segments/demographics (<18-year-old males versus <18-year-old females, or logged-in versus not, for example).

• Create metrics/measurements/statistics that summarize the data. Examples of potential metrics include CTR, quantiles, mean, median, variance, and max, and these can be calculated across the various user segments. Be selective. Think about what will be important to track over time: what will compress the data but still capture user behavior.

3. Now extend your analysis across days. Visualize some metrics and distributions over time.

4. Describe and interpret any patterns you find.
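One way to approach steps 1 through 3 is sketched below. This is a rough illustration under stated assumptions, not a reference solution: it runs on synthetic data produced by a hypothetical helper, make_day(), so that it works without a download, and the column names (Age, Gender, Impressions, Clicks, Signed_In) and the click-behavior segments are assumptions chosen for the example.

```r
# Hypothetical helper: simulate one day's worth of rows. All values are
# made up, and the column names are assumed rather than taken from the files.
make_day <- function(n) {
  d <- data.frame(
    Age         = sample(5:90, n, replace = TRUE),
    Gender      = rbinom(n, 1, 0.5),   # 0 = female, 1 = male
    Impressions = rpois(n, 5),
    Signed_In   = rbinom(n, 1, 0.7)
  )
  d$Clicks <- rbinom(n, d$Impressions, 0.05)  # clicks never exceed impressions
  d
}

set.seed(1)
day <- make_day(500)

# Step 1: seven age groups. Intervals are closed on the right,
# so (17, 24] becomes "18-24".
day$age_group <- cut(
  day$Age,
  breaks = c(-Inf, 17, 24, 34, 44, 54, 64, Inf),
  labels = c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
)

# Step 2: per-user CTR is undefined with zero impressions, so mark it NA.
day$CTR <- ifelse(day$Impressions > 0, day$Clicks / day$Impressions, NA)

# One possible segmentation by click behavior.
day$click_segment <- cut(
  day$Clicks,
  breaks = c(-Inf, 0, 1, Inf),
  labels = c("no clicks", "one click", "two or more clicks")
)

# Aggregate CTR per age group: total clicks over total impressions.
by_age <- aggregate(cbind(Impressions, Clicks) ~ age_group, data = day, FUN = sum)
by_age$CTR <- by_age$Clicks / by_age$Impressions
print(by_age)

# Compare segments, for example under-18 males versus under-18 females.
under18 <- subset(day, age_group == "<18")
print(tapply(under18$Impressions, under18$Gender, mean))
print(table(day$click_segment, day$Signed_In))

# Step 3: stack several days and track one metric over time.
days <- lapply(1:3, function(i) {
  d <- make_day(500)
  d$day_index <- i
  d
})
all_days <- do.call(rbind, days)
daily <- aggregate(cbind(Impressions, Clicks) ~ day_index, data = all_days, FUN = sum)
daily$CTR <- daily$Clicks / daily$Impressions
print(daily)
```

For the plots, boxplot(Impressions ~ age_group, data = day) and boxplot(CTR ~ age_group, data = day) are simple starting points. The aggregate CTR (total clicks over total impressions) is used for the group summaries because averaging per-user CTRs gives equal weight to users with very few impressions. Which of the two is the better metric to track is a judgment call that the exercise leaves open.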

Sample code

Here we'll give you the beginning of a sample solution for this exercise.

The reality is that we can't teach you about data science and teach you
