4. Being the “data scientist” often involves speaking to people who
aren't data scientists themselves, so it helps to have a set of
communication strategies for getting the information you need
about the data. Can you think of other people you should talk
to?
5. Most of you are not “domain experts” in real estate or online
businesses.
• Does stepping out of your comfort zone and figuring out how
you would go about “collecting data” in a different setting give
you insight into how you do it in your own field?
• Sometimes “domain experts” have their own vocabulary.
Did Doug use vocabulary specific to his domain that you didn't
understand (“comps,” “open houses,” “CPC”)? If you don't
understand the vocabulary an expert is using, it can prevent you
from understanding the problem. It's good to get in the habit of
asking questions, because eventually you will get to something
you do understand. This takes persistence, and it is a habit worth
cultivating.
6. Doug mentioned that the company didn't necessarily have a data
strategy. There is no industry standard for creating one. As you work
through this assignment, think about whether there is a set of best
practices you would recommend for developing a data strategy for
an online business, or in your own domain.
Sample R code
Here's some sample R code that takes the Brooklyn housing data from
the preceding exercise and cleans and explores it a bit. (The exercise
asks you to do this for Manhattan.)
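# Author: Benjamin Reddy
require(gdata)  # provides read.xls()

# Read the sheet, starting at the header row that contains "BOROUGH"
bk <- read.xls("rollingsales_brooklyn.xls", pattern = "BOROUGH")
head(bk)
summary(bk)

# Strip every non-digit character (such as "$" and ",") and convert to numeric
bk$SALE.PRICE.N <- as.numeric(gsub("[^[:digit:]]", "",
                                   bk$SALE.PRICE))
# Count prices that could not be converted (TRUE = missing); base R table()
# is used because count() is not part of base R
table(is.na(bk$SALE.PRICE.N))

# Lower-case all column names so they are easier to type
names(bk) <- tolower(names(bk))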
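If rollingsales_brooklyn.xls is not at hand, the cleaning steps in the sample code can still be tried on a few rows typed in by hand. This is a minimal sketch using only base R; the data frame bk_demo and every value in it are hypothetical stand-ins, not rows from the real spreadsheet.

```r
# Hypothetical rows standing in for the Brooklyn spreadsheet. The values are
# made up for illustration and are NOT taken from rollingsales_brooklyn.xls.
bk_demo <- data.frame(
  BOROUGH    = c(3, 3, 3, 3, 3),
  SALE.PRICE = c("$1,250,000", "$0", "475,000", "-", ""),
  stringsAsFactors = FALSE
)

# Remove every character that is not a digit, then convert to numeric.
# Entries containing no digits at all become "" and therefore NA.
bk_demo$SALE.PRICE.N <- as.numeric(gsub("[^[:digit:]]", "", bk_demo$SALE.PRICE))
print(bk_demo$SALE.PRICE.N)

# Count missing vs. non-missing prices with base R.
print(table(is.na(bk_demo$SALE.PRICE.N)))

# Lower-case the column names, as the sample code does.
names(bk_demo) <- tolower(names(bk_demo))
print(names(bk_demo))
```

One caveat about the pattern: `[^[:digit:]]` also deletes decimal points and minus signs, so a string such as "1,250.50" would become 125050. That is harmless for whole-dollar sale prices, but it is worth checking before reusing the same pattern on other columns.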