Feature Generation or Feature Extraction
The process we just went through, brainstorming a list of features
for Chasing Dragons, is called feature generation or feature
extraction. It is as much an art as a science. It's good to
have a domain expert around for this process, but it's also good to use
your imagination.
In today's technology environment, we're in a position where we can
generate tons of features through logging. Contrast this with other
contexts such as surveys, where you're lucky if you can get a
respondent to answer 20 questions, let alone hundreds.
But how many of these features are just noise? In an environment
where you can capture a lot of data, not all of it is necessarily useful
information.
Keep in mind that ultimately you're limited in the features you have
access to in two ways: whether or not it's possible to even capture the
information, and whether or not it even occurs to you at all to try to
capture it. You can think of information as falling into the following
buckets:
Relevant and useful, but it's impossible to capture it.
    You should keep in mind that there's a lot of information that
    you're not capturing about users. How much free time do they
    actually have? What other apps have they downloaded? Are they
    unemployed? Do they suffer from insomnia? Do they have an
    addictive personality? Do they have nightmares about dragons?
    Some of this information might be more predictive of whether
    or not they return next month than the data you do have. There's
    not much you can do about this, except that some of the data you
    are able to capture may serve as a proxy by being highly correlated
    with these unobserved pieces of information: e.g., if they play the
    game every night at 3 a.m., they might suffer from insomnia, or they
    might work the night shift.
Relevant and useful, possible to log it, and you did.
    Thankfully it occurred to you to log it during your brainstorming
    session. It's great that you chose to log it, but logging it
    doesn't mean you know that it's relevant or useful; that is what
    you'd like your feature selection process to discover.
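
To make the proxy idea from the first bucket concrete, here is a minimal sketch of how a logged quantity (session start times) could be turned into a candidate feature (the share of a user's sessions that start late at night). Everything in it is an assumption made for illustration: the log layout, the user IDs, the timestamps, the 2 a.m. to 5 a.m. window, and the function name late_night_fraction are all hypothetical, and nothing here says the feature is actually predictive. That question is still left to feature selection.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log rows: (user_id, session start in the user's local time).
# These values are made up purely to illustrate the computation.
session_log = [
    ("u1", "2013-05-01 03:05"),
    ("u1", "2013-05-02 02:50"),
    ("u1", "2013-05-03 03:10"),
    ("u2", "2013-05-01 19:30"),
    ("u2", "2013-05-02 03:15"),
    ("u2", "2013-05-04 20:00"),
    ("u2", "2013-05-05 12:00"),
]


def late_night_fraction(log, start_hour=2, end_hour=5):
    """Return {user_id: share of sessions starting in [start_hour, end_hour)}.

    The hour window is an arbitrary assumption, not a recommended threshold.
    """
    total = defaultdict(int)
    late = defaultdict(int)
    for user_id, stamp in log:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        total[user_id] += 1
        if start_hour <= hour < end_hour:
            late[user_id] += 1
    # Every user in `total` has at least one session, so no division by zero.
    return {user: late[user] / total[user] for user in total}


features = late_night_fraction(session_log)
print(features)
```

A high value for a user could mean insomnia, a night shift, or a different time zone than the one you assumed, so treat it as one more candidate feature rather than a measurement of the unobserved trait.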
 