Database Reference
In-Depth Information
mathematics and statistics. However, there are several well-understood machine learn-
ing techniques that are practical and that lend themselves well to distributed data
systems. This chapter provides an overview of some of the terms and concepts in this
field and an introduction to popular open-source tools used in machine learning for
large datasets.
Can Machines Predict the Future?
No, machines are not able to predict the future. Unfortunately, there seems to be
widespread misconception among those newly dabbling in data science about the
potential for predictive models to magically emerge from datasets. If you think that
you can simply collect a huge amount of data and expect a machine to provide you
with accurate visions of the future, then you might as well turn back now.
Wait—you're still here? Okay, since you haven't skipped this chapter after read-
ing my warning, I'll tell you something a bit more optimistic. Machines can report to
you about the probability that relates to an existing mathematical model. Using this
probability, the machine can also take incoming, unknown data and classify, cluster,
and even recommend an action based solely on the model. Finally, the machine can
incorporate this new data into the model, improving the entire system. Do you think
this is not as exciting as predicting the future? Computers are able to do things that
help us make decisions based on information we already know. In fact, a computer can
obviously do a lot of these types of tasks much better than a human can, and, in some
cases, these approaches are the only way to solve large-scale data problems.
Finding ways in which computers can generate models without having to be
explicitly programmed is known as machine learning (often abbreviated ML —not
to be confused with markup language ), a growing and vibrant research space. Com-
puter machine learning models are part of yet another field that has grown up along
with the rise of accessible distributed data processing systems such as Hadoop. It's one
thing to build predictive models using a small amount of data on a desktop machine;
it's altogether another to be able to deal with the volumes of data being generated by
Web applications, huge e-commerce sites, online publications, and spambots. Machine
learning tools will not relieve you of your need to understand the how and why of
predictive analytics or statistics. However, when applied correctly, they can be very
effective tools for gaining practical value from huge volumes of data.
Challenges of Machine Learning
This chapter began with a warning about trusting the effectiveness of machine learning.
Like many other technologies featured in this topic, complex machine learning tools
are becoming more accessible to everyday analysts. There is quite a lot of debate in the
data field about how this may affect data solutions. Some worry that increased acces-
sibility will cause an increase in poorly applied machine learning techniques. Because of
 
 
 
Search WWH ::




Custom Search