10
Building a Data Classification System with Mahout
The real problem is not whether machines think but whether men do.
—B. F. Skinner
Computers essentially perform a fairly simple set of tasks over and over. Data goes
in, algorithms are applied to that data, and results come out. In order to know what to
do, computers have to be explicitly programmed by humans. Since the beginning of
the digital computing age, scientists have pondered the possibility of computers reacting
to changes in data without new programming, similar to how humans learn from
changes in their environment. If a computer could modify its programming models
as input changes, it could be used as a tool for helping us make decisions about the
future. And as data sizes grow beyond what humans are capable of processing manually,
it becomes almost necessary to bring computers into decision-making tasks.
When faced with an onslaught of data, can we find ways for computers to make useful
decisions for humans? The answer to this question is a resounding “Sometimes,” and
the field concerned with the ability of computers to provide accurate predictive output
based on new information is known as machine learning.
There is currently debate and confusion about which tools in the data space can
be automated to provide predictive business value given adequate data. Despite
this debate, there have been many cases in which machine learning has improved the
usefulness of everyday actions. Machine learning touches on use cases that consumers
of Internet services experience every day. From spam detection to product
recommendation systems to online insurance pricing, machine learning challenges are
increasingly being solved with distributed computer systems.
Machine learning draws upon the rich academic histories of mathematics, statistics,
and probability. There's no simple way to sum up the variety of use cases and
approaches to solving machine learning challenges. Some research projects attempt
to build computer systems that emulate human thought processes, whereas others use
clever statistical techniques to predict the probability that some action will be taken.
To be proficient in machine learning techniques requires that you be well versed in
 
 
 