Data-Driven Architecture for Big Data - Data Warehousing in the Age of Big Data


stored in a knowledge repository (a NoSQL- or DBMS-like database) along with the algorithms for machine learning.

3. The data is then processed through the hypothesis workflows.

4. The outputs from a hypothesis and predictive mining exercise are sent to the knowledge repository as a collection with metatags for search criteria and associated user geographic and demographic data.

5. The outputs of the hypothesis are then processed into results for further analysis or presentation to users.
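Steps 3 to 5 above can be illustrated with a minimal sketch. It is not the book's implementation: the `KnowledgeRepository` class, the `HypothesisOutput` schema, and the two scoring functions are hypothetical names introduced here, and an in-memory dictionary stands in for the NoSQL- or DBMS-like store.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisOutput:
    """One hypothesis result, tagged for later search (hypothetical schema)."""
    hypothesis: str
    score: float
    metatags: set = field(default_factory=set)
    user_context: dict = field(default_factory=dict)


class KnowledgeRepository:
    """In-memory stand-in for the knowledge repository (assumption)."""

    def __init__(self):
        self._collections = {}

    def store(self, collection, outputs):
        # Step 4: keep the outputs together as a named collection.
        self._collections.setdefault(collection, []).extend(outputs)

    def search(self, collection, tag):
        # Metatags act as the search criteria.
        return [o for o in self._collections.get(collection, [])
                if tag in o.metatags]


def run_hypotheses(records, hypotheses, metatags, user_context):
    """Step 3: pass the data through every hypothesis (name -> scorer)."""
    return [
        HypothesisOutput(name, scorer(records), set(metatags), dict(user_context))
        for name, scorer in hypotheses.items()
    ]


def present(outputs):
    """Step 5: order the outputs for analysis or display, best score first."""
    return sorted(outputs, key=lambda o: o.score, reverse=True)


# Hypothetical input data and hypotheses, for illustration only.
records = [{"rating": 5}, {"rating": 4}, {"rating": 2}, {"rating": 3}]
hypotheses = {
    "likes_high_rated": lambda rs: sum(r["rating"] >= 4 for r in rs) / len(rs),
    "watches_often": lambda rs: min(len(rs) / 10, 1.0),
}

repo = KnowledgeRepository()
outputs = run_hypotheses(records, hypotheses,
                         metatags={"movies"},
                         user_context={"region": "US", "age_band": "25-34"})
repo.store("movie_hypotheses", outputs)
ranked = present(repo.search("movie_hypotheses", "movies"))
```

Storing the user's geographic and demographic context alongside each output is what lets later searches reuse results from similar users.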

Examples of real-life implementations of machine learning include:

● IBM Watson
● Amazon recommendation engine
● Yelp ratings
● Analysis of astronomical data
● Human speech recognition
● Stream analytics:
    ● Credit card fraud
    ● Electronic trading fraud
● Google robot-driven vehicles
● Stock rate prediction
● Genome classification

Using semantic libraries, metadata, and master data, along with the data collected from each processing iteration, enriches the algorithms' ability to detect patterns and predict outcomes more accurately.

Let us see how a recommendation engine uses all the data types to create powerful and personalized recommendations. We will use the Amazon website to discuss this process:

1. John Doe searches for movies on Amazon.

2. John Doe receives all the movies relevant to the title he searched for.

3. John Doe also receives recommendations and personalized offers along with the result sets.

How does the system know what else John Doe will be interested in purchasing, and how confident is it in such a recommendation? This is exactly where we can apply the framework for machine learning shown in Figure 11.9; the process is shown in Figure 11.11.

The first step of the process is a user logging in, or simply running a search anonymously on a website. The search executes and simultaneously builds a profile for the user. The search engine produces results that are shared with the user, if needed, as first-pass output, and adds them to the user profile. As a second step, the search engine executes the personalized recommendation, which provides an optimized search result along with recommendations.
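The second step can be sketched with a small user-based collaborative filtering example. This is only an illustration under stated assumptions: the `HISTORY` data, the function names, and the 5-star rating scale are hypothetical, and the "confidence score" is taken here to be a similarity-weighted average rating scaled to the range 0 to 1, which is one possible reading rather than the book's definition. Clustering is left out for brevity.

```python
from math import sqrt

# Hypothetical prior-user data from the knowledge repository:
# user -> {item: rating on an assumed 1-5 scale}
HISTORY = {
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie D": 4},
    "carol": {"Movie C": 5, "Movie D": 2, "Movie E": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(profile, history, top_n=3, max_rating=5.0):
    """Rank items the user has not seen by a similarity-weighted score."""
    weighted = {}    # item -> sum of similarity * rating
    sim_totals = {}  # item -> sum of similarities that contributed
    for ratings in history.values():
        sim = cosine_similarity(profile, ratings)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in ratings.items():
            if item in profile:
                continue  # already known to the user
            weighted[item] = weighted.get(item, 0.0) + sim * rating
            sim_totals[item] = sim_totals.get(item, 0.0) + sim
    # Scale the weighted average rating to 0..1 as a confidence score.
    scores = {item: weighted[item] / sim_totals[item] / max_rating
              for item in weighted}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Profile built during the first step from John Doe's search and clicks.
john_profile = {"Movie A": 5, "Movie B": 5}
recommendations = recommend(john_profile, HISTORY)
```

Each candidate item plays the role of a hypothesis, and the list is returned with the highest-scoring candidate first. Users who share no items with the profile are ignored, so an item seen only by such users is never recommended.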

After the first step, the rest of the search and recommendation workflow follows the machine learning technique and is implemented with collaborative filtering and clustering algorithms. The user search criteria and the basic user coordinates, including the website, clickstream activity, and geographical data, are all gathered as user profile data and are integrated with data from the knowledge repository of similar prior user searches. All of this data is processed with machine learning algorithms. Multiple hypothesis results are iterated with confidence scores, and the highest-scoring result is returned as the closest match to the search. A second pass
