Designing a Machine Learning System - Machine Learning with Spark


Chapter 2. Designing a Machine Learning System

In this chapter, we will design a high-level architecture for an intelligent, distributed machine learning system that uses Spark as its core computation engine. The problem we will focus on is taking the existing architecture of a web-based business and redesigning it so that automated machine learning systems power key areas of the business. Specifically, we will:

• Introduce our hypothetical business scenario

• Provide an overview of the current architecture

• Explore various ways in which machine learning systems can enhance or replace certain business functions

• Provide a new architecture based on these ideas

In a modern large-scale data environment, the computation framework must meet the following requirements:

• It must integrate with other components of the system, especially data collection and storage systems, analytics and reporting, and frontend applications.

• It should be easily scalable and independent of the rest of the architecture. Ideally, it should scale both horizontally and vertically.

• It should allow efficient computation for the type of workload in mind, that is, machine learning and iterative analytics applications.

• If possible, it should support both batch and real-time workloads.

As a framework, Spark meets these criteria. However, we must ensure that the machine learning systems we design on Spark also meet them. There is no point in implementing an algorithm that introduces bottlenecks that cause our system to fail one or more of these requirements.
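To make the iterative-workload requirement concrete, here is a toy sketch in plain Python (no Spark involved; the function names and the synthetic data are hypothetical). It fits a line by batch gradient descent, and every iteration makes a full pass over the same dataset. Because that pass is repeated thousands of times, re-reading the data from storage on each iteration would dominate the cost. This is why a framework for machine learning benefits from keeping the working set in memory between iterations, which Spark supports through caching (for example, `RDD.cache()`).

```python
# Toy illustration of an iterative ML workload: batch gradient descent
# repeatedly scans the SAME dataset. Plain Python, not Spark's API; all
# names here are hypothetical.


def load_dataset():
    """Hypothetical stand-in for reading data from storage: y = 2x + 1."""
    return [(float(x), 2.0 * x + 1.0) for x in range(10)]


def fit_linear(data, iterations=2000, lr=0.05):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Returns (w, b, passes), where passes counts full scans of the data.
    """
    w, b = 0.0, 0.0
    n = len(data)
    passes = 0
    for _ in range(iterations):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:  # one full pass over the dataset per iteration
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        passes += 1
        # Step against the averaged gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, passes


if __name__ == "__main__":
    # Load once and reuse across all iterations (the analogue of caching).
    data = load_dataset()
    w, b, passes = fit_linear(data)
    print(f"w={w:.3f}, b={b:.3f}, passes over data={passes}")
```

In Spark, the inner loop would instead be a distributed `map`/`reduce` over a dataset cached in cluster memory; the sketch only illustrates the access pattern that makes caching worthwhile.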
