Chapter 2. Designing a Machine Learning System
In this chapter, we will design a high-level architecture for an intelligent, distributed machine learning system that uses Spark as its core computation engine. We will focus on taking the existing architecture of a web-based business and redesigning it so that automated machine learning systems power key areas of the business. Specifically, we will:
• Introduce our hypothetical business scenario
• Provide an overview of the current architecture
• Explore various ways in which machine learning systems can enhance or replace certain business functions
• Provide a new architecture based on these ideas
A modern large-scale data environment places the following requirements on its core computation engine:
• It must integrate with other components of the system, especially data collection and storage systems, analytics and reporting, and frontend applications.
• It should be easily scalable and independent of the rest of the architecture, ideally scaling both horizontally and vertically.
• It should compute efficiently for the type of workload in mind, that is, machine learning and iterative analytics applications.
• If possible, it should support both batch and real-time workloads.
As a framework, Spark meets these criteria. However, we must also ensure that the machine learning systems we build on Spark meet them. There is no point in implementing an algorithm whose bottlenecks cause our system to fail one or more of these requirements.
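The iterative-workload requirement is where such bottlenecks often appear. The following minimal, framework-free Python sketch is an illustration only. It is not code from this chapter and not Spark's API. It assumes a toy one-feature linear regression trained by batch gradient descent, and the names `make_loader` and `fit_slope` are hypothetical. Every iteration scans the full dataset. A system that re-reads the data from storage on each pass therefore pays that cost once per iteration, while one that keeps the data in memory pays it only once. In Spark, the analogous choice is whether to call `cache()` or `persist()` on an RDD that an iterative algorithm reuses.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def make_loader(data: List[Point]) -> Tuple[Callable[[], List[Point]], List[int]]:
    """Hypothetical helper: return a loader simulating an expensive read, plus a read counter."""
    reads = [0]

    def load() -> List[Point]:
        reads[0] += 1  # each call stands in for a full scan of the source storage
        return list(data)

    return load, reads


def fit_slope(load: Callable[[], List[Point]], iterations: int, lr: float, cache: bool) -> float:
    """Hypothetical helper: fit y ~ w * x by batch gradient descent on mean squared error.

    Every iteration needs a full pass over the data. With cache=True the data is
    read once and kept in memory; with cache=False it is re-read on every pass.
    """
    cached: Optional[List[Point]] = load() if cache else None
    w = 0.0
    for _ in range(iterations):
        points = cached if cached is not None else load()
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in points) / len(points)
        w -= lr * grad
    return w


if __name__ == "__main__":
    toy = [(float(x), 3.0 * x) for x in range(1, 6)]  # toy data with true slope 3
    for use_cache in (False, True):
        load, reads = make_loader(toy)
        w = fit_slope(load, iterations=200, lr=0.01, cache=use_cache)
        print(f"cache={use_cache}: slope={w:.3f}, reads={reads[0]}")
```

Both runs converge to a slope close to 3. The uncached run calls the loader 200 times (once per iteration), while the cached run calls it once. At cluster scale, that difference in repeated I/O is exactly the kind of bottleneck the requirements above rule out.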