Designing Real-Time Streaming Architectures - Real-Time Analytics

in the event of a failure. MongoDB, a data store, also uses a similar system

for its replication implementation. In both cases, automatic failover is

offered, although a client may need to disconnect and reconnect to

successfully identify the new master. It is also the client's responsibility to

handle any in-flight edits that had not yet been acknowledged by the master.

The other approach, often found in NoSQL data stores, is to attempt a

masterless form of high availability. Like the master-slave configuration, any

edits are written to multiple servers in a distributed pool of machines.

Typically, a machine is chosen as the “primary” write machine according to

some feature of the data, such as the value of the primary key being written.

The primary then writes the same value to a number of other machines

in a manner that can be determined from the key value. In this way, the

primary is always tried first and, if unavailable, the other machines are tried

in order for both writing and reading. This basic procedure is implemented

by the Cassandra data store discussed in this topic. It is also common to have

the client software implement this multiple writing mechanism, a technique

commonly used in distributed hash tables to ensure high availability. The

drawback of this approach is that, unlike the master-slave architecture,

recovery of a server that has been out of service can be complicated.
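
The key-driven placement described above can be sketched roughly as follows. This is a simplified illustration, not Cassandra's actual partitioner: the function names, the node names, the use of an MD5 hash with modulo placement, and the default of three replicas are all assumptions made for the example.

```python
import hashlib


def replica_order(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the machines responsible for `key`, with the primary first.

    Hypothetical placement rule: hash the key to pick a starting machine,
    then take the next machines in list order as the additional copies.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def first_available(key, nodes, is_up, replicas=3):
    """Try the primary first, then the other replicas in order.

    `is_up` is a caller-supplied health check (an assumption of this sketch).
    Returns None when every replica for the key is unavailable.
    """
    for node in replica_order(key, nodes, replicas):
        if is_up(node):
            return node
    return None
```

Because the order is derived only from the key, every client computes the same list without consulting a master, which is what allows the fallback to work for both reads and writes.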

Low Latency

For most developers, “low latency” refers to the time it takes to service a

given connection. Low latency is generally desirable because there are only

so many seconds in a day (86,400 as it happens) and the less time it takes to

handle a single request, the more requests a single machine can service.
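
As a rough illustration of that arithmetic, the sketch below assumes a machine that handles requests strictly one at a time, which real servers rarely do; the function name and the sample latencies are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400, as noted in the text


def max_requests_per_day(latency_ms: int) -> int:
    """Upper bound on requests per day for a strictly serial machine.

    Integer milliseconds are used so the arithmetic stays exact.
    """
    return SECONDS_PER_DAY * 1000 // latency_ms


print(max_requests_per_day(100))  # 864000
print(max_requests_per_day(10))   # 8640000
```

Under these assumptions, cutting latency from 100 ms to 10 ms raises the daily ceiling tenfold.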

For the real-time streaming application, low latency means something a

bit different. Rather than referring to the return time for a single request,

it refers to the amount of time between an event occurring somewhere at

the “edge” of the system and it being made available to the processing and

delivery frameworks. Although not explicitly stated, it also implies that the

latency is fairly stable from one event to the next.
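
One way to make this definition concrete is to record, for each event, when it occurred at the edge and when it became available downstream, then look at both the typical delay and how much it varies. The sketch below is a minimal example under assumed inputs: the timestamp pairs, the function name, and the choice of population standard deviation as the measure of spread are illustrative, not taken from the text.

```python
from statistics import mean, pstdev


def latency_summary(events):
    """Summarize streaming latency.

    `events` is an iterable of (occurred_at, available_at) pairs in seconds.
    Latency is the gap between the two; "spread" is its standard deviation.
    """
    latencies = [available - occurred for occurred, available in events]
    return {
        "mean": mean(latencies),
        "spread": pstdev(latencies),
        "max": max(latencies),
    }


# Every event takes two seconds to arrive, so the spread is zero.
steady = latency_summary([(0, 2), (10, 12), (20, 22)])
```

A small spread relative to the mean corresponds to the stable latency the definition implies.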

For example, in a batch system operating on the order of minutes, the latency

of some events is very low, specifically, those events that entered the batch

to be processed right before the processing started. Events that entered the

batch just after the start of a processing cycle will have a very high latency

because they need to wait for the next batch to be processed. For practical

reasons, many streaming systems also work with what are effectively
