Protocols
A typical RAC cluster uses a few higher-level protocols, such as TCP, UDP, and RDS.[4] This section covers
these protocols and their pros and cons.
TCP/IP is a stateful protocol. A connection between the sender and receiver must be established before a
segment can be sent, and every segment must be acknowledged (TCP ACK) before its transmission is
considered complete. For example, after sending a TCP/IP segment from one IP address and port number to another
IP address and port number, the kernel waits for an acknowledgement before declaring that transmission complete.
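The connection requirement can be seen from user space with a small loopback sketch. This is only an illustration using Python's standard socket module, not anything RAC-specific; the function names tcp_echo_once and send_before_connect_fails are made up for this example. Note that the application never sees the ACKs: sendall returns once the data is handed to the kernel, and the kernel takes care of acknowledgement and retransmission.

```python
import socket
import threading


def send_before_connect_fails() -> bool:
    """Return True if sending on an unconnected TCP socket is rejected."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.send(b"x")          # no connection has been established yet
        except OSError:
            return True              # the kernel refuses to send the data
    return False


def tcp_echo_once(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection; return what the peer read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    listener.settimeout(5)
    port = listener.getsockname()[1]
    received = []

    def serve():
        conn, _ = listener.accept()  # completes the connection setup
        with conn:
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:         # peer closed the connection
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=serve)
    worker.start()
    # connect() must succeed before any data can be sent.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(payload)      # returns once the kernel has the data
    worker.join(timeout=5)
    listener.close()
    return received[0]
```

Calling tcp_echo_once(b"block") returns the same bytes on the receiving side, while send_before_connect_fails() shows that the same send call is rejected when no connection exists.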
UDP is a stateless protocol. No existing connection is required to send a datagram. A transmission is considered
complete as soon as the frames leave the network interface. No ACK is required at all; it is up to the application to
perform error processing. For example, if a UDP packet is lost in transmission, RAC processes re-request the packet.
The UDP protocol layer is built upon the IP layer. However, both UDP and TCP/IP have the overhead of double copy
and double buffering: on the sending side, data must be copied from user space to kernel space before it can be sent,
and on the receiving side, packets are processed in kernel space and then copied into user space.
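The application-level error handling described above can be sketched as a simple re-request loop. This is a minimal illustration over loopback, not the mechanism RAC actually implements; the names start_lossy_responder and request_with_retry, the "reply:" prefix, and the timeout values are all assumptions made for the example. The responder deliberately ignores the first few requests to simulate datagrams lost in transit.

```python
import socket
import threading


def start_lossy_responder(drop_first: int):
    """Bind a UDP socket on loopback; ignore the first drop_first requests."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))       # port 0: let the kernel pick a free port
    sock.settimeout(5)
    port = sock.getsockname()[1]

    def serve():
        seen = 0
        try:
            while True:
                data, addr = sock.recvfrom(2048)
                seen += 1
                if seen <= drop_first:
                    continue          # simulate a datagram lost in transit
                sock.sendto(b"reply:" + data, addr)
                return
        except OSError:
            pass                      # timed out waiting for requests
        finally:
            sock.close()

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    return port, worker


def request_with_retry(port: int, payload: bytes, attempts: int = 5,
                       timeout: float = 0.2):
    """Send a datagram and re-request it until a reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, attempts + 1):
            # No connect() and no ACK: sendto returns once the datagram is queued.
            sock.sendto(payload, ("127.0.0.1", port))
            try:
                reply, _ = sock.recvfrom(2048)
                return reply, attempt
            except socket.timeout:
                continue              # nothing came back, so ask again
    raise TimeoutError("no reply after %d attempts" % attempts)
```

With a responder that drops the first two requests, request_with_retry only gets its reply on a later attempt, which is the kind of recovery UDP leaves to the application.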
The RDS (Reliable Datagram Socket) protocol requires specific hardware (InfiniBand fabric) and kernel drivers
to implement. With the RDS protocol, all error handling is offloaded to the InfiniBand fabric, and the transmission
is considered complete as soon as the frame reaches the fabric. The RDS protocol is used in the Exadata platform,
providing lower latency and lower resource usage. Similar to UDP, there is no ACK mechanism in the RDS protocol.
Further, RDS is designed as a zero-copy protocol, and the messages can be sent or received without a copy operation.
The RDS protocol does not use IP layer functions and bypasses the IP layer completely.
The UDP protocol is employed for cache fusion on Unix and Linux platforms. On Exadata platforms, the RDS
protocol is used for cache fusion. You can implement the RDS protocol on non-Exadata platforms too. At the time of
writing, InfiniBand fabric hardware and RDS kernel drivers are available in some flavors of Unix and Linux. There are
also vendor-specific protocols: for example, the LLT protocol is used for cache fusion with Veritas SFRAC.
Clusterware uses TCP/IP for the heartbeat mechanism between nodes. While UDP stands for User Datagram
Protocol, it is sometimes, in a lighter vein, referred to as the Unreliable Datagram Protocol. However, this does not
mean that UDP will suffer from data loss; thousands of customers using UDP on Unix platforms are proof that UDP
doesn't affect the reliability of an application. In essence, the UDP protocol is as reliable as the network underneath.
While there are subtle differences between UDP and other protocols, UDP processing is simpler and easier to
explain. Figure 9-3 shows a UDP function stack on a Linux platform. It is not necessary to understand the details
of these function calls (after all, this chapter is not about network programming); a high-level understanding of
the function execution flow is good enough. On the sending side, application processes issue a send system call that
enters udp_sendmsg; udp_sendmsg calls IP layer functions; and the IP layer functions, in turn, call the device driver
functions. On the receiving side, kernel threads call IP layer functions and then UDP layer functions, and then the
application process is scheduled on the CPU to drain the socket buffers to application buffers.
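The receive-side behavior can be observed from user space with a short sketch. It assumes a loopback interface and uses only Python's standard socket module; the function name queue_then_drain is hypothetical. Datagrams are sent before the receiver reads anything, so they wait in the kernel socket buffer until the application drains them into its own buffers.

```python
import socket


def queue_then_drain(messages, timeout=0.5):
    """Queue datagrams in the kernel socket buffer, then drain them."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    # Size of the kernel receive buffer for this socket, in bytes.
    rcvbuf = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for message in messages:
        sender.sendto(message, receiver.getsockname())
    sender.close()

    # Nothing has been read yet: the datagrams sit in the socket buffer.
    receiver.settimeout(timeout)
    drained = []
    try:
        while len(drained) < len(messages):
            data, _ = receiver.recvfrom(2048)   # copy from kernel to user space
            drained.append(data)
    except socket.timeout:
        pass                                    # fewer datagrams than expected
    finally:
        receiver.close()
    return rcvbuf, drained
```

If the application drains too slowly and the socket buffer fills, the kernel discards further datagrams, which is one reason the buffer size reported here matters for UDP-based traffic.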
[4] Other protocols may be in use in a third-party cluster. For example, a RAC database uses the LLT protocol in a Veritas SFRAC cluster.
 