8.2.9 Recovery
Lustre exploits caches on both client and server to hide disk and network
latency for performance-critical operations. Since failures can prevent these
caches from being flushed, Lustre clients and storage targets retain stateful
connections to ensure transactions that are completed but not yet committed
at the storage target are replayed transparently in the event of server failure
and subsequent restart or failover.
The client-side state contains a list of completed but uncommitted RPCs,
ordered by the server's transaction sequence number; incomplete RPCs, ordered
by the client execution ID (XID); and locks held by the client at each
storage target. When clients reconnect to the storage target after a service
interruption, these RPCs and locks are replayed to complete recovery and allow
the storage target to continue service. Storage targets also retain a list
of currently connected clients in the last_rcvd file. The entry for a newly
connected client is made persistent before that client's first RPC that
modifies data or metadata completes, so that the target knows which clients
may later need to participate in recovery.
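The client-side bookkeeping described above can be sketched roughly as
follows. This is an illustrative model only, not Lustre code: the class and
method names are hypothetical, and the idea that the client learns a "last
committed" transaction number from the server is an assumption made so the
sketch can prune its replay list.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rpc:
    xid: int                       # client execution ID, set when the RPC is sent
    transno: Optional[int] = None  # server transaction number, set on reply


@dataclass
class TargetState:
    """Hypothetical recovery state one client keeps for one storage target."""
    replay: list = field(default_factory=list)    # executed, not yet committed
    inflight: list = field(default_factory=list)  # sent, no reply received yet
    locks: set = field(default_factory=set)       # locks held at this target

    def send(self, rpc):
        self.inflight.append(rpc)

    def reply(self, xid, transno):
        # The server executed the RPC: move it from in-flight to the replay list.
        rpc = next(r for r in self.inflight if r.xid == xid)
        self.inflight.remove(rpc)
        rpc.transno = transno
        self.replay.append(rpc)

    def committed(self, last_committed):
        # Assumed pruning rule: RPCs at or below this transno are on disk.
        self.replay = [r for r in self.replay if r.transno > last_committed]

    def recovery_plan(self):
        # Replay in transaction order, then locks, then resend in XID order.
        return (sorted(self.replay, key=lambda r: r.transno),
                sorted(self.locks),
                sorted(self.inflight, key=lambda r: r.xid))
```

Keeping the two RPC lists separate reflects the two orderings in the text:
only an executed RPC has a transaction number, so an incomplete RPC can be
ordered only by its XID.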
When a storage target starts up, it notifies the MGS, which in turn notifies
all clients that were connected to the last running instance of the storage
target. In the event that the MGS is not present or unresponsive, clients
discover this for themselves, albeit with substantially increased latency, by
attempting to connect to all previously configured target NIDs in turn until
successful.
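The fallback path, in which a client probes each configured NID without help
from the MGS, might look like the sketch below. The function name, the
try_connect callback, and the max_rounds limit are all hypothetical and
stand in for whatever connection logic a real client uses.

```python
def locate_target(nids, try_connect, max_rounds=3):
    """Try each previously configured NID in turn until one accepts.

    nids        -- ordered list of NID strings for the target and its
                   failover partners
    try_connect -- hypothetical callback returning True on success
    max_rounds  -- assumed cap on full passes over the list
    """
    for _ in range(max_rounds):
        for nid in nids:
            if try_connect(nid):
                return nid
    return None  # target not found within the allowed rounds


# Usage with a simulated network in which only the failover node answers.
reachable = {"10.0.0.2@tcp"}
found = locate_target(["10.0.0.1@tcp", "10.0.0.2@tcp"],
                      lambda nid: nid in reachable)
```

Each failed attempt costs a connection timeout, which is where the extra
latency relative to MGS notification comes from.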
The storage target then waits for all clients that were connected at the time
of failure to reconnect, replay all uncommitted RPCs, and re-acquire in-use
locks. Locks that are not currently protecting any cached state are dropped to
reduce the time needed for lock recovery. Recovery time is bounded to ensure
that unresponsive clients cannot hold up recovery indefinitely. A tighter bound
is used when the MGS is able to assist with restart notification.
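A bounded recovery window can be modelled with a simulated clock, as in the
sketch below. The function name and the timeout values are arbitrary
illustrations rather than Lustre defaults. The only point carried over from
the text is that clients missing the deadline are left out, and that the
deadline may be shorter when the MGS notifies clients of the restart.

```python
def close_recovery_window(expected, reconnect_times, timeout):
    """Split expected clients into those that made the deadline and the rest.

    expected        -- client IDs recorded by the target as connected at
                       the time of failure
    reconnect_times -- simulated seconds after restart at which each client
                       reconnected (absent means it never did)
    timeout         -- length of the recovery window in seconds
    """
    recovered, late = [], []
    for client in expected:
        t = reconnect_times.get(client)
        if t is not None and t <= timeout:
            recovered.append(client)
        else:
            late.append(client)
    return recovered, late


arrivals = {"c1": 5.0, "c2": 40.0}  # "c3" never reconnects
# Illustrative bounds: a longer window without the MGS, a tighter one with it.
unassisted = close_recovery_window(["c1", "c2", "c3"], arrivals, timeout=60.0)
assisted = close_recovery_window(["c1", "c2", "c3"], arrivals, timeout=20.0)
```

A tighter bound is only safe if clients really do hear about the restart
promptly. In this simulated run, "c2" would be shut out by the shorter
window.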
Replay of executed but uncommitted RPCs is then conducted in strict
transaction order to ensure that transaction dependencies within and between
clients are respected. Replayed RPCs may also include the versions (previous
transaction numbers) of the objects they updated to allow consistent replay of
isolated operations even if unrelated operations were lost due to concurrent
client failure (causing a gap in the transaction sequence numbers).
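One way to picture version-checked replay is the sketch below. It is a
simplified model with hypothetical names: each object's version is taken to
be the transaction number of its last update, and an RPC is applied only if
the versions it recorded still match the target's state. Rejecting an RPC
whose versions do not match is an assumption of this sketch.

```python
def replay(versions, rpcs):
    """Replay RPCs in transaction order, checking recorded object versions.

    versions -- dict mapping object name to the transno of its last
                committed update (modified in place)
    rpcs     -- list of dicts with "transno" and "pre_versions", the
                version of each touched object before the RPC ran
    """
    applied, rejected = [], []
    for rpc in sorted(rpcs, key=lambda r: r["transno"]):
        pre = rpc["pre_versions"]
        if all(versions.get(obj) == v for obj, v in pre.items()):
            for obj in pre:
                versions[obj] = rpc["transno"]  # object now at this version
            applied.append(rpc["transno"])
        else:
            rejected.append(rpc["transno"])     # depends on a lost update
    return applied, rejected


# Transno 11 (an update to "x" by a client that failed) is missing.
state = {"x": 7, "y": 5}
result = replay(state, [
    {"transno": 12, "pre_versions": {"y": 10}},
    {"transno": 10, "pre_versions": {"y": 5}},
    {"transno": 13, "pre_versions": {"x": 11}},
])
```

Here transactions 10 and 12 touch only "y" and replay cleanly across the gap,
while 13 expected "x" at version 11 and so cannot be applied consistently.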
Next, clients replay their held locks, which also recovers the state of open
files on the MDS. Finally, clients resend incomplete RPCs in XID order. Since
these RPCs may or may not have been committed by the server before the
failure, the server guarantees idempotence by comparing the RPC XID against
that stored for the client in the last_rcvd file to determine whether the
operation was previously committed. If so, the server does not execute the
update again and instead reconstructs the RPC reply message from data saved
in the last_rcvd file.
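The resend check can be illustrated with a small in-memory model. The class
below is hypothetical: it keeps one slot per client holding the last XID and
its saved reply, which is a deliberate simplification of whatever the real
on-disk file records.

```python
class Target:
    """Toy storage target that makes resent RPCs idempotent."""

    def __init__(self):
        self.slots = {}    # client ID -> (last XID, saved reply)
        self.updates = 0   # number of updates actually executed

    def handle(self, client, xid):
        slot = self.slots.get(client)
        if slot is not None and slot[0] == xid:
            # Already executed before the failure: rebuild the reply
            # from saved data instead of running the update again.
            return slot[1]
        self.updates += 1
        reply = {"xid": xid, "result": self.updates}
        self.slots[client] = (xid, reply)  # saved along with the update
        return reply


target = Target()
first = target.handle("client-1", xid=7)
again = target.handle("client-1", xid=7)    # resend after recovery
fresh = target.handle("client-1", xid=8)    # genuinely new request
```

The resend returns the saved reply and leaves the update count unchanged, so
the client cannot tell whether its original request was lost before or after
execution.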
Clients that do not participate in recovery in a timely manner are
evicted from the server and their uncommitted operations are lost. Dependent
 