**The direct application of Shannon’s theory,** as early as Shannon’s own work [166], provided the capacity measure now often called ergodic capacity [19, 24, 62]. The ergodic capacity is the maximum achievable time-average rate of reliable communication, and it is appropriate when the receiver observes a typical sequence of channel states during the reception of a codeword. As the codeword length is proportional to the receiver’s decoding delay, one can also say that ergodic capacity is an appropriate metric when the channel variation is fast relative to the delay an application can tolerate.
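The sketch below makes the time-average interpretation concrete. It estimates ergodic capacity by averaging the instantaneous rate over many independently drawn channel states. The model is assumed for illustration only: Rayleigh fading with $\mathrm{E}|H|^2 = 1$, unit-variance noise, receiver CSI, and a Gaussian input of fixed power, so the quantity estimated is $\mathrm{E}[\log_2(1 + \mathrm{snr}\,|H|^2)]$. The function name and its parameters are hypothetical.

```python
import math
import random


def ergodic_capacity_rayleigh(snr: float, num_samples: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[log2(1 + snr * |H|^2)] in bits per channel use.

    Assumption (illustrative): Rayleigh fading with E|H|^2 = 1, so |H|^2 is
    Exponential(1), unit-variance noise, CSIR, and a fixed-power Gaussian input.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        gain = rng.expovariate(1.0)            # |H|^2 seen by one channel use
        total += math.log2(1.0 + snr * gain)   # instantaneous rate for this state
    return total / num_samples                 # time average over many states


snr = 10.0                      # average SNR of 10 dB
c_erg = ergodic_capacity_rayleigh(snr)
c_awgn = math.log2(1.0 + snr)   # unfaded AWGN channel at the same average SNR
print(f"ergodic: {c_erg:.3f} bits/use, AWGN: {c_awgn:.3f} bits/use")
```

For this model the estimate should be close to 2.9 bits per use at 10 dB. That is below the AWGN value $\log_2 11 \approx 3.46$. Because the logarithm is concave, Jensen's inequality implies that fading at the same average SNR cannot raise this rate.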

The quasistatic channel model, the limiting case of very slow fading, has given rise to two additional capacity formulations, namely delay-limited capacity [69] and capacity versus outage probability [24, 144] (see [19] for a comprehensive survey). Delay-limited capacity is the maximum rate at which the instantaneous mutual information can be kept constant over all states of the fading process. Capacity versus outage probability was introduced for block-interference fading channels; an outage occurs if the instantaneous mutual information is less than a fixed transmission rate. Slow block-fading channel models are the subject of topic 5, where one of the important figures of merit is diversity.
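Both slow-fading measures have simple closed forms in a toy setting. For the outage computations, the sketch below assumes Rayleigh fading ($|H|^2$ exponential with unit mean), unit-variance noise, and Gaussian inputs. For delay-limited capacity it assumes a transmitter that knows the state and inverts the channel, so that the rate $R$ is the same in every state under an average power constraint. All function names are hypothetical.

```python
import math


def outage_probability(rate: float, snr: float) -> float:
    """Pr[log2(1 + snr*|H|^2) < rate] for Rayleigh fading, |H|^2 ~ Exponential(1)."""
    gain_needed = (2.0 ** rate - 1.0) / snr   # smallest |H|^2 that supports `rate`
    return 1.0 - math.exp(-gain_needed)


def outage_capacity(eps: float, snr: float) -> float:
    """Largest rate with outage probability eps (inverts outage_probability)."""
    return math.log2(1.0 - snr * math.log(1.0 - eps))


def delay_limited_capacity(gains, probs, power: float) -> float:
    """Constant rate kept in every state by channel inversion P(g) = (2**R - 1)/g,
    subject to E[P(G)] <= power (assumed: unit noise, discrete gain distribution)."""
    mean_inverse_gain = sum(p / g for g, p in zip(gains, probs))
    return math.log2(1.0 + power / mean_inverse_gain)


snr = 10.0
r_01 = outage_capacity(0.01, snr)   # rate that is in outage 1% of the time
print(f"1%-outage capacity: {r_01:.3f} bits/use")
print(f"check: P_out = {outage_probability(r_01, snr):.4f}")
print(f"delay-limited: {delay_limited_capacity([0.5, 2.0], [0.5, 0.5], snr):.3f} bits/use")
```

For Rayleigh fading, $\mathrm{E}[1/|H|^2]$ is infinite, so channel inversion gives zero delay-limited capacity. The two-state gain distribution in the example avoids this. At 10 dB the 1%-outage rate is only about 0.14 bits per use, far below the roughly 2.9 bits per use ergodic capacity of the same Rayleigh channel.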

**We next turn to channel state information (CSI) at each device.** A receiver may deliberately introduce significant delay to form a more accurate estimate of the channel state from both past and future received symbols. It is common to assume that the receiver has accurate CSI as a byproduct of the symbol detection process. Transmitter CSI, on the other hand, is less likely to be accurate. Delivering channel measurements from the receiver to the transmitter in time becomes increasingly difficult as the channel varies more quickly. In time-division duplex (TDD) systems, the two devices alternate between the transmitter and receiver roles, so each can acquire accurate CSI while acting as the receiver. Nevertheless, when this CSI is later used for transmission, the delay between measurement and use may degrade its accuracy.
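The effect of measurement delay can be illustrated with a first-order Gauss-Markov model. This model is an assumption and is not made in the text. The channel at transmission time is $H_{\text{now}} = \rho H_{\text{old}} + \sqrt{1-\rho^2}\,W$, where $H_{\text{old}}$ is the measured channel and $W$ is independent of it. Both are unit-variance circularly symmetric complex Gaussian. The correlation $\rho$, a hypothetical parameter, shrinks as the delay grows relative to the channel coherence time. The mean squared error of using the outdated value is then $2(1-\rho)$.

```python
import math
import random


def outdated_csi_mse(rho: float, num_samples: int = 100_000, seed: int = 7) -> float:
    """Empirical E|H_now - H_old|^2 under a hypothetical Gauss-Markov model with
    correlation rho between the measured and the current channel."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # per-component std so that E|Z|^2 = 1

    def cgauss() -> complex:
        # zero-mean circularly symmetric complex Gaussian sample
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

    total = 0.0
    for _ in range(num_samples):
        h_old = cgauss()                                          # channel when measured
        h_now = rho * h_old + math.sqrt(1.0 - rho ** 2) * cgauss()  # channel when used
        total += abs(h_now - h_old) ** 2
    return total / num_samples


for rho in (0.99, 0.9, 0.5):
    print(f"rho={rho}: MSE={outdated_csi_mse(rho):.3f} (model: {2 * (1 - rho):.3f})")
```

In Clarke's isotropic-scattering model, for example, $\rho = J_0(2\pi f_D \tau)$ for Doppler frequency $f_D$ and delay $\tau$. Faster channel variation therefore degrades delayed transmitter CSI sooner.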

**In general**, CSI issues are complex, and methods are often motivated by detailed considerations such as how quantized CSI is embedded in feedback packets. However, for a point-to-point link, we consider four models corresponding to whether or not there is CSI at the transmitter (CSIT) and whether or not there is CSI at the receiver (CSIR). We summarize properties of the ergodic capacity when the CSI is either perfect or completely unavailable; more information can be found in [22]. We define a channel with input $X$, output $Y$, and channel state $H$ by $P_{Y|XH}$ (in the following, we use uppercase $P$ for both distributions and densities for simplicity).

• No CSIT, No CSIR

Neither device observes $H$, so the channel is

$$P_{Y|X}(y \mid x) = \sum_{h} P_H(h)\, P_{Y|XH}(y \mid x, h)$$

(with the sum replaced by an integral for continuous $H$) and has capacity $\max_{P_X(\cdot)} I(X;Y)$. The simplicity of the capacity formula is perhaps deceptive, since the optimization over $P_X$ can be tricky. For instance, the capacity-achieving $P_X$ for power-constrained AWGN fading channels are discrete point-mass distributions [1, 66, 81].
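For finite alphabets, the optimization over $P_X$ can be carried out numerically with the Blahut-Arimoto algorithm. This is a standard tool swapped in here; the text does not prescribe it. The sketch first forms the averaged channel $P_{Y|X}$ and then iterates the Blahut-Arimoto update. The two-state binary channel in the example is hypothetical. In the bad state every input is received as 0, so the averaged channel is a Z-channel, and its optimal input is not uniform.

```python
import math


def mixture_channel(p_h, channels):
    """P_{Y|X}(y|x) = sum_h P_H(h) P_{Y|XH}(y|x,h) for finite alphabets."""
    nx, ny = len(channels[0]), len(channels[0][0])
    return [[sum(ph * ch[x][y] for ph, ch in zip(p_h, channels)) for y in range(ny)]
            for x in range(nx)]


def blahut_arimoto(w, iters=10_000, tol=1e-13):
    """Capacity (bits) and a maximizing P_X for the DMC with rows w[x][y]."""
    nx, ny = len(w), len(w[0])
    p = [1.0 / nx] * nx  # start from the uniform input
    for _ in range(iters):
        q = [sum(p[x] * w[x][y] for x in range(nx)) for y in range(ny)]  # P_Y
        # D(P_{Y|X}(.|x) || P_Y) in nats for each input letter x
        d = [sum(w[x][y] * math.log(w[x][y] / q[y]) for y in range(ny) if w[x][y] > 0)
             for x in range(nx)]
        new_p = [p[x] * math.exp(d[x]) for x in range(nx)]
        total = sum(new_p)
        new_p = [v / total for v in new_p]
        converged = max(abs(a - b) for a, b in zip(p, new_p)) < tol
        p = new_p
        if converged:
            break
    q = [sum(p[x] * w[x][y] for x in range(nx)) for y in range(ny)]
    cap = sum(p[x] * w[x][y] * math.log2(w[x][y] / q[y])
              for x in range(nx) for y in range(ny) if w[x][y] > 0)
    return cap, p


# Hypothetical two-state binary channel: in the good state the bit passes
# unchanged; in the bad state (a deep fade) every input is received as 0.
good = [[1.0, 0.0], [0.0, 1.0]]
bad = [[1.0, 0.0], [1.0, 0.0]]
w = mixture_channel([0.5, 0.5], [good, bad])  # averaged channel: a Z-channel
cap, p_x = blahut_arimoto(w)
print(f"C = {cap:.4f} bits/use, P_X(1) = {p_x[1]:.3f}")
```

The result should approach the Z-channel capacity $\log_2 1.25 \approx 0.322$ bits, attained with $P_X(1) = 0.4$. The AWGN fading channels of [1, 66, 81] have continuous alphabets and are outside the scope of this finite-alphabet sketch.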

• CSIR, No CSIT

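The communication system can be viewed as a channel with the usual input $X$ but with the output pair $[Y, H]$ (see [22, 174]), i.e., the channel

$$P_{YH|X}(y, h \mid x) = P_H(h)\, P_{Y|XH}(y \mid x, h).$$

The capacity is thus

$$C = \max_{P_X(\cdot)} I(X; Y H) = \max_{P_X(\cdot)} I(X; Y \mid H),$$

where the second equality holds because $X$ and $H$ are independent when there is no CSIT.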
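The conditional form $I(X;Y|H)$ can be evaluated directly as an average of per-state mutual informations. One input distribution is shared across all states, because the transmitter does not know $H$. The sketch below uses a hypothetical two-state binary channel: a noiseless state and a deep fade in which every input is received as 0, each with probability 1/2. It then runs a grid search over binary $P_X$. The function names are illustrative.

```python
import math


def mutual_information(p_x, w):
    """I(X;Y) in bits for input distribution p_x and channel rows w[x][y]."""
    ny = len(w[0])
    q = [sum(px * row[y] for px, row in zip(p_x, w)) for y in range(ny)]  # P_Y
    return sum(px * row[y] * math.log2(row[y] / q[y])
               for px, row in zip(p_x, w) for y in range(ny) if px > 0 and row[y] > 0)


def csir_rate(p_x, p_h, channels):
    """I(X;Y|H) = sum_h P_H(h) I(X;Y|H=h); P_X does not depend on h (no CSIT)."""
    return sum(ph * mutual_information(p_x, ch) for ph, ch in zip(p_h, channels))


good = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical state: bit passes unchanged
bad = [[1.0, 0.0], [1.0, 0.0]]    # hypothetical deep fade: every input read as 0
p_h = [0.5, 0.5]
# Grid search over binary inputs P_X = [1 - a, a]
best_rate, best_a = max((csir_rate([1 - i / 1000, i / 1000], p_h, [good, bad]), i / 1000)
                        for i in range(1001))
print(f"CSIR capacity = {best_rate:.4f} bits/use at P_X(1) = {best_a:.3f}")
```

In this example the uniform input is optimal and gives 0.5 bits per use. Without CSIR, the same two states average into a Z-channel with capacity $\log_2 1.25 \approx 0.322$ bits, so receiver CSI helps here.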

• CSIT, No CSIR

Shannon studied this problem with causal CSI [166]. For some applications, for example information storage, the channel state sequence $\{H_i\}_{i=1}^{n}$ may even be known noncausally [60, 71]. In the latter case, the capacity is $\max\,[I(U;Y) - I(U;H)]$, where the maximization is over all $P_{UX|H}$ for which $U - [X, H] - Y$ forms a Markov chain [60].

• CSIT, CSIR

In this case, the instantaneous capacity for state $h$ is

$$C(h) = \max_{P_{X|H}(\cdot \mid h)} I(X; Y \mid H = h),$$

and the ergodic capacity is $\mathrm{E}[C(H)]$ (see [22, Prop. 2]). Let $P_i = \mathrm{E}[|X_i|^2]$ be chosen as a function $P(H_i)$ of $H_i$. To identify the transmission policy that maximizes $\mathrm{E}[C(H)]$ subject to the average power constraint

$$\mathrm{E}[P(H)] \le P, \tag{3.12}$$

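we observe that a sequence of channel uses in time provides the same mutual information as those same channel uses would in parallel Gaussian channels (see Section 2.2.5). The optimal $P_{X|H}(\cdot \mid h)$ is thus a zero-mean circularly symmetric complex Gaussian density with variance $P(h)$ given by a "water-filling" power allocation analogous to (2.30). For the channel $Y = hX + Z$ with unit-variance noise $Z$, we choose

$$P(h) = \left(Q - \frac{1}{|h|^2}\right)^{+},$$

where $(x)^+ = \max(0, x)$. The water-filling level $Q$ is chosen to satisfy (3.12) with equality; see [22, 62].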
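For a discrete fading distribution, the water-filling rule can be computed by bisection on $Q$. This works because the average power $\mathrm{E}[(Q - 1/|H|^2)^+]$ is continuous and nondecreasing in $Q$. The sketch assumes, as an illustration, the channel $Y = hX + Z$ with unit-variance noise, finitely many gain values $|h|^2$ with known probabilities, and the rate $\mathrm{E}[\log_2(1 + |H|^2 P(H))]$. The function and variable names are hypothetical.

```python
import math


def waterfill_over_states(gains, probs, power, tol=1e-12):
    """Water-filling P(g) = max(0, Q - 1/g) over fading states g = |h|^2, with Q
    found by bisection so that sum_g p(g) P(g) = power (constraint met with equality)."""
    def avg_power(level):
        return sum(p * max(0.0, level - 1.0 / g) for g, p in zip(gains, probs))

    lo, hi = 0.0, power + max(1.0 / g for g in gains)  # avg_power(hi) >= power
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if avg_power(mid) < power:
            lo = mid   # water level too low: not all power is used
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    alloc = [max(0.0, q - 1.0 / g) for g in gains]   # P(h) for each state
    rate = sum(p * math.log2(1.0 + g * a) for g, p, a in zip(gains, probs, alloc))
    return q, alloc, rate


gains = [0.1, 1.0, 4.0]   # hypothetical |h|^2 values
probs = [0.3, 0.4, 0.3]   # their probabilities
q, alloc, rate = waterfill_over_states(gains, probs, power=1.0)
const_rate = sum(p * math.log2(1.0 + g * 1.0) for g, p in zip(gains, probs))
print(f"Q = {q:.4f}, P(h) = {[round(a, 4) for a in alloc]}")
print(f"water-filling: {rate:.4f} bits/use vs constant power: {const_rate:.4f}")
```

With these numbers the weakest state receives no power, because its $1/|h|^2 = 10$ exceeds $Q \approx 2.107$. The resulting rate exceeds that of spending the same power in every state.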