# Information Theory

Our interest in Information theory has been motivated by communication problems and the need to understand fundamental limits imposed on them. These limits also yield insight into the structure of optimal schemes which can be used to develop efficient and practical algorithms. Some of our work on this topic has been on the following aspects.

To avoid repetition we have described other topics in information theory under the following links.

• Approximation approach to network information theory can be found here.
• Information theoretic characterizations for wireless (relay) networks can be found here.
• Rate-distortion characterization of network data compression can be found here.
• Some information-theoretic work related to space-time wireless communications can be found here.

## Robust Communications

In wireless communication we often have co-channel interference which is additive but not necessarily Gaussian. We could have partial knowledge about its statistical behaviour such as knowledge of its covariance up to a certain lag. Motivated by this we considered information transmission over covariance-constrained additive noise channels. This was formulated as a game-theoretic problem where the signal design is playing against nature with the mutual information as the pay-off.

Game-theoretic problems for average power constrained additive noise channels have a rich history dating back to the 1950s (Blachman, Dobrushin, Pinsker and others). More recently, coding theorems under mismatched decoding for such channels have been proved (Csiszar, Hughes, Lapidoth, Narayan and others). It has been shown that (1/2) log(1+P/N) where P and N are the input and noise power constraints respectively, can be achieved for such noise with average power constraints. We studied the question of whether the Gaussian codebook and Gaussian noise are saddlepoints for more general additive noise channels. In particular, we show that this is true if the signal and noise covariances lie in closed convex and bounded sets. For average power constrained additive noise channels, the worst noise turned out to be white Gaussian noise, which is also the maximum entropy noise. A question we asked was whether this was true for covariance-constrained noise as well. We studied this in the context where we have a banded covariance constraint (covariance lags specified up to a certain lag). In this case it is well known from Burg's maximum entropy theorem (Cover and Thomas, Elements of Information Theory, 1991), that a Gaussian AR process satisfying the covariance lag constraints is the maximum entropy noise process. However, we showed that it is the worst noise only if the signal has sufficient input power and for lower power the maximum entropy noise is not the worst noise. We also demonstrated that we can attach an operational significance to the rates under mismatched decoding. Therefore the minimax strategy is Gaussian noise with the minimax covariance and a random Gaussian signal set corresponding to the waterfilling solution to this noise covariance.

Narrowband interference (NBI) could occur in transmission media such as twisted pair or coaxial cable. In many situations, radio frequency interference (RFI) from short-wave radio, citizen's band (CB) radio, and amateur (HAM) radio is the most serious source of NBI. We analyzed the effect of such interference on the data throughput for finite-blocklength transmission over noisy inter-symbol interference channels. It was shown that the worst narrowband interference spreads its power over the “sweet spots” of the signal (i.e. where the signal puts highest power). More precisely, the auto-correlation matrix of worst-case narrowband (rank-deficient) interference is shown to have the same eigendirections as the signal. Moreover, if the rank of the covariance matrix of the NBI is M<N (where N is the rank of the signal and M is dimension of the space) then the worst interferer is shown to put its power along the M largest eigendirections of the signal. We derived the exact spectral shape of the worst narrowband interference and its associated minimum channel block throughput. Note that this is not a game-theoretic problem, where the signal adjusts to the NBI. By analyzing the worst-case behavior of such interference, we can design schemes that guarantee reliable transmission even under such impairments.

### Papers

Abstract:
This paper started with a simple question: is the maximum entropy noise the worst noise for additive channels? In the case of scalar channels with a power constraint on the noise, this is true, as is well known. However, we show that in the vector case, with covariance constraints, the answer is yes and no. Yes, if the transmit power is large enough and no otherwise. Along the way we give a solution to the mutual information game with covariance constraints and show that Gaussian solutions form saddle points, but there could also be other saddlepoints. We also demonstrate that the information rates can be achieved using mismatched (Gaussian) decoders.
Abstract:
Narrowband interference (NBI) could occur in transmission media such as twisted pair or coaxial cable. We analyzed the effect of such interference on the data throughput for finite-blocklength transmission over noisy inter-symbol interference channels. It was shown that the worst narrowband interference spreads its power over the “sweet spots” of the signal (i.e. where the signal puts highest power). More precisely, the auto-correlation matrix of worst-case narrowband (rank-deficient) interference is shown to have the same eigendirections as the signal. Moreover, if the rank of the covariance matrix of the NBI is M
3. S. N. Diggavi and T. M. Cover, "Is Maximum Entropy Noise the Worst?," in Proc. International Symposium on Information Theory, Ulm, 1997, p. 278.

## Finite-buffer queues and deletion channels

Queues form an integral part of packet-switched (IP) networks, being at switches and routers both in the core and edge networks. Pioneering work (Anantharam and Verdu, IEEE Trans. IT, January 1996) has demonstrated that the capacity of infinite-buffer queues can be achieved by coding over inter-packet times. However they also showed that inter-packet coding is sufficient (without the necessity of coding for packet timing) when the packet size is large enough.

The main manifestation of errors in networks are due to packet losses at a finite-buffer queue. We studied the information-carrying capacity of finite queues, allowing for inter-packet coding, but without coding in packet timing. In the presence of side-information about which packets were lost (such as through a packet sequence number mechanism), this problem is reduced to an erasure channel with significant memory and the capacity in this case is directly related to the long-term loss probability of the queue. We showed that even when there is significant memory in the erasures, feedback does not increase the capacity.

Transport protocols such as TCP use sequence numbers in the packet (strictly speaking, in the TCP header) to detect lost packets. Packets sent by the source are numbered sequentially. The destination can infer the loss of one or several packets from the sequence number of the packet following the loss. The sequence number uses up a certain number of bits (32 bits in the case of TCP/IP) to detect lost packets and to request retransmission of those packets. We can view it as a code that converts the deletion channel into an erasure channel.

A fundamental question therefore arises: if we do not assume a-priori the existence of sequence numbers, what is the capacity of the resulting channel? This question naturally leads to the deletion channel, which essentially differs from the erasure channel in that the destination receives no explicit symbol indicating loss of a packet. Instead, the received sequence of symbols is shorter than the original sequence, with deleted symbols simply removed, i.e. only a subsequence of the transmitted symbols is received. The characterization of the capacity of the deletion channel is a long-standing open question since the 1960s. We provided the tightest lower bound to this problem since that time. Our work has revived interest in this problem, as evidenced by several follow-up pieces of work and recently Drinea and Mitzenmacher have improved our lower bounds. One of the surprising results we found was that the achievable rate in deletion channels differs from that of erasure channels by a small amount (at most 1 bit). This suggests that the use of sequence numbers to detect deletions is quite inefficient, though it makes the job of coding for such errors much simpler. In doing this analysis we studied questions on common subsequences between Markov processes, a topic perhaps of independent interest. More recently we have developed the tightest known upper bounds for the binary deletion channel.

### Papers

Abstract:
This paper was motivated by the transmission of information over finite buffer channels. Analyzing the capacity of such channels naturally led us to the study of deletion channels. In deletion channels, in contrast to erasure channels, symbols of the transmitted sequence are removed, and the receiver does not know which symbols have been deleted. This was a classic problem from the 1960s in the context of modeling synchronization errors. This paper gave the best known (at the time of publication) lower bounds to capacity of deletion channels since the 1960s. It also studied erasure channels with arbitrary memory in the erasure process. It showed that even when the erasure process has memory, feedback does not increase the capacity.
Abstract:
We present upper bounds on the capacity of the i.i.d. binary deletion channel, where each bit is independently deleted with a fixed probability d. We provide a general approach that gives a numerical answer for fixed d, and provide an argument that gives asymptotic upper bounds as d goes to 1. These appear to be the first non-trivial upper bounds for this probabilistic deletion channel.
3. S. N. Diggavi and M. Grossglauser, "Bounds on capacity of deletion channels," in Proc. IEEE International Symposium on Information Theory (ISIT), Lausanne, 2002, p. 421.
Abstract:
This paper develops the best known (at the time of publication) capacity lower bounds to the deletion channel since the 1960s. It also makes a connection between deletion channels and the well studied problem of the length of the longest common subsequence. It shows that the capacity of the deletion channel for large alphabet size is close to the corresponding erasure channel, though the coding schemes for the two channels are drastically different.
5. S. N. Diggavi and M. Grossglauser, "Information transmission over a finite buffer channel," in Proc. IEEE International Symposium on Information Theory (ISIT),Sorrento, Italy, 2000, p. 52.

## Multicarrier communications

In block transmission a guard sequence (prefix) is sent to prevent inter-block interference. A cyclic prefix is used in OFDM and an all-zero sequence is used in vector coding (Kasturia et al, T-IT, July 1989).The question we asked is, what should such a sequence be to maximize throughput under the constraint that the sequence itself carries no new information. This question was answered recently where the optimal guard sequence was shown to be a linear combination of the information symbols, with the particular optimal combination characterized (depending on the channel and the signal power). Both vector coding and DMT are special cases of this transmission scheme and significant gains in throughput over DMT can be obtained by using such a guard sequence optimization.

The ISI multiple access (ISI-MA) channel has been studied by Cheng and Verdu (T-IT, 1993) and the two user achievable rate region was characterized. Motivated by this, we proposed a Discrete Multi-Tone (DMT) based multiple access scheme for the ISI-MA channel. The advantages of using DMT-based multiple access schemes are due to the simplicity of the receiver and the ease of transmit spectral shaping. We also proposed a numerical rate optimization algorithm which converges deterministically in O(N^2) steps (and this can be easily improved to O(Nlog(N)) steps) for a two-user channel, where N is the number of DMT tones. Simple heuristics for rate allocation for a larger number of users was also explored and its performance was studied.