Intuition of information theory

I am reading the book "Elements of Information Theory" by Cover and Thomas and I am having trouble understanding the various ideas conceptually.

For example, I know that H(X) can be interpreted as the average encoding length. But what does H(Y|X) intuitively mean?

And what is mutual information? I read things like "It is the reduction in the uncertainty of one random variable due to the knowledge of the other". This doesn't mean anything to me, as it doesn't help me explain in words why I(X;Y) = H(Y) − H(Y|X), or explain the chain rule for mutual information.

I also encountered the data processing inequality, I(X;Y) ≥ I(X;Z), explained as something that can be used to show that no clever manipulation of the data can improve the inferences that can be made from the data. If I had to explain this result to someone in words and explain why it should be intuitively true, I would have absolutely no idea what to say. Even explaining how "data processing" is related to Markov chains and mutual information would baffle me.

I can imagine explaining a result in algebraic topology to someone, since there is usually an intuitive geometric picture that can be drawn. But with information theory, if I had to explain a result to someone at a level comparable to a picture, I would not be able to.

When I do problems, it's just abstract symbolic manipulation and trial and error. I am looking for an explanation (not these "blah gives information about blah" explanations) of the various terms that will make the solutions to problems appear in a meaningful way.

Right now I feel like someone trying to do algebraic topology purely symbolically without thinking about geometric pictures.

Is there a book that will help my curse?


komizmtk · Accepted Answer

Christopher Olah wrote an excellent intuitive explanation of information theory called "Visual Information Theory". It provides thoughtful visualizations for understanding these concepts.

In addition, there is a paper, "The Mutual Information Diagram for Uncertainty Visualization", that introduces a tool for visualizing mutual information and may be useful.

Briana Petty · Accepted Answer

Since you have an intuitive understanding of entropy based on the compression theorem, you should look into the operational meaning of mutual information, which comes from the channel coding theorem. It says that if you have a noisy channel whose input X and output Y have joint distribution p(x, y), then information encoded in X can be transmitted reliably to a receiving party with access to Y at rates up to I(X;Y) bits per symbol.
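To make the quantities in the question concrete, here is a minimal numerical sketch; it is an illustration, not something taken from Cover and Thomas. It assumes a toy setup chosen only for this example: a fair binary X sent through a binary symmetric channel with flip probability 0.1 to give Y, then through a second one with flip probability 0.2 to give Z, so that X → Y → Z is a Markov chain. The function names (`entropy`, `conditional_entropy`, `mutual_information`, `bsc_joint`) are made up for the example. H(Y|X) is computed as the average, over x, of the entropy of Y once X = x is known, which is one way to read it as "bits still needed to describe Y after seeing X".

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(Y|X) = sum over x of p(x) * H(Y | X = x), with joint[x][y] = p(x, y)."""
    total = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            total += p_x * entropy([p / p_x for p in row])
    return total


def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X): uncertainty about Y removed by knowing X."""
    p_y = [sum(col) for col in zip(*joint)]
    return entropy(p_y) - conditional_entropy(joint)


def bsc_joint(p_x, flip):
    """Joint p(x, y) when binary X passes through a channel flipping each bit with probability `flip`."""
    return [
        [p_x[0] * (1 - flip), p_x[0] * flip],
        [p_x[1] * flip, p_x[1] * (1 - flip)],
    ]


# Assumed toy parameters (not from the original question or answers).
p_x = [0.5, 0.5]
flip_xy, flip_yz = 0.1, 0.2

# Two binary symmetric channels in a row behave like one with this flip probability.
flip_xz = flip_xy * (1 - flip_yz) + (1 - flip_xy) * flip_yz

joint_xy = bsc_joint(p_x, flip_xy)
joint_xz = bsc_joint(p_x, flip_xz)

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)

print(f"H(Y|X)  = {conditional_entropy(joint_xy):.4f} bits")
print(f"I(X;Y)  = {i_xy:.4f} bits")
print(f"I(X;Z)  = {i_xz:.4f} bits")
print("Data processing inequality holds:", i_xy >= i_xz)
```

In this setup the second channel only adds noise, so Z can never tell you more about X than Y already did, which is what the data processing inequality I(X;Y) ≥ I(X;Z) says for this particular chain.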
