The Information Bottleneck : Connections to Other Problems, Learning and Exploration of the IB Curve

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Borja Rodriguez Galvez; [2019]

Keywords: ;

Abstract: In this thesis we study the information bottleneck (IB) method. This is an informationtheoretic framework which addresses the question of what are the relevant factors of arandom variable X to explain another statistically dependent random variable Y . Thesefactors are embedded into a bottleneck variable T obeying the Markov condition Y $X $ T.The contributions of the thesis are three-fold: (i) The thesis serves as a survey onthe existing connections of the information bottleneck method with rate distortion theoryand with minimal sufficient statistics, for which we also extended the known theory byproving some unproved results and deriving new connections. (ii) The thesis also servesas a survey on the information bottleneck and learning. We recover the main results onsample bounds for learning, prove them insufficient for real-world problems and show theimportance of the recently found ties between information and generalization. Moreover,we provide with a clear intuition of why the information bottleneck is a good objectivefunction for supervised learning tasks. Furthermore, we provide with a new informationtheoretic generalization bound for linear models which, to the extent of our knowledge,is the first one which does not depend on the cardinality of the random variables. (iii)Finally, the main contribution of the thesis are the results regarding the exploration of theIB curve. The IB curve is the set of points describing the solutions of the informationbottleneck optimization in terms of compression of the inputs and explainability of theoutput. We introduce the convex IB Lagrangian, an objective function which allows us toexplore the IB curve (in contrast to the previously used IB Lagrangian). Furthermore, weprove there is a bijective mapping between the Lagrange multiplier used and the obtainedpoint in the IB curve, provided the IB curve shape is known. This means one could designthe Lagrange multiplier to obtain a desired level of compression or explainability.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)