On Linear Mode Connectivity up to Permutation of Hidden Neurons in Neural Networks: When does Weight Averaging work?

University essay from KTH, School of Electrical Engineering and Computer Science (EECS)

Abstract: Neural networks trained with gradient-based optimization methods exhibit a surprising phenomenon known as mode connectivity: two independently trained sets of network weights are not isolated low-loss minima in parameter space, but can be connected by simple curves along which the loss remains low. In the case of linear mode connectivity up to permutation, even linear interpolations of the trained weights incur low loss once networks that differ only by a permutation of their hidden neurons are considered equivalent. While some recent research suggests that this implies the existence of a single near-convex loss basin to which the parameters converge, other work has empirically shown distinct basins corresponding to different strategies for solving the task. In some settings, naively averaging multiple network weights, without explicitly accounting for permutation invariance, still yields a network with improved generalization. This thesis studies linear mode connectivity among a set of neural networks independently trained on labelled datasets, both naively and after reparameterization to account for permutation invariance. Specifically, the effect of hidden-layer width on connectivity is evaluated empirically. The experiments are conducted on a two-dimensional toy classification problem, and the insights are extended to deeper networks trained on handwritten digits and images. It is argued that accounting for the permutation of hidden neurons, either explicitly or implicitly, is necessary for weight averaging to improve test performance. Furthermore, the results indicate that the training dynamics induced by the optimization play a significant role, and that large model width alone may not be a sufficient condition for linear mode connectivity.
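
To make the "naive" setting concrete, the sketch below interpolates the weights of two networks elementwise and evaluates the loss along the linear path between them, without any permutation alignment. It is only an illustrative assumption of how such a probe can be written (using PyTorch, randomly initialized stand-ins for independently trained networks, and a made-up toy 2-D classification batch), not the thesis's actual experimental setup.

import torch
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha):
    # Elementwise linear interpolation (1 - alpha) * theta_a + alpha * theta_b.
    # No permutation of hidden neurons is applied, i.e. the "naive" setting.
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

def make_mlp(width):
    # Small two-layer MLP for a 2-D, two-class toy problem.
    return nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, 2))

# Two networks standing in for independently trained models (illustrative only;
# in practice these would be trained to convergence on the same dataset).
net_a, net_b = make_mlp(64), make_mlp(64)
probe = make_mlp(64)

# Hypothetical toy 2-D classification batch (XOR-like labels).
x = torch.randn(256, 2)
y = (x[:, 0] * x[:, 1] > 0).long()
loss_fn = nn.CrossEntropyLoss()

# Sweep the interpolation coefficient and record the loss along the linear path;
# a large bump between the endpoints indicates a loss barrier.
for alpha in torch.linspace(0, 1, 11):
    a = float(alpha)
    probe.load_state_dict(
        interpolate_state_dicts(net_a.state_dict(), net_b.state_dict(), a)
    )
    with torch.no_grad():
        loss = loss_fn(probe(x), y).item()
    print(f"alpha={a:.1f}  loss={loss:.3f}")

Averaging the weights corresponds to the single point alpha = 0.5 on this path; accounting for permutation invariance would amount to permuting the hidden neurons of one network to match the other before the interpolation above.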
