The Real Culprit Behind Failed Deep Neural Network Training Is Degeneracy, Not Vanishing Gradients

The Real Culprit Behind Failed Deep Neural Network Training Is Degeneracy, Not Vanishing Gradients

Selected from severelytheoretical Translated by Machine Heart Contributors: Jiang Siyuan, Liu Xiaokun The author demonstrates through the example of deep linear networks that the reason for the poor performance of the final network is not vanishing gradients, but rather the degeneracy of the weight matrices, which reduces the effective degrees of freedom of the model, … Read more