MIT and Brown University researchers have conducted a study of deep classifiers, artificial neural networks used for classification tasks such as image classification, speech recognition, and natural language processing.
The paper, "Dynamics in Deep Classifiers trained with the Square Loss: Normalization, Low Rank, Neural Collapse and Generalization Bounds," explores the dynamics of training deep classifiers with the square loss and how properties such as rank minimization, neural collapse, and dualities between the activation of neurons and the weights of the layers are intertwined.
The authors focused on two types of deep classifiers: fully connected deep networks and convolutional neural networks (CNNs). The study found that deep networks trained to fit a training dataset eventually reach a state known as “neural collapse,” and that weight decay regularization (WD), stochastic gradient descent (SGD), and weight normalization (WN) are key ingredients in the emergence of this phenomenon. The authors also prove new norm-based generalization bounds for CNNs with localized kernels, which show that generalization can be orders of magnitude better than for densely connected networks.
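A minimal sketch (assuming PyTorch, with synthetic data and hyperparameters chosen purely for illustration, not the authors' code) of the training recipe described above: a fully connected classifier fit with the square loss on one-hot labels using SGD, weight decay, and weight normalization, followed by a crude check of a neural-collapse statistic, the within-class versus between-class variability of the last hidden layer's features.

```python
# Sketch: square-loss training with SGD + weight decay + weight normalization,
# then a rough neural-collapse check. Data and hyperparameters are toy values.
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

torch.manual_seed(0)
num_classes, dim, n_per_class = 4, 20, 50

# Synthetic stand-in for a real training set (assumption for this demo).
X = torch.randn(num_classes * n_per_class, dim)
y = torch.arange(num_classes).repeat_interleave(n_per_class)
Y_onehot = torch.nn.functional.one_hot(y, num_classes).float()

# Fully connected network with weight normalization on every linear layer.
model = nn.Sequential(
    weight_norm(nn.Linear(dim, 64)), nn.ReLU(),
    weight_norm(nn.Linear(64, 64)), nn.ReLU(),
    weight_norm(nn.Linear(64, num_classes)),
)

# SGD with weight decay: the regularized square-loss setup sketched earlier.
opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-4)
mse = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = mse(model(X), Y_onehot)  # square loss on one-hot targets
    loss.backward()
    opt.step()

# Crude neural-collapse statistic: within-class variance of the penultimate
# features relative to between-class variance should shrink during training.
with torch.no_grad():
    feats = X
    for layer in list(model)[:-1]:  # stop before the final linear layer
        feats = layer(feats)
    class_means = torch.stack([feats[y == c].mean(0) for c in range(num_classes)])
    within = torch.stack([feats[y == c].var(0).mean() for c in range(num_classes)]).mean()
    between = class_means.var(0).mean()
    print(f"final loss {loss.item():.4f}, "
          f"within/between variance ratio {(within / between).item():.3f}")
```

In this kind of setup, a within/between ratio shrinking toward zero is the signature of neural collapse; the exact numbers depend entirely on the toy data and hyperparameters chosen here.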
This study provides insights into the properties that emerge during training and could advance our understanding of why deep learning works as well as it does.
The paper, "Dynamics in Deep Classifiers trained with the Square Loss: Normalization, Low Rank, Neural Collapse and Generalization Bounds," explores the dynamics of training deep classifiers with the square loss and how properties such as rank minimization, neural collapse, and dualities between the activation of neurons and the weights of the layers are intertwined.
The authors focused on two types of deep classifiers: fully connected deep networks and convolutional neural networks (CNNs). The study found that deep networks trained to fit a training dataset will eventually reach a state known as “neural collapse,” but weight decay regularization (WD), stochastic gradient descent (SGD), and weight normalization (WN) can prevent this phenomenon. The authors also prove new norm-based generalization bounds for CNNs with localized kernels which shows that generalization can be orders of magnitude better than densely connected networks.
This study provides insights into the properties that emerge during training and could advance our understanding of why deep learning works as well as it does.