---
title: Learning Neural Networks with Sparse Activations
section: Original Papers
abstract: 'A core component present in many successful neural network architectures
  is an MLP block of two fully connected layers with a non-linear activation in between.
  An intriguing phenomenon observed empirically, including in transformer architectures,
  is that, after training, the activations in the hidden layer of this MLP block tend
  to be extremely sparse on any given input. Unlike traditional forms of sparsity,
  where there are neurons/weights which can be deleted from the network, this form
  of {\em dynamic} activation sparsity appears to be harder to exploit to get more
  efficient networks. Motivated by this, we initiate a formal study of PAC learnability
  of MLP layers that exhibit activation sparsity. We present a variety of results
  showing that such classes of functions do lead to provable computational and statistical
  advantages over their non-sparse counterparts. Our hope is that a better theoretical
  understanding of {\em sparsely activated} networks would lead to methods that can
  exploit activation sparsity in practice.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: awasthi24a
month: 0
tex_title: Learning Neural Networks with Sparse Activations
firstpage: 406
lastpage: 425
page: 406-425
order: 406
cycles: false
bibtex_author: Awasthi, Pranjal and Dikkala, Nishanth and Kamath, Pritish and Meka, Raghu
author:
- given: Pranjal
  family: Awasthi
- given: Nishanth
  family: Dikkala
- given: Pritish
  family: Kamath
- given: Raghu
  family: Meka
date: 2024-06-30
address:
container-title: Proceedings of Thirty Seventh Conference on Learning Theory
volume: '247'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 6
  - 30
pdf:
extras:
---