---
title: Learning Neural Networks with Sparse Activations
section: Original Papers
abstract: 'A core component present in many successful neural network architectures
  is an MLP block of two fully connected layers with a non-linear activation in between.
  An intriguing phenomenon observed empirically, including in transformer architectures,
  is that, after training, the activations in the hidden layer of this MLP block tend
  to be extremely sparse on any given input. Unlike traditional forms of sparsity,
  where there are neurons/weights which can be deleted from the network, this form
  of {\em dynamic} activation sparsity appears to be harder to exploit to get more
  efficient networks. Motivated by this, we initiate a formal study of PAC learnability
  of MLP layers that exhibit activation sparsity. We present a variety of results
  showing that such classes of functions do lead to provable computational and statistical
  advantages over their non-sparse counterparts. Our hope is that a better theoretical
  understanding of {\em sparsely activated} networks would lead to methods that can
  exploit activation sparsity in practice.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: awasthi24a
month: 0
tex_title: Learning Neural Networks with Sparse Activations
firstpage: 406
lastpage: 425
page: 406-425
order: 406
cycles: false
bibtex_author: Awasthi, Pranjal and Dikkala, Nishanth and Kamath, Pritish and Meka, Raghu
author:
- given: Pranjal
  family: Awasthi
- given: Nishanth
  family: Dikkala
- given: Pritish
  family: Kamath
- given: Raghu
  family: Meka
date: 2024-06-30
address:
container-title: Proceedings of Thirty Seventh Conference on Learning Theory
volume: '247'
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 6
  - 30
pdf:
extras:
---