List of papers we cover during our weekly paper reading session. For past and missing links/notes, check out the (private) wiki.
What’s Hidden in a Randomly Weighted Neural Network?
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead
Presents a way to collect alphanumeric strings at the dialog level via an ASR. Two main ideas:
- Skip listing over n-best hypotheses across turns (attempts)
- Chunking and confirming pieces one by one
The self-supervision signal here comes from a model that tries to predict whether a provided tuple of turns is in order or not. Plugging this in as the discriminator in generative-discriminative dialog systems, they find better results.
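A minimal sketch (our own illustration, not the paper's code) of how such ordered-vs-shuffled training pairs could be generated from unlabelled dialogues; the function name and window size are made up:

```python
# Build (turn_tuple, label) pairs: 1 if the turns are in their natural order,
# 0 if they have been shuffled. Any binary classifier trained on these pairs
# can then act as the order-prediction discriminator described above.
import random

def make_order_examples(dialogue, window=3):
    examples = []
    for i in range(len(dialogue) - window + 1):
        turns = dialogue[i:i + window]
        examples.append((tuple(turns), 1))      # naturally ordered tuple
        shuffled = turns[:]
        while shuffled == turns:                # make sure the order actually changes
            random.shuffle(shuffled)
        examples.append((tuple(shuffled), 0))   # misordered tuple
    return examples

print(make_order_examples(["hi", "hello, how can I help?", "book a cab", "sure, where to?"]))
```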
This is an approach to collecting a supervision signal from deployment data. There are three tasks for the system (a chatbot that ranks candidate responses):
- Dialogue. The main task. Given the turns till now, the bot ranks which response to utter.
- Satisfaction. Given turns till now, last being user utterance, predict whether the user is satisfied.
- Feedback. After asking for feedback from the user, predict user’s response (feedback) based on the turns till now.
The models share weights, mostly between tasks 1 and 3.
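A rough PyTorch sketch of our reading of this setup (the layer choices are ours, not the paper's): one shared encoder serves the two ranking tasks, with a small classifier head for satisfaction.

```python
import torch
import torch.nn as nn

class SelfFeedingBot(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)       # shared context/candidate encoder
        self.satisfaction = nn.Linear(dim, 2)                # task 2: satisfied / not satisfied

    def encode(self, token_ids):
        return self.embed(token_ids)

    def rank(self, context_ids, candidate_ids):
        # tasks 1 and 3: score candidate responses against the encoded context
        ctx = self.encode(context_ids)                       # (batch, dim)
        cands = self.encode(candidate_ids)                   # (n_candidates, dim)
        return ctx @ cands.t()                               # (batch, n_candidates)

    def predict_satisfaction(self, context_ids):
        return self.satisfaction(self.encode(context_ids))  # (batch, 2)
```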
This paper explores a new direction in language modelling. The idea is still to learn the underlying distribution of sequences of characters, but here they do it by learning the quantum analogue of the classical probability distribution function. Unlike in the classical case, the marginal distributions there carry enough information to reconstruct the joint distribution. This is the central idea of the paper, and is explained in the first half. The second half explains the theory and implementation of the training algorithm, with a simple example. Future work would be to apply the algorithm to more complicated examples, and to adapt it to variable-length sequences.
This paper suggests improvements to DeepVoice and Tacotron, and also proposes a way to add trainable speaker embeddings. The speaker embeddings are initialized randomly and trained jointly through backpropagation. The paper lists some patterns that lead to better performance (a rough sketch follows the list):
- Transforming speaker embeddings to an appropriate dimension and form for every place they are added to the model. The transformed embeddings are called site-specific speaker embeddings.
- Initializing recurrent layer hidden states with the site-specific speaker embeddings.
- Concatenating the site-specific speaker embedding to the input at every timestep of the recurrent layer.
- Multiplying layer activations element-wise with the site-specific speaker embeddings.
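A condensed PyTorch sketch of how these patterns might fit together (our interpretation; the layer names and the tanh/sigmoid choices are ours, not the paper's):

```python
import torch
import torch.nn as nn

class SpeakerConditionedRNN(nn.Module):
    def __init__(self, n_speakers, spk_dim=16, in_dim=80, hid_dim=128):
        super().__init__()
        self.speaker = nn.Embedding(n_speakers, spk_dim)     # randomly initialised, trained jointly
        self.to_h0   = nn.Linear(spk_dim, hid_dim)           # site 1: initial hidden state
        self.to_in   = nn.Linear(spk_dim, spk_dim)           # site 2: concatenated at every timestep
        self.to_gate = nn.Linear(spk_dim, hid_dim)           # site 3: multiplies activations
        self.rnn = nn.GRU(in_dim + spk_dim, hid_dim, batch_first=True)

    def forward(self, x, speaker_id):
        # x: (batch, time, in_dim); speaker_id: (batch,)
        s = self.speaker(speaker_id)                          # (batch, spk_dim)
        h0 = torch.tanh(self.to_h0(s)).unsqueeze(0)           # site 1
        s_in = self.to_in(s).unsqueeze(1).expand(-1, x.size(1), -1)
        out, _ = self.rnn(torch.cat([x, s_in], dim=-1), h0)   # site 2
        return out * torch.sigmoid(self.to_gate(s)).unsqueeze(1)  # site 3: element-wise gating
```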
This talks about an API for framing L2S (learning to search) style problems in the style of an imperative program, which allows for two optimizations:
- memoization
- forced path collapse, getting losses without going to the last state
The main reduction here is to a cost-sensitive classification problem.
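A toy illustration of the imperative style (hypothetical API of our own, not the paper's actual interface): the learner calls predict() once per decision and declares the loss at the end, and the framework behind those calls is then free to memoize repeated states or cut a rollout short without changing this program.

```python
class GreedyReferenceSearcher:
    """Stub searcher that just follows the gold label; a real system would mix in a learned policy."""
    def predict(self, features, gold):
        return gold

    def loss(self, value):
        self.last_loss = value

def run_pos_tagging(searcher, tokens, gold_tags):
    mistakes = 0
    prev_tag = None
    for tok, gold in zip(tokens, gold_tags):
        tag = searcher.predict(features=(tok, prev_tag), gold=gold)  # one decision per token
        mistakes += int(tag != gold)
        prev_tag = tag
    searcher.loss(mistakes)   # declare the task loss; this is where the cost-sensitive reduction hooks in
    return mistakes

run_pos_tagging(GreedyReferenceSearcher(), ["dogs", "bark"], ["NOUN", "VERB"])
```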
Introductory paper on the general approach used in the learn project. The idea is to learn various generalizable syntactic and semantic relations from an unannotated corpus. The relations are expressed using graphs sitting on top of link grammar and meaning-text theory (MTT). While the general approach is sketched out decently enough, there are details to be filled in at various steps and experiments to run (as of the writing in 2014).
On another note, the document is a nice read because of the many interesting ways of looking at various ideas in understanding languages and going from syntax to reasoning via semantics.
We came here via OpenCog's learn project. It is also a nice perspective setter if you are missing a formal introduction to grammars. Overall, a link grammar defines connectors on the left and right side of a word, combined through disjunctions and conjunctions, which then link together to form a sentence under certain constraints.
This specific paper shows the formulation and creates a parser for English, covering many (but not all) linguistic phenomena.
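To make the connector idea concrete, here is a toy matcher over a drastically simplified dictionary (our own illustration; real link grammar adds disjunctions, costs, and planarity constraints):

```python
# Each word lists connectors that must hook left (-) or right (+); a sentence is
# accepted when every connector finds a matching partner.
toy_dict = {
    "the":    ["D+"],        # needs a noun to its right
    "cat":    ["D-", "S+"],  # needs a determiner to its left and a verb to its right
    "sleeps": ["S-"],        # needs a subject to its left
}

def links(sentence):
    """Greedily pair each X+ connector with the nearest later X- connector."""
    open_right, made = [], []
    for i, word in enumerate(sentence):
        for conn in toy_dict[word]:
            label, direction = conn[:-1], conn[-1]
            if direction == "+":
                open_right.append((label, i))
            else:  # try to close against an earlier matching '+' connector
                for j, (lab, src) in enumerate(open_right):
                    if lab == label:
                        made.append((sentence[src], sentence[i], label))
                        open_right.pop(j)
                        break
    return made if not open_right else None

print(links(["the", "cat", "sleeps"]))   # [('the', 'cat', 'D'), ('cat', 'sleeps', 'S')]
```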
This paper is a development over their previous work, the tuple-based end-to-end (TE2E) loss, for speaker identification. They generalize the cosine similarity used in TE2E by building a similarity matrix between utterance embeddings and speaker centroids. Two losses are suggested in the paper:
- Softmax loss
- Contrast loss
Both loss functions have two components: one that pulls a speaker's utterances together, and one that pushes apart utterances from different speakers. Of the two, the contrast loss is more rigorous.
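A rough numpy sketch of the two losses as we understood them (illustrative only; it ignores the learned scale and bias on the similarity, and the detail of excluding an utterance from its own speaker's centroid):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def ge2e_losses(embeddings):
    """embeddings: array of shape (n_speakers, n_utterances, dim)."""
    n_spk, n_utt, _ = embeddings.shape
    centroids = embeddings.mean(axis=1)                   # one centroid per speaker
    # similarity matrix: every utterance vs every speaker centroid
    sim = np.array([[[cosine(embeddings[j, i], centroids[k]) for k in range(n_spk)]
                     for i in range(n_utt)]
                    for j in range(n_spk)])               # (n_spk, n_utt, n_spk)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    softmax_loss = contrast_loss = 0.0
    for j in range(n_spk):
        for i in range(n_utt):
            row = sim[j, i]
            softmax_loss += -row[j] + np.log(np.exp(row).sum())    # pull to own centroid, push from all
            hardest = max(sigmoid(row[k]) for k in range(n_spk) if k != j)
            contrast_loss += 1 - sigmoid(row[j]) + hardest          # push only from the closest other speaker
    return softmax_loss, contrast_loss

print(ge2e_losses(np.random.randn(3, 4, 8)))
```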
This paper talks about developing an end-to-end model for intent recognition from speech. Current systems have several components like ASR and NLU, each with errors of their own that degrade the quality of the speech-to-intent pipeline. Experiments were performed with the model on two tasks, speech-to-domain and speech-to-intent. The architecture is mostly inspired by end-to-end speech synthesis models. A unique feature of the architecture is sub-sampling after the first GRU layer to reduce the length of the sequence and to tackle the problem of vanishing gradients.
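A small PyTorch sketch of the sub-sampling trick (our illustration; the layer sizes are made up): run the first GRU over all frames, then keep every other output frame before the next layer.

```python
import torch
import torch.nn as nn

class SpeechToIntent(nn.Module):
    def __init__(self, n_mels=40, hid=128, n_intents=10):
        super().__init__()
        self.gru1 = nn.GRU(n_mels, hid, batch_first=True)
        self.gru2 = nn.GRU(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, n_intents)

    def forward(self, frames):                 # frames: (batch, time, n_mels)
        h, _ = self.gru1(frames)
        h = h[:, ::2, :]                       # sub-sample: drop every other timestep
        h, last = self.gru2(h)
        return self.out(last[-1])              # intent logits from the final state

logits = SpeechToIntent()(torch.randn(2, 200, 40))
print(logits.shape)                            # torch.Size([2, 10])
```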
They take a regular classifier, pick out the logits before the softmax, and reinterpret them as an energy based model that can also give (unnormalized) density estimates p(x) in addition to the usual p(y|x).
Although the learning mechanism is a little fragile and needs work to be generally stable, the results are neat.
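A one-function sketch of the reinterpretation (our own, using the usual logits-as-negative-energies reading):

```python
import torch

def energy(logits):
    """E(x) = -logsumexp_y f(x)[y]; lower energy means higher unnormalised density
    for x, while softmax(logits) still gives p(y|x) as usual."""
    return -torch.logsumexp(logits, dim=-1)

print(energy(torch.tensor([[2.0, 0.5, -1.0]])))
```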
This is more about managing supervision than about models. There are three problems that they are trying to solve:
- Fine grained quality monitoring,
- Support for multi-component pipelines, and
- Updating supervision
For this, they build easy-to-use abstractions for describing supervision and developing models. They also do a lot of multitask learning and Snorkel-style weak supervision, including the recent slicing abstractions for fine-grained quality control.
While you have to adapt a few pieces for your own case (and scale), Overton is a nice testimony to the success of things like weak supervision and higher-level development abstractions in production.
This takes Snorkel's labelling function idea and uses it to group data instances into slices: segments that are interesting to us from an overall quality perspective. These slicing functions are important not only for identifying and narrowing down to specific kinds of data instances, but also for learning slice-specific representations, which works out to be a computationally cheap way (there are other benefits too) of replicating a Mixture-of-Experts style model.
As with labelling functions, slice membership is predicted using noisy heuristics. This membership value, along with slice representations (and slice prediction confidences), helps create the slice-aware representation used for the final task. The appendix has a few good examples of slicing functions.
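An illustrative slicing function in the spirit described above (our own toy example, not tied to any particular library's API):

```python
def slice_short_location_queries(example):
    """Membership heuristic for a slice we might care about: very short queries
    mentioning a location word, where a ranker often does poorly."""
    words = example["text"].lower().split()
    return len(words) <= 3 and bool(set(words) & {"near", "nearby", "in"})

examples = [{"text": "pizza near me"}, {"text": "book a table for four tomorrow evening"}]
print([slice_short_location_queries(e) for e in examples])   # [True, False]
```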
- Moody, C. E., Mixing dirichlet topic models and word embeddings to make lda2vec, arXiv preprint arXiv:1605.02019 (2016). (cite:moody2016mixing)
- Ren, L., Xie, K., Chen, L., & Yu, K., Towards universal dialogue state tracking, arXiv preprint arXiv:1810.09587 (2018). (cite:ren2018towards)
- Coucke, A., Saade, A., Ball, A., Bluche, T., Caulier, A., Leroy, D., Doumouro, C., …, Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces, CoRR, abs/1805.10190 (2018). (cite:DBLP:journals/corr/abs-1805-10190)
- Kim, S., Eriksson, T., Kang, H., & Hee Youn, D., A pitch synchronous feature extraction method for speaker recognition (2004). (cite:PSMFCC)
- Chen, J., Elements of human voice (2016). (cite:HumanVoice)
- Ghorbani, A., & Zou, J., Data shapley: equitable valuation of data for machine learning, arXiv preprint arXiv:1904.02868 (2019). (cite:ghorbani2019data)
- Shen, G., Horikawa, T., Majima, K., & Kamitani, Y., Deep image reconstruction from human brain activity, PLoS computational biology, 15(1), 1006633 (2019). (cite:shen2019deep)
- Daumé III, H., Frustratingly easy domain adaptation, arXiv preprint arXiv:0907.1815 (2009). (cite:daume2009frustratingly)
- Belkin, M., Hsu, D., Ma, S., & Mandal, S., Reconciling modern machine learning and the bias-variance trade-off, arXiv preprint arXiv:1812.11118 (2018). (cite:belkin2018reconciling)
- Locatello, F., Bauer, S., Lucic, M., Gelly, S., Schölkopf, B., & Bachem, O., Challenging common assumptions in the unsupervised learning of disentangled representations, arXiv preprint arXiv:1811.12359 (2018). (cite:locatello2018challenging)
- Advani, M. S., & Saxe, A. M., High-dimensional dynamics of generalization error in neural networks, arXiv preprint arXiv:1710.03667 (2017). (cite:advani2017high)
- Friedman, J., Hastie, T., & Tibshirani, R., The elements of statistical learning (pp. 51–61) (2001). Springer Series in Statistics, New York. (cite:friedman2001elements)
- Barham, P., & Isard, M., Machine learning systems are stuck in a rut, In Proceedings of the Workshop on Hot Topics in Operating Systems (pp. 177–183) (2019). New York, NY, USA: ACM. (cite:barham2019machine)
- Hastie, T., Montanari, A., Rosset, S., & Tibshirani, R. J., Surprises in high-dimensional ridgeless least squares interpolation, arXiv preprint arXiv:1903.08560 (2019). (cite:hastie2019surprises)
- Levitan, S. I., Mishra, T., & Bangalore, S., Automatic identification of gender from speech, In Proceedings of Speech Prosody (pp. 84–88) (2016). (cite:levitan2016automatic)
- Graf, S., Herbig, T., Buck, M., & Schmidt, G., Features for voice activity detection: a comparative analysis, EURASIP Journal on Advances in Signal Processing, 2015(1), 91 (2015). (cite:graf2015features)
- Welling, M., & Teh, Y. W., Bayesian learning via stochastic gradient langevin dynamics, In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 681–688) (2011). (cite:welling2011bayesian)
- Goodman, J., A bit of progress in language modeling, arXiv preprint arXiv:cs/0108005 (2001). (cite:goodman2001progress)
- Cotterell, R., Mielke, S. J., Eisner, J., & Roark, B., Are all languages equally hard to language-model?, In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 536–541) (2018). New Orleans, Louisiana: Association for Computational Linguistics. (cite:cotterell-etal-2018-languages)
- Reynolds, D. A., Quatieri, T. F., & Dunn, R. B., Speaker verification using adapted gaussian mixture models, Digital signal processing, 10(1-3), 19–41 (2000). (cite:reynolds2000speaker)
- Snoek, J., Larochelle, H., & Adams, R. P., Practical bayesian optimization of machine learning algorithms, arXiv preprint arXiv:1206.2944 (2012). (cite:snoek2012practical)
- Breck, E., Zinkevich, M., Polyzotis, N., Whang, S., & Roy, S., Data validation for machine learning, In Proceedings of SysML (2019). (cite:breck2019data)
- Carbonell, J. G., Learning by analogy: formulating and generalizing plans from past experience, In Machine Learning (pp. 137–161) (1983). Springer. (cite:carbonell1983learning)
- Liu, B., Wang, L., Liu, M., & Xu, C., Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems, CoRR, abs/1901.06455 (2019). (cite:Liu2019LifelongFR)
- Mohri, M., Pereira, F., & Riley, M., Weighted finite-state transducers in speech recognition, Computer Speech & Language, 16(1), 69–88 (2002). (cite:MOHRI200269)
- Ueffing, N., Bisani, M., & Vozila, P., Improved models for automatic punctuation prediction for spoken and written text, In Interspeech (pp. 3097–3101) (2013). (cite:ueffing2013improved)
- Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., & Yu, S. X., Large-scale long-tailed recognition in an open world, arXiv preprint arXiv:1904.05160 (2019). (cite:liu2019large)
- Iyer, A., Jonnalagedda, M., Parthasarathy, S., Radhakrishna, A., & Rajamani, S. K., Synthesis and machine learning for heterogeneous extraction, In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 301–315) (2019). (cite:iyer2019synthesis)
- Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P., & Ouellet, P., Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798 (2010). (cite:dehak2010front)
- Dehak, N., Dehak, R., Kenny, P., Brümmer, N., Ouellet, P., & Dumouchel, P., Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification, In Tenth Annual Conference of the International Speech Communication Association (2009). (cite:dehak2009support)
- Sutton, C., & McCallum, A., An introduction to conditional random fields for relational learning, In Introduction to Statistical Relational Learning (2006). (cite:sutton06introduction)
- Mendis, C., Droppo, J., Maleki, S., Musuvathi, M., Mytkowicz, T., & Zweig, G., Parallelizing wfst speech decoders, In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5325–5329) (2016). (cite:mendis2016parallelizing)
- Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., Wen, Z., & others, A tutorial on thompson sampling, Foundations and Trends® in Machine Learning, 11(1), 1–96 (2018). (cite:russo2018tutorial)
- Gravano, A., Jansche, M., & Bacchiani, M., Restoring punctuation and capitalization in transcribed speech, In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4741–4744) (2009). (cite:gravano2009restoring)
- Mintz, M., Bills, S., Snow, R., & Jurafsky, D., Distant supervision for relation extraction without labeled data, In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 (pp. 1003–1011) (2009). (cite:mintz2009distant)
- Beygelzimer, A., Daumé, H., Langford, J., & Mineiro, P., Learning reductions that really work, Proceedings of the IEEE, 104(1), 136–147 (2016). (cite:beygelzimer2016learning)
- Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., …, Hidden technical debt in machine learning systems, In Advances in Neural Information Processing Systems (pp. 2503–2511) (2015). (cite:sculley2015hidden)
- Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., …, Google's neural machine translation system: bridging the gap between human and machine translation, arXiv preprint arXiv:1609.08144 (2016). (cite:wu2016google)
- Ghahramani, Z., Unsupervised learning, In Summer School on Machine Learning (pp. 72–112) (2003). (cite:ghahramani2003unsupervised)
- Hundman, K., Constantinou, V., Laporte, C., Colwell, I., & Soderstrom, T., Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding, In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 387–395) (2018). (cite:hundman2018detecting)