[
{
"id": "3qm8sXnB",
"URL": "https://arxiv.org/abs/1106.5730",
"number": "1106.5730",
"title": "HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent",
"issued": {
"date-parts": [
[
2011,
11,
14
]
]
},
"author": [
{
"given": "Feng",
"family": "Niu"
},
{
"given": "Benjamin",
"family": "Recht"
},
{
"given": "Christopher",
"family": "Re"
},
{
"given": "Stephen J.",
"family": "Wright"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1106.5730"
},
{
"id": "8RAYEOPl",
"URL": "https://arxiv.org/abs/1206.7051",
"number": "1206.7051",
"title": "Stochastic Variational Inference",
"issued": {
"date-parts": [
[
2013,
4,
24
]
]
},
"author": [
{
"given": "Matt",
"family": "Hoffman"
},
{
"given": "David M.",
"family": "Blei"
},
{
"given": "Chong",
"family": "Wang"
},
{
"given": "John",
"family": "Paisley"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1206.7051"
},
{
"id": "g2vvbB91",
"URL": "https://arxiv.org/abs/1212.0901v2",
"number": "1212.0901v2",
"version": "v2",
"title": "Advances in Optimizing Recurrent Networks",
"issued": {
"date-parts": [
[
2012,
12,
4
]
]
},
"author": [
{
"given": "Yoshua",
"family": "Bengio"
},
{
"given": "Nicolas",
"family": "Boulanger-Lewandowski"
},
{
"given": "Razvan",
"family": "Pascanu"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": "After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of modelling sequences,their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show off the combined effects of these techniques in generally improving both training and test error.",
"note": "This CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1212.0901v2"
},
{
"id": "1GhHIDxuW",
"URL": "https://arxiv.org/abs/1301.3781",
"number": "1301.3781",
"title": "Efficient Estimation of Word Representations in Vector Space",
"issued": {
"date-parts": [
[
2013,
9,
10
]
]
},
"author": [
{
"given": "Tomas",
"family": "Mikolov"
},
{
"given": "Kai",
"family": "Chen"
},
{
"given": "Greg",
"family": "Corrado"
},
{
"given": "Jeffrey",
"family": "Dean"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1301.3781"
},
{
"id": "15y7iq6HF",
"URL": "https://arxiv.org/abs/1308.0850",
"number": "1308.0850",
"title": "Generating Sequences With Recurrent Neural Networks",
"issued": {
"date-parts": [
[
2014,
6,
6
]
]
},
"author": [
{
"given": "Alex",
"family": "Graves"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1308.0850"
},
{
"id": "voh0OiT2",
"URL": "https://arxiv.org/abs/1311.2901",
"number": "1311.2901",
"title": "Visualizing and Understanding Convolutional Networks",
"issued": {
"date-parts": [
[
2013,
12,
2
]
]
},
"author": [
{
"given": "Matthew D",
"family": "Zeiler"
},
{
"given": "Rob",
"family": "Fergus"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky \\etal on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1311.2901"
},
{
"id": "1YcKYTvO",
"URL": "https://arxiv.org/abs/1312.6034",
"number": "1312.6034",
"title": "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps",
"issued": {
"date-parts": [
[
2014,
4,
22
]
]
},
"author": [
{
"given": "Karen",
"family": "Simonyan"
},
{
"given": "Andrea",
"family": "Vedaldi"
},
{
"given": "Andrew",
"family": "Zisserman"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets). We consider two visualisation techniques, based on computing the gradient of the class score with respect to the input image. The first one generates an image, which maximises the class score [Erhan et al., 2009], thus visualising the notion of the class, captured by a ConvNet. The second technique computes a class saliency map, specific to a given image and class. We show that such maps can be employed for weakly supervised object segmentation using classification ConvNets. Finally, we establish the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks [Zeiler et al., 2013]. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1312.6034"
},
{
"id": "1AhGoHZP9",
"URL": "https://arxiv.org/abs/1312.6184",
"number": "1312.6184",
"title": "Do Deep Nets Really Need to be Deep?",
"issued": {
"date-parts": [
[
2014,
10,
14
]
]
},
"author": [
{
"given": "Lei Jimmy",
"family": "Ba"
},
{
"given": "Rich",
"family": "Caruana"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Currently, deep neural networks are the state of the art on problems such as speech recognition and computer vision. In this extended abstract, we show that shallow feed-forward networks can learn the complex functions previously learned by deep nets and achieve accuracies previously only achievable with deep models. Moreover, in some cases the shallow neural nets can learn these deep functions using a total number of parameters similar to the original deep model. We evaluate our method on the TIMIT phoneme recognition task and are able to train shallow fully-connected nets that perform similarly to complex, well-engineered, deep convolutional architectures. Our success in training shallow neural nets to mimic deeper models suggests that there probably exist better algorithms for training shallow feed-forward nets than those currently available. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1312.6184"
},
{
"id": "1Fel6Bdb8",
"URL": "https://arxiv.org/abs/1312.6199",
"number": "1312.6199",
"title": "Intriguing properties of neural networks",
"issued": {
"date-parts": [
[
2014,
2,
20
]
]
},
"author": [
{
"given": "Christian",
"family": "Szegedy"
},
{
"given": "Wojciech",
"family": "Zaremba"
},
{
"given": "Ilya",
"family": "Sutskever"
},
{
"given": "Joan",
"family": "Bruna"
},
{
"given": "Dumitru",
"family": "Erhan"
},
{
"given": "Ian",
"family": "Goodfellow"
},
{
"given": "Rob",
"family": "Fergus"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.\n First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks.\n Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extend. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input. ",
"note": "license: http://creativecommons.org/licenses/by/3.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1312.6199"
},
{
"id": "8t43CQ9m",
"URL": "https://arxiv.org/abs/1403.1347",
"number": "1403.1347",
"title": "Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction",
"issued": {
"date-parts": [
[
2014,
3,
7
]
]
},
"author": [
{
"given": "Jian",
"family": "Zhou"
},
{
"given": "Olga G.",
"family": "Troyanskaya"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) to globally train deep generative model. We present the supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and applied it to protein structure prediction. To scale the model to full-sized, high-dimensional data, like protein sequences with hundreds of amino acids, we introduce a convolutional architecture, which allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed with both low and high-level representations learned by the model. In our application this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1403.1347"
},
{
"id": "ZSVsnPVO",
"URL": "https://arxiv.org/abs/1404.5997",
"number": "1404.5997",
"title": "One weird trick for parallelizing convolutional neural networks",
"issued": {
"date-parts": [
[
2014,
4,
29
]
]
},
"author": [
{
"given": "Alex",
"family": "Krizhevsky"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales significantly better than all alternatives when applied to modern convolutional neural networks. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1404.5997"
},
{
"id": "1Dzz0P0qr",
"URL": "https://arxiv.org/abs/1406.1231",
"number": "1406.1231",
"title": "Multi-task Neural Networks for QSAR Predictions",
"issued": {
"date-parts": [
[
2014,
6,
6
]
]
},
"author": [
{
"given": "George E.",
"family": "Dahl"
},
{
"given": "Navdeep",
"family": "Jaitly"
},
{
"given": "Ruslan",
"family": "Salakhutdinov"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Although artificial neural networks have occasionally been used for Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) studies in the past, the literature has of late been dominated by other machine learning techniques such as random forests. However, a variety of new neural net techniques along with successful applications in other domains have renewed interest in network approaches. In this work, inspired by the winning team's use of neural networks in a recent QSAR competition, we used an artificial neural network to learn a function that predicts activities of compounds for multiple assays at the same time. We conducted experiments leveraging recent methods for dealing with overfitting in neural networks as well as other tricks from the neural networks literature. We compared our methods to alternative methods reported to perform well on these tasks and found that our neural net methods provided superior performance. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1406.1231"
},
{
"id": "haHzVaaz",
"URL": "https://arxiv.org/abs/1409.0473",
"number": "1409.0473",
"title": "Neural Machine Translation by Jointly Learning to Align and Translate",
"issued": {
"date-parts": [
[
2016,
5,
23
]
]
},
"author": [
{
"given": "Dzmitry",
"family": "Bahdanau"
},
{
"given": "Kyunghyun",
"family": "Cho"
},
{
"given": "Yoshua",
"family": "Bengio"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1409.0473"
},
{
"id": "2cMhMv5A",
"URL": "https://arxiv.org/abs/1409.3215",
"number": "1409.3215",
"title": "Sequence to Sequence Learning with Neural Networks",
"issued": {
"date-parts": [
[
2014,
12,
16
]
]
},
"author": [
{
"given": "Ilya",
"family": "Sutskever"
},
{
"given": "Oriol",
"family": "Vinyals"
},
{
"given": "Quoc V.",
"family": "Le"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1409.3215"
},
{
"id": "YwdqeYZi",
"URL": "https://arxiv.org/abs/1410.0759",
"number": "1410.0759",
"title": "cuDNN: Efficient Primitives for Deep Learning",
"issued": {
"date-parts": [
[
2014,
12,
19
]
]
},
"author": [
{
"given": "Sharan",
"family": "Chetlur"
},
{
"given": "Cliff",
"family": "Woolley"
},
{
"given": "Philippe",
"family": "Vandermersch"
},
{
"given": "Jonathan",
"family": "Cohen"
},
{
"given": "John",
"family": "Tran"
},
{
"given": "Bryan",
"family": "Catanzaro"
},
{
"given": "Evan",
"family": "Shelhamer"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the Basic Linear Algebra Subroutines (BLAS). However, there is no analogous library for deep learning. Without such a library, researchers implementing deep learning workloads on parallel processors must create and optimize their own implementations of the main computational kernels, and this work must be repeated as new parallel processors emerge. To address this problem, we have created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Our implementation contains routines for GPUs, although similarly to the BLAS library, these routines could be implemented for other platforms. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage. For example, integrating cuDNN into Caffe, a popular framework for convolutional networks, improves performance by 36% on a standard model while also reducing memory consumption. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1410.0759"
},
{
"id": "pxdeuhMS",
"URL": "https://arxiv.org/abs/1411.2581v1",
"number": "1411.2581v1",
"version": "v1",
"title": "Deep Exponential Families",
"issued": {
"date-parts": [
[
2014,
11,
10
]
]
},
"author": [
{
"given": "Rajesh",
"family": "Ranganath"
},
{
"given": "Linpeng",
"family": "Tang"
},
{
"given": "Laurent",
"family": "Charlin"
},
{
"given": "David M.",
"family": "Blei"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": "We describe \\textit{deep exponential families} (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent \"black box\" variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.",
"note": "This CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1411.2581v1"
},
{
"id": "19mGl6pfy",
"URL": "https://arxiv.org/abs/1412.0035",
"number": "1412.0035",
"title": "Understanding Deep Image Representations by Inverting Them",
"issued": {
"date-parts": [
[
2014,
12,
2
]
]
},
"author": [
{
"given": "Aravindh",
"family": "Mahendran"
},
{
"given": "Andrea",
"family": "Vedaldi"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Image representations, from SIFT and Bag of Visual Words to Convolutional Neural Networks (CNNs), are a crucial component of almost any image understanding system. Nevertheless, our understanding of them remains limited. In this paper we conduct a direct analysis of the visual information contained in representations by asking the following question: given an encoding of an image, to which extent is it possible to reconstruct the image itself? To answer this question we contribute a general framework to invert representations. We show that this method can invert representations such as HOG and SIFT more accurately than recent alternatives while being applicable to CNNs too. We then use this technique to study the inverse of recent state-of-the-art CNN image representations for the first time. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1412.0035"
},
{
"id": "1AkF8Wsv7",
"URL": "https://arxiv.org/abs/1412.1897v4",
"number": "1412.1897v4",
"version": "v4",
"title": "Deep Neural Networks are Easily Fooled: High Confidence Predictions for\n Unrecognizable Images",
"issued": {
"date-parts": [
[
2014,
12,
5
]
]
},
"author": [
{
"given": "Anh",
"family": "Nguyen"
},
{
"given": "Jason",
"family": "Yosinski"
},
{
"given": "Jeff",
"family": "Clune"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": "Deep neural networks (DNNs) have recently been achieving state-of-the-art performance on a variety of pattern-recognition tasks, most notably visual classification problems. Given that DNNs are now able to classify objects in images with near-human-level performance, questions naturally arise as to what differences remain between computer and human vision. A recent study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion a library). Here we show a related result: it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion). Specifically, we take convolutional neural networks trained to perform well on either the ImageNet or MNIST datasets and then find images with evolutionary algorithms or gradient ascent that DNNs label with high confidence as belonging to each dataset class. It is possible to produce images totally unrecognizable to human eyes that DNNs believe with near certainty are familiar objects, which we call \"fooling images\" (more generally, fooling examples). Our results shed light on interesting differences between human vision and current DNNs, and raise questions about the generality of DNN computer vision.",
"note": "This CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1412.1897v4"
},
{
"id": "UtcyntjF",
"URL": "https://arxiv.org/abs/1412.6572",
"number": "1412.6572",
"title": "Explaining and Harnessing Adversarial Examples",
"issued": {
"date-parts": [
[
2015,
3,
24
]
]
},
"author": [
{
"given": "Ian J.",
"family": "Goodfellow"
},
{
"given": "Jonathon",
"family": "Shlens"
},
{
"given": "Christian",
"family": "Szegedy"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1412.6572"
},
{
"id": "f2L6isRj",
"URL": "https://arxiv.org/abs/1412.6806",
"number": "1412.6806",
"title": "Striving for Simplicity: The All Convolutional Net",
"issued": {
"date-parts": [
[
2015,
4,
14
]
]
},
"author": [
{
"given": "Jost Tobias",
"family": "Springenberg"
},
{
"given": "Alexey",
"family": "Dosovitskiy"
},
{
"given": "Thomas",
"family": "Brox"
},
{
"given": "Martin",
"family": "Riedmiller"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: Alternating convolution and max-pooling layers followed by a small number of fully connected layers. We re-evaluate the state of the art for object recognition from small images with convolutional networks, questioning the necessity of different components in the pipeline. We find that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks. Following this finding -- and building on other recent work for finding simple network structures -- we propose a new architecture that consists solely of convolutional layers and yields competitive or state of the art performance on several object recognition datasets (CIFAR-10, CIFAR-100, ImageNet). To analyze the network we introduce a new variant of the \"deconvolution approach\" for visualizing features learned by CNNs, which can be applied to a broader range of network structures than existing approaches. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1412.6806"
},
{
"id": "1G3owNNps",
"URL": "https://arxiv.org/abs/1412.7024",
"number": "1412.7024",
"title": "Training deep neural networks with low precision multiplications",
"issued": {
"date-parts": [
[
2015,
9,
24
]
]
},
"author": [
{
"given": "Matthieu",
"family": "Courbariaux"
},
{
"given": "Yoshua",
"family": "Bengio"
},
{
"given": "Jean-Pierre",
"family": "David"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Multipliers are the most space and power-hungry arithmetic operators of the digital implementation of deep neural networks. We train a set of state-of-the-art neural networks (Maxout networks) on three benchmark datasets: MNIST, CIFAR-10 and SVHN. They are trained with three distinct formats: floating point, fixed point and dynamic fixed point. For each of those datasets and for each of those formats, we assess the impact of the precision of the multiplications on the final error after training. We find that very low precision is sufficient not just for running trained networks but also for training them. For example, it is possible to train Maxout networks with 10 bits multiplications. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1412.7024"
},
{
"id": "yAoN5gTU",
"URL": "https://arxiv.org/abs/1502.02072",
"number": "1502.02072",
"title": "Massively Multitask Networks for Drug Discovery",
"issued": {
"date-parts": [
[
2015,
2,
10
]
]
},
"author": [
{
"given": "Bharath",
"family": "Ramsundar"
},
{
"given": "Steven",
"family": "Kearnes"
},
{
"given": "Patrick",
"family": "Riley"
},
{
"given": "Dale",
"family": "Webster"
},
{
"given": "David",
"family": "Konerding"
},
{
"given": "Vijay",
"family": "Pande"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task methods, (2) the predictive power of multitask networks improves as additional tasks and data are added, (3) the total amount of data and the total number of tasks both contribute significantly to multitask improvement, and (4) multitask networks afford limited transferability to tasks not in the training set. Our results underscore the need for greater data sharing and further algorithmic innovation to accelerate the drug discovery process. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1502.02072"
},
{
"id": "CKcJuj03",
"URL": "https://arxiv.org/abs/1502.02551",
"number": "1502.02551",
"title": "Deep Learning with Limited Numerical Precision",
"issued": {
"date-parts": [
[
2015,
2,
11
]
]
},
"author": [
{
"given": "Suyog",
"family": "Gupta"
},
{
"given": "Ankur",
"family": "Agrawal"
},
{
"given": "Kailash",
"family": "Gopalakrishnan"
},
{
"given": "Pritish",
"family": "Narayanan"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe the rounding scheme to play a crucial role in determining the network's behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1502.02551"
},
{
"id": "iWxvn0xF",
"URL": "https://arxiv.org/abs/1502.02791",
"number": "1502.02791",
"title": "Learning Transferable Features with Deep Adaptation Networks",
"issued": {
"date-parts": [
[
2015,
5,
28
]
]
},
"author": [
{
"given": "Mingsheng",
"family": "Long"
},
{
"given": "Yue",
"family": "Cao"
},
{
"given": "Jianmin",
"family": "Wang"
},
{
"given": "Michael I.",
"family": "Jordan"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, the feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multi-kernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees, and can scale linearly by unbiased estimate of kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1502.02791"
},
{
"id": "yHn4SDRI",
"URL": "https://arxiv.org/abs/1502.03044",
"number": "1502.03044",
"title": "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention",
"issued": {
"date-parts": [
[
2016,
4,
20
]
]
},
"author": [
{
"given": "Kelvin",
"family": "Xu"
},
{
"given": "Jimmy",
"family": "Ba"
},
{
"given": "Ryan",
"family": "Kiros"
},
{
"given": "Kyunghyun",
"family": "Cho"
},
{
"given": "Aaron",
"family": "Courville"
},
{
"given": "Ruslan",
"family": "Salakhutdinov"
},
{
"given": "Richard",
"family": "Zemel"
},
{
"given": "Yoshua",
"family": "Bengio"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1502.03044"
},
{
"id": "1CRF3gAV",
"URL": "https://arxiv.org/abs/1503.02531",
"number": "1503.02531",
"title": "Distilling the Knowledge in a Neural Network",
"issued": {
"date-parts": [
[
2015,
3,
10
]
]
},
"author": [
{
"given": "Geoffrey",
"family": "Hinton"
},
{
"given": "Oriol",
"family": "Vinyals"
},
{
"given": "Jeff",
"family": "Dean"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1503.02531"
},
{
"id": "13KjSCKB2",
"URL": "https://arxiv.org/abs/1504.04343",
"number": "1504.04343",
"title": "Caffe con Troll: Shallow Ideas to Speed Up Deep Learning",
"issued": {
"date-parts": [
[
2015,
5,
28
]
]
},
"author": [
{
"given": "Stefan",
"family": "Hadjis"
},
{
"given": "Firas",
"family": "Abuzaid"
},
{
"given": "Ce",
"family": "Zhang"
},
{
"given": "Christopher",
"family": "Ré"
}
],
"container-title": "arXiv",
"publisher": "arXiv",
"type": "report",
"abstract": " We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs. ",
"note": "license: http://arxiv.org/licenses/nonexclusive-distrib/1.0/\nThis CSL JSON Item was automatically generated by Manubot v0.3.1 using citation-by-identifier.\nstandard_id: arxiv:1504.04343"
},
{
"id": "15lYGmZpY",
"URL": "https://arxiv.org/abs/1504.04788",
"number": "1504.04788",
"title": "Compressing Neural Networks with the Hashing Trick",
"issued": {
"date-parts": [
[
2015,
4,
21
]
]
},
"author": [
{
"given": "Wenlin",
"family": "Chen"
},
{
"given": "James T.",
"family": "Wilson"
},
{