The following was computed on a 5000x10 conversation; it seems that reflection is one of our problems here
2014-May-09 11:27:06 -0700 fullerine INFO [polismath.simulation] - Profiling: :user/clusters
Id Calls Min Max MAD Mean Time% Time
:user/clean-start-clusters 2 9.0ms 4.4s 2.2s 2.2s 30 4.4s
:user/safe-recenter-clusters 2 5.0ms 2.2s 1.1s 1.1s 15 2.2s
:user/row-subset 106 260.0μs 68.0ms 8.0ms 20.0ms 15 2.2s
:user/recenter-clusters 2 3.0ms 2.1s 1.1s 1.1s 14 2.1s
:user/rowname-subset 53 462.0μs 47.0ms 8.0ms 20.0ms 7 1.0s
:user/posible-clusters 2 306.0μs 23.0ms 12.0ms 12.0ms 0 24.0ms
:user/uniqify-clusters 2 20.0μs 1.0ms 737.0μs 757.0μs 0 2.0ms
Clock Time 100 14.8s
Accounted Time 81 12.0s
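As an aside, a minimal way to surface these reflection warnings project-wide (assuming Leiningen; we may already do something equivalent):
;; at the REPL or the top of a namespace
(set! *warn-on-reflection* true)
;; or in project.clj:
;;   :global-vars {*warn-on-reflection* true}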
Here are the reflection warnings that matter to our lib
columns.clj:125 recur arg for primitive local: current_indent is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: current-indent
exception.clj:44 recur arg for primitive local: i is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: i
# The main culprit?
Reflection warning, polismath/named_matrix.clj:21:41 - call to indexOf can't be resolved.
Reflection warning, polismath/named_matrix.clj:46:11 - call to indexOf can't be resolved.
Reflection warning, polismath/named_matrix.clj:55:11 - call to indexOf can't be resolved.
Reflection warning, polismath/named_matrix.clj:61:34 - call to indexOf can't be resolved.
Reflection warning, polismath/named_matrix.clj:67:24 - call to indexOf can't be resolved.
pca.clj:51 recur arg for primitive local: last_eigval is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: last-eigval
Reflection warning, polismath/clusters.clj:136:27 - call to indexOf can't be resolved.
Reflection warning, polismath/clusters.clj:136:27 - call to indexOf can't be resolved.
Performance warning, polismath/clusters.clj:201:25 - case has int tests, but tested expression is not primitive.
Reflection warning, polismath/conversation.clj:22:13 - call to indexOf can't be resolved.
Reflection warning, polismath/simulation.clj:78:3 - reference to field nextElement can't be resolved.
Fuck... this doesn't seem to be working. I took away all of the reflection warnings with a typed-indexof
function, but still no dice. Here's the function, and what's left of the reflections:
(defn typed-indexof [^java.util.List coll item]
(.indexOf coll item))
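(The ^java.util.List hint gives the compiler a static type for coll, so the .indexOf call compiles to a direct interface call rather than a reflective lookup.)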
# Other libraries...
Reflection warning, /tmp/form-init1039726133741870246.clj:1:911 - call to invokeStaticMethod can't be resolved.
Reflection warning, alex_and_georges/debug_repl.clj:101:8 - reference to field getCause can't be resolved.
Reflection warning, alex_and_georges/debug_repl.clj:102:8 - reference to field getCause can't be resolved.
Reflection warning, plumbing/fnk/schema.clj:93:37 - call to java.lang.IllegalArgumentException ctor can't be resolved.
Reflection warning, clojure/tools/trace.clj:231:9 - call to setStackTrace can't be resolved.
Reflection warning, clojure/tools/trace.clj:242:11 - call to setStackTrace can't be resolved.
Reflection warning, clojure/tools/trace.clj:244:49 - call to java.lang.Exception ctor can't be resolved.
Reflection warning, clojure/tools/trace.clj:244:11 - call to setStackTrace can't be resolved.
Reflection warning, clojure/tools/trace.clj:276:9 - call to setStackTrace can't be resolved.
Reflection warning, clojure/tools/trace.clj:277:15 - call to setStackTrace can't be resolved.
Reflection warning, bigml/sampling/random.clj:11:11 - call to cern.jet.random.tdouble.engine.MersenneTwister64 vector can't be resolved.
Reflection warning, clj_time/core.clj:577:10 - reference to field getDayOfMonth can't be resolved.
# Us????
columns.clj:125 recur arg for primitive local: current_indent is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: current-indent
exception.clj:44 recur arg for primitive local: i is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: i
pca.clj:51 recur arg for primitive local: last_eigval is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: last-eigval
Performance warning, polismath/clusters.clj:201:25 - case has int tests, but tested expression is not primitive.
Reflection warning, polismath/conversation.clj:22:13 - call to indexOf can't be resolved.
Reflection warning, polismath/simulation.clj:78:3 - reference to field nextElement can't be resolved.
May 12, 2014
WOOT! It seems I've sorted out the biggest culprit here.
It turned out to be the filter-by-index function.
It was creating a new set each time, which was absolutely killing performance.
I was able to cut the time down by an order of magnitude with this change alone.
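If the problem was building a fresh set per element, the fix is presumably hoisting the set construction out of the loop. A hypothetical before/after sketch (names and argument order are assumptions, not the real filter-by-index):
;; slow: rebuilds the index set for every element it tests
(defn filter-by-index-slow [coll indices]
  (keep-indexed (fn [i x] (when ((set indices) i) x)) coll))
;; faster: build the set once, outside the per-element test
(defn filter-by-index-fast [coll indices]
  (let [idx-set (set indices)]
    (keep-indexed (fn [i x] (when (idx-set i) x)) coll)))
;; e.g. (filter-by-index-fast [:a :b :c :d] [0 2]) ;=> (:a :c)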
So now on to other things...
For starters, I tried running on 100,000 ptpts and 10 cmts.
Here are the big and big-part update times:
N-ptpts: 99998
Starting new conv update!
1399963793428 rating-mat 1292120.671242 msecs #=> 1,292.120s #=> 20 minutes
1399963793436 mat 7.876006 msecs
1399963800685 pca 7249.400704 msecs # 7 sec
1399963802045 proj 1359.328171 msecs # 1.3 sec
1399963848873 base-clusters 46828.255953 msecs # 46 sec
1399963848873 bid-to-pid 0.054336 msecs
1399963849626 votes-base 752.377173 msecs
1399963849626 counting-ptpts 0.003771 msecs
1399963849628 group-clusters 2.392552 msecs
1399963849628 CONVUP big 1353257.322353 msecs
N-ptpts: 99998
1399963849843 rating-mat 140.33363 msecs
1399963849849 mat 5.574799 msecs
1399963855133 pca 5283.952589 msecs
1399963856372 proj 1238.580412 msecs
1399964442254 base-clusters 585882.249774 msecs # 585.882 # 10 min
1399964442254 bid-to-pid 0.056502 msecs
1399964442916 votes-base 661.268519 msecs
1399964442916 counting-ptpts 0.003702 msecs
1399964442917 group-clusters 1.644483 msecs
1399964442918 CONVUP big-part 593284.9938 msecs
2014-May-13 00:00:42 -0700 noether INFO [polismath.simulation] - Profiling: :user/clusters
Id Calls Min Max MAD Mean Time% Time
:user/kmeans 2 2.0ms 585.9s 292.9s 292.9s 99 585.9s
:user/clean-start-clusters 2 1.0ms 546.2s 273.1s 273.1s 92 546.2s
:user/safe-recenter-clusters 2 604.0μs 276.2s 138.1s 138.1s 47 276.2s
:user/safe-recenter-map 2 552.0μs 276.2s 138.1s 138.1s 47 276.2s
:user/safe-rowname-subset 53 90.0μs 9.6s 1.7s 5.2s 46 274.5s
:user/recenter-clusters 2 485.0μs 269.9s 135.0s 135.0s 45 269.9s
:user/rowname-subset 53 82.0μs 8.7s 1.6s 5.1s 45 268.4s
:user/row-subset 106 46.0μs 47.0ms 3.0ms 28.0ms 1 3.0s
:user/filter-by-index 212 5.0μs 862.0μs 129.0μs 439.0μs 0 93.0ms
:user/uniqify-clusters 2 12.0μs 1.0ms 510.0μs 522.0μs 0 1.0ms
:user/safe-recenter-empty 2 3.0μs 301.0μs 149.0μs 152.0μs 0 303.0μs
:user/safe-recenter-nil 2 21.0μs 155.0μs 67.0μs 88.0μs 0 175.0μs
Clock Time 100 593.3s
Accounted Time 421 2500.3s
Hmm.... This is suggesting that performance on the rating-mat update may actually become a problem if we can't
keep everything else moving quickly enough.
Of course, this corresponds to 1,000,000 votes.
If we ever get 1,000,000 votes at once (for a specific conv), then God help us for now... XXX
On the other hand, note that the base clustering takes QUITE a long time in this second one here (10 minutes).
Again, this seems to be largely due to the clean-start-clusters.
We really have to figure out how to speed that up. XXX
Another thought: initially I was concerned that something might not be quite right given that the second
(partial) update would always take so much longer than the full matrix population.
But this actually makes sense: most of the time is in the clean-start-clusters, but the clean-start-clusters
is going to be a lot simpler if the last round didn't have that many members.
Also, note that there are still some strange "unable to slice nil" errors cropping up every now and again.
A few of those could lead to a backlog, and that might put us on ice while the conversation catches up. XXX
One last thing:
Note that the PCA grows as a function of the number of comments, but that downstream analyses don't, because
they're just on projections.
# 10 cmts
big
1399967451528 pca 942.879957 msecs # 1 sec
1399967451706 proj 177.306975 msecs # 0.2 sec
big-part
1399967458208 pca 751.15802 msecs # < 1sec
1399967458369 proj 161.097721 msecs
# 100 cmts
big
1399967669033 pca 6840.381594 msecs # 7 sec
1399967669866 proj 833.318748 msecs # 0.8 sec
big-part
1399967679124 pca 3850.089523 msecs # 4 sec
1399967679911 proj 786.742437 msecs # 0.8 sec
Something interesting here: note that the second (big-part) times are a lot shorter in each case (1/2-ish).
This is likely because the power-iteration is working exactly as planned.
Which is good.
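For context, a bare-bones illustration of the warm-start idea (this is not the project's pca.clj, just a sketch using core.matrix): starting the iteration from the previous eigenvector means a partial update needs far fewer multiplications than a cold start from a random vector.
(require '[clojure.core.matrix :as m])
(defn power-iterate [mat start-vec n-iters]
  ;; repeatedly multiply and renormalise; converges toward the dominant eigenvector
  (loop [v (m/normalise start-vec)
         i n-iters]
    (if (zero? i)
      v
      (recur (m/normalise (m/mmul mat v)) (dec i)))))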
Still, if it costs another 7x to increase both the cmts and ptpts 10x, then we could easily expect to see minute-long
compute times for our PCA step.
So we'll either want to fix our Batch PCA or try to speed this bit up XXX
Also note here that the time for update is much lower than it was when there were 10x more people (80sec).
Could this let us use a transpose for this messy operation? XXX
In general this may also point to where some of the possible optimizations are with respect to this update
implementation.
May 18, 2014
Been a little while since I checked in...
I've figured out what is causing the slow rating matrix update.
Silly me... I was doing indexOf on the rows list to get to the matrix indices.
Bad idea...
I'm going to switch to hash-maps, with which I think we can cut the matrix-update time to seconds with 500k ptpts.
This was what we had from that count:
2014-May-17 22:27:26 -0700 noether INFO [polismath.simulation] - Profiling: :user/clusters
Id Calls Min Max MAD Mean Time% Time
:user/update-nmat 1 392.2s 392.2s 0ns 392.2s 92 392.2s
:user/MU-indices 600000 2.0μs 3.0ms 324.0μs 627.0μs 88 376.2s
:user/rowname-subset 54 47.0μs 15.7s 571.0ms 298.0ms 4 16.1s
:user/kmeans 2 2.0ms 12.1s 6.1s 6.1s 3 12.1s
:user/MU-rows-cols 600000 1.0μs 1.3s 4.0μs 4.0μs 1 2.6s
:user/clean-start-clusters 2 1.0ms 1.6s 818.0ms 819.0ms 0 1.6s
:user/MU-assoc-in 600000 1.0μs 4.0ms 278ns 2.0μs 0 1.4s
:user/MU-indices-2 600000 1.0μs 219.0μs 247ns 2.0μs 0 1.2s
:user/safe-recenter-clusters 2 576.0μs 820.0ms 410.0ms 410.0ms 0 820.0ms
:user/safe-recenter-map 2 564.0μs 819.0ms 409.0ms 410.0ms 0 820.0ms
:user/row-subset 107 29.0μs 45.0ms 988.0μs 7.0ms 0 771.0ms
:user/recenter-clusters 2 351.0μs 768.0ms 384.0ms 384.0ms 0 768.0ms
:user/MU-assoc 600000 698ns 1.0ms 226ns 1.0μs 0 626.0ms
:user/MU-row-add 600000 488ns 197.0μs 492ns 966ns 0 580.0ms
:user/MU-col-add 600000 488ns 293.0μs 101ns 911ns 0 547.0ms
:user/safe-rowname-subset 53 57.0μs 12.0ms 856.0μs 8.0ms 0 401.0ms
Clock Time 100 428.3s
Accounted Time 189 808.8s
Will want to keep an eye on the other MU items on that list, as they are liable to be the next points for speeding things up.
May 21, 2014
Woot! I seem to have stamped this shit.
At least largely.
Before I go on though, I should make a correction.
I'm pretty sure the above was actually on 600,000 new _comments_, not ptpts.
The latter would not make sense with the call counts shown above.
Anyway, it seems that using indexOf on matrices was basically just a terrible idea.
I thought lookup was constant time, which is silly in retrospect.
It's not.
I've implemented a custom data type which maintains both a vector and a hash-map so we can quickly get (the
former equivalent of) indexOf.
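In spirit it's something like the following (hypothetical names, not the actual named_matrix.clj code): a vector keeps the ordering, and a name->index map turns the old O(n) indexOf into a constant-time lookup.
(defn make-name-index [names]
  {:names (vec names)
   :index (into {} (map-indexed (fn [i n] [n i]) names))})
(defn name->index [name-index n]
  (get (:index name-index) n))
(defn add-name [name-index n]
  (if (name->index name-index n)
    name-index
    (-> name-index
        (update-in [:index] assoc n (count (:names name-index)))
        (update-in [:names] conj n))))
;; e.g. (name->index (make-name-index ["p1" "p2" "p3"]) "p3") ;=> 2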
Now look at the times:
Starting new conv update!
N-ptpts: 60000
count 720000
1400735231267 rating-mat 3003.265079 msecs
1400735231267 counting-comments 0.0044 msecs
1400735231336 user-vote-count 68.865603 msecs
1400735231381 mat 5.540164 msecs
1400735235871 pca 4489.99813 msecs
1400735236737 proj 865.464804 msecs
1400735256213 base-clusters 19475.856953 msecs
1400735256213 bid-to-pid 0.054686 msecs
1400735256715 votes-base 501.544857 msecs
1400735256715 counting-ptpts 0.004539 msecs
1400735256717 group-clusters 1.903313 msecs
1400735256717 CONVUP big-part 31481.174735 msecs
2014-May-21 22:07:36 -0700 noether INFO [polismath.simulation] - Profiling: :user/clusters
Id Calls Min Max MAD Mean Time% Time
:user/kmeans 2 2.0ms 19.3s 9.6s 9.6s 61 19.3s
:user/update-nmat 1 2.9s 2.9s 0ns 2.9s 9 2.9s
:user/clean-start-clusters 2 928.0μs 1.8s 921.0ms 921.0ms 6 1.8s
:user/safe-recenter-clusters 2 505.0μs 944.0ms 472.0ms 472.0ms 3 945.0ms
:user/safe-recenter-map 2 492.0μs 944.0ms 472.0ms 472.0ms 3 944.0ms
:user/recenter-clusters 2 268.0μs 837.0ms 418.0ms 418.0ms 3 837.0ms
:user/safe-rowname-subset 53 37.0μs 4.0ms 345.0μs 3.0ms 1 164.0ms
Clock Time 100 31.5s
Accounted Time 85 26.9s
Damn son.
Down to 30 seconds.
Still some room for improvement given I'm looking at some new reflection/performance warnings.
We'll see how things do after that.
Hmmm... I tried running on 1mil ptpts and 100 cmts and got this.
Starting new conv update!
N-ptpts: 100000
count 10200000
1400736772092 rating-mat 454258.957733 msecs
1400736772093 counting-comments 0.005029 msecs
1400736778787 user-vote-count 6694.438309 msecs
1400736778866 mat 8.482992 msecs
1400736853808 pca 74942.795939 msecs
1400736863400 proj 9591.412637 msecs
1400736914508 base-clusters 51108.178211 msecs
1400736914509 bid-to-pid 0.065022 msecs
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:364)
at clojure.lang.PersistentHashMap$ArrayNode.assoc(PersistentHashMap.java:361)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:28)
at clojure.lang.RT.assoc(RT.java:702)
at clojure.core$assoc.invoke(core.clj:187)
at clojure.core$zipmap.invoke(core.clj:2715)
at polismath.conversation$agg_bucket_votes_for_tid.invoke(conversation.clj:24)
at polismath.conversation$fn__4283$fnk4280_positional__4284$fn__4287.invoke(conversation.clj:168)
Can probably fix this with JVM settings, but for now we seem to have found an upper limit.
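For reference, the sort of JVM settings that might buy headroom (assuming Leiningen; the heap size is just an example, not a tuned value):
;; project.clj (excerpt)
:jvm-opts ["-Xmx8g" "-XX:-UseGCOverheadLimit"]
;; -Xmx raises the max heap; -XX:-UseGCOverheadLimit disables the specific
;; "GC overhead limit exceeded" check (the underlying memory pressure remains).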
However, of interest is that the rating-mat update, while significant, is operating on 100mil votes, which
we'll likely never see in a single update.
If we have a million people, it's probably more like 10 votes per user per minute, tops.
So if we keep everything else moving quickly enough, we might be fine.
Here is some data for 30k x 20cmts:
Starting new conv update!
N-ptpts: 30000
count 660000
1400746217528 rating-mat 1503.576119 msecs
1400746217529 counting-comments 0.0044 msecs
1400746217583 user-vote-count 54.830368 msecs
1400746217621 mat 4.400001 msecs
1400746221937 pca 4316.419774 msecs
1400746222599 proj 661.168579 msecs
1400746234367 base-clusters 11768.231347 msecs
1400746234367 bid-to-pid 0.054615 msecs
1400746234788 votes-base 420.764263 msecs
1400746234788 counting-ptpts 0.004191 msecs
1400746234790 group-clusters 1.557323 msecs
1400746234790 CONVUP big-part 21241.605796 msecs
2014-May-22 01:10:34 -0700 noether INFO [polismath.simulation] - Profiling: :user/clusters
Id Calls Min Max MAD Mean Time% Time
:user/kmeans 2 1.0ms 11.7s 5.8s 5.8s 55 11.7s
:user/update-nmat 1 1.4s 1.4s 0ns 1.4s 6 1.4s
:user/clean-start-clusters 2 891.0μs 1.2s 588.0ms 589.0ms 6 1.2s
:user/safe-recenter-clusters 2 470.0μs 585.0ms 292.0ms 293.0ms 3 585.0ms
:user/safe-recenter-map 2 457.0μs 585.0ms 292.0ms 293.0ms 3 585.0ms
:user/recenter-clusters 2 260.0μs 520.0ms 260.0ms 260.0ms 2 521.0ms
:user/safe-rowname-subset 53 33.0μs 3.0ms 222.0μs 2.0ms 0 105.0ms
Clock Time 100 21.2s
Accounted Time 75 16.0s