<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
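<title>CloudNative Architecture</title>
<subtitle>CloudNative|Cloud-native application architecture|Cloud-native architecture|Containerized architecture|Microservice architecture|Platform architecture|Infrastructure</subtitle>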
<link href="/atom.xml" rel="self"/>
<link href="http://team.jiunile.com/"/>
<updated>2021-06-01T02:12:17.000Z</updated>
<id>http://team.jiunile.com/</id>
<author>
<name>icyboy</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Go Code Security Guide</title>
<link href="http://team.jiunile.com//blog/2021/06/go-security.html"/>
<id>http://team.jiunile.com//blog/2021/06/go-security.html</id>
<published>2021-06-01T14:00:00.000Z</published>
<updated>2021-06-01T02:12:17.000Z</updated>
<content type="html"><h1 id="通用类"><a href="#通用类" class="headerlink" title="通用类"></a>General</h1><h2 id="1-代码实现类"><a href="#1-代码实现类" class="headerlink" title="1. 代码实现类"></a>1. Code Implementation</h2><h3 id="1-1-内存管理"><a href="#1-1-内存管理" class="headerlink" title="1.1 内存管理"></a>1.1 Memory Management</h3><h4 id="1-1-1【必须】切片长度校验"><a href="#1-1-1【必须】切片长度校验" class="headerlink" title="1.1.1【必须】切片长度校验"></a>1.1.1 [Required] Validate slice lengths</h4><ul>
<li>Before indexing or slicing, always check that the slice is long enough; an out-of-range access makes the program panic.</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: the length of data is never checked, so short input causes "index out of range"</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">decode</span><span class="params">(data []<span class="keyword">byte</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">    <span class="keyword">if</span> data[<span class="number">0</span>] == <span class="string">'F'</span> &amp;&amp; data[<span class="number">1</span>] == <span class="string">'U'</span> &amp;&amp; data[<span class="number">2</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">3</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">4</span>] == <span class="string">'E'</span> &amp;&amp; data[<span class="number">5</span>] == <span class="string">'R'</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">"Bad"</span>)</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad: slicing past len(slice) panics with "slice bounds out of range"</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">()</span></span> &#123;</span><br><span class="line">    <span class="keyword">var</span> slice = []<span class="keyword">int</span>&#123;<span class="number">0</span>, <span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>, <span class="number">6</span>&#125;</span><br><span class="line">    fmt.Println(slice[:<span class="number">10</span>])</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: check the length of data before indexing into it</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">decode</span><span class="params">(data []<span class="keyword">byte</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(data) == <span class="number">6</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> data[<span class="number">0</span>] == <span class="string">'F'</span> &amp;&amp; data[<span class="number">1</span>] == <span class="string">'U'</span> &amp;&amp; data[<span class="number">2</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">3</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">4</span>] == <span class="string">'E'</span> &amp;&amp; data[<span class="number">5</span>] == <span class="string">'R'</span> &#123;</span><br><span class="line">            fmt.Println(<span class="string">"Good"</span>)</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
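<p>Manual index checks are easy to get wrong. A minimal sketch of two bounds-safe alternatives: <code>bytes.HasPrefix</code> from the standard library compares lengths before it compares bytes, and a small clamping helper keeps a slice expression inside <code>[0, len(s)]</code>. The names <code>isFuzzerHeader</code>, <code>safeSlice</code>, and <code>magic</code> are hypothetical. Note that, unlike the <code>len(data) == 6</code> check above, <code>isFuzzerHeader</code> also accepts longer input that merely starts with the magic bytes:</p>

```go
package main

import (
	"bytes"
	"fmt"
)

// magic is the 6-byte "FUZZER" header from the example above.
var magic = []byte("FUZZER")

// isFuzzerHeader reports whether data starts with magic. bytes.HasPrefix
// checks len(data) >= len(magic) before comparing, so short or empty input
// returns false instead of panicking.
func isFuzzerHeader(data []byte) bool {
	return bytes.HasPrefix(data, magic)
}

// safeSlice returns the first n elements of s, clamping n to [0, len(s)]
// instead of triggering "slice bounds out of range".
func safeSlice(s []int, n int) []int {
	if n < 0 {
		n = 0
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

func main() {
	fmt.Println(isFuzzerHeader([]byte("FUZ")))             // false
	fmt.Println(isFuzzerHeader([]byte("FUZZER")))          // true
	fmt.Println(safeSlice([]int{0, 1, 2, 3, 4, 5, 6}, 10)) // [0 1 2 3 4 5 6]
}
```

<p>Clamping silently truncates the request; when a too-large index indicates malformed input, returning an error is usually the better choice.</p>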
<a id="more"></a>
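<h4 id="1-1-2【必须】nil指针判断"><a href="#1-1-2【必须】nil指针判断" class="headerlink" title="1.1.2【必须】nil指针判断"></a>1.1.2 [Required] Check pointers for nil</h4><ul>
<li>Before dereferencing a pointer, always check whether it is nil to keep the program from panicking. This matters most after unmarshaling into a struct, where pointer fields may be left unset.</li>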
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Packet <span class="keyword">struct</span> &#123;</span><br><span class="line">    PacketType    <span class="keyword">uint8</span></span><br><span class="line">    PacketVersion <span class="keyword">uint8</span></span><br><span class="line">    Data          *Data</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> Data <span class="keyword">struct</span> &#123;</span><br><span class="line">    Stat <span class="keyword">uint8</span></span><br><span class="line">    Len  <span class="keyword">uint8</span></span><br><span class="line">    Buf  [<span class="number">8</span>]<span class="keyword">byte</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *Packet)</span> <span class="title">UnmarshalBinary</span><span class="params">(b []<span class="keyword">byte</span>)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(b) &lt; <span class="number">2</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> io.EOF</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    p.PacketType = b[<span class="number">0</span>]</span><br><span class="line">    p.PacketVersion = b[<span class="number">1</span>]</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Data is only allocated when len(b) &gt; 2; with exactly 2 bytes it stays nil</span></span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(b) &gt; <span class="number">2</span> &#123;</span><br><span class="line">        p.Data = <span class="built_in">new</span>(Data)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad: packet.Data is dereferenced without checking it for nil</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    packet := <span class="built_in">new</span>(Packet)</span><br><span class="line">    data := <span class="built_in">make</span>([]<span class="keyword">byte</span>, <span class="number">2</span>)</span><br><span class="line">    <span class="keyword">if</span> err := packet.UnmarshalBinary(data); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">"Failed to unmarshal packet"</span>)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    fmt.Printf(<span class="string">"Stat: %v\n"</span>, packet.Data.Stat)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: check whether the Data pointer is nil before using it</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    packet := <span class="built_in">new</span>(Packet)</span><br><span class="line">    data := <span class="built_in">make</span>([]<span class="keyword">byte</span>, <span class="number">2</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> err := packet.UnmarshalBinary(data); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">"Failed to unmarshal packet"</span>)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> packet.Data == <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    fmt.Printf(<span class="string">"Stat: %v\n"</span>, packet.Data.Stat)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
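<p>A nil check at every call site is easy to forget. One alternative, sketched here under the assumption that callers only need read access, moves the check into a nil-safe accessor. The types mirror <code>Packet</code> and <code>Data</code> from the example; the <code>Stat</code> method and its <code>(value, ok)</code> result are hypothetical additions, not part of the guide:</p>

```go
package main

import "fmt"

type Data struct {
	Stat uint8
	Len  uint8
	Buf  [8]byte
}

type Packet struct {
	PacketType    uint8
	PacketVersion uint8
	Data          *Data // stays nil when the input was only 2 bytes long
}

// Stat returns the status byte and whether it is present. It is safe to
// call on a nil *Packet and on a Packet whose Data field is nil, so callers
// cannot trigger a nil-pointer dereference through it.
func (p *Packet) Stat() (uint8, bool) {
	if p == nil || p.Data == nil {
		return 0, false
	}
	return p.Data.Stat, true
}

func main() {
	var missing *Packet
	if _, ok := missing.Stat(); !ok {
		fmt.Println("no packet data")
	}

	p := &Packet{Data: &Data{Stat: 7}}
	if s, ok := p.Stat(); ok {
		fmt.Println("Stat:", s) // Stat: 7
	}
}
```

<p>A method with a pointer receiver can be called on a nil pointer in Go, which is what lets the accessor absorb both the nil-packet and nil-payload cases in one place.</p>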
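<h4 id="1-1-3【必须】整数安全"><a href="#1-1-3【必须】整数安全" class="headerlink" title="1.1.3【必须】整数安全"></a>1.1.3 [Required] Integer safety</h4><ul>
<li><p>When performing arithmetic, enforce range limits so that externally supplied input cannot make the computation misbehave:</p>
<ul>
<li>Ensure that unsigned integer arithmetic does not wrap around</li>
<li>Ensure that signed integer arithmetic does not overflow</li>
<li>Ensure that integer conversions do not truncate values</li>
<li>Ensure that integer conversions do not introduce sign errors</li>
</ul>
</li>
<li><p>Strict range limits are mandatory when an integer is used:</p>
<ul>
<li>As an array index</li>
<li>As the length or size of an object</li>
<li>As an array bound (for example, a loop counter)</li>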
</ul>
</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: the input is not range-checked, so the addition can overflow</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">overflow</span><span class="params">(numControlByUser <span class="keyword">int32</span>)</span></span> &#123;</span><br><span class="line">    <span class="keyword">var</span> numInt <span class="keyword">int32</span> = <span class="number">0</span></span><br><span class="line">    numInt = numControlByUser + <span class="number">1</span></span><br><span class="line">    <span class="comment">// no range check: 2147483647 + 1 wraps around to -2147483648</span></span><br><span class="line">    fmt.Printf(<span class="string">"%d\n"</span>, numInt)</span><br><span class="line">    <span class="comment">// any further use of numInt may cause other errors</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    overflow(<span class="number">2147483647</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: reject a result that wrapped around (assumes a non-negative input)</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">overflow</span><span class="params">(numControlByUser <span class="keyword">int32</span>)</span></span> &#123;</span><br><span class="line">    <span class="keyword">var</span> numInt <span class="keyword">int32</span> = <span class="number">0</span></span><br><span class="line">    numInt = numControlByUser + <span class="number">1</span></span><br><span class="line">    <span class="keyword">if</span> numInt &lt; <span class="number">0</span> &#123;</span><br><span class="line">        fmt.Println(<span class="string">"integer overflow"</span>)</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    fmt.Println(<span class="string">"integer ok"</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    overflow(<span class="number">2147483647</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
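<h4 id="1-1-4【必须】make分配长度验证"><a href="#1-1-4【必须】make分配长度验证" class="headerlink" title="1.1.4【必须】make分配长度验证"></a>1.1.4 [Required] Validate lengths passed to make</h4><ul>
<li>When allocating memory with make, validate any externally controlled length first: make panics on a negative length, and an excessively large one can exhaust memory.</li>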
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">parse</span><span class="params">(lenControlByUser <span class="keyword">int</span>, data []<span class="keyword">byte</span>)</span></span> &#123;</span><br><span class="line">    size := lenControlByUser</span><br><span class="line">    <span class="comment">// size comes from outside and is never checked: a negative value makes make() panic, a huge one can exhaust memory</span></span><br><span class="line">    buffer := <span class="built_in">make</span>([]<span class="keyword">byte</span>, size)</span><br><span class="line">    <span class="built_in">copy</span>(buffer, data)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">parse</span><span class="params">(lenControlByUser <span class="keyword">int</span>, data []<span class="keyword">byte</span>)</span> <span class="params">([]<span class="keyword">byte</span>, error)</span></span> &#123;</span><br><span class="line">    size := lenControlByUser</span><br><span class="line">    <span class="comment">// restrict the externally controlled length to a sane range (negative sizes included)</span></span><br><span class="line">    <span class="keyword">if</span> size &lt; <span class="number">0</span> || size &gt; <span class="number">64</span>*<span class="number">1024</span>*<span class="number">1024</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, errors.New(<span class="string">"invalid size"</span>)</span><br><span class="line">    &#125;</span><br><span class="line">    buffer := <span class="built_in">make</span>([]<span class="keyword">byte</span>, size)</span><br><span class="line">    <span class="built_in">copy</span>(buffer, data)</span><br><span class="line">    <span class="keyword">return</span> buffer, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
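<h4 id="1-1-5【必须】禁止SetFinalizer和指针循环引用同时使用"><a href="#1-1-5【必须】禁止SetFinalizer和指针循环引用同时使用" class="headerlink" title="1.1.5【必须】禁止SetFinalizer和指针循环引用同时使用"></a>1.1.5 [Required] Do not combine SetFinalizer with pointer cycles</h4><ul>
<li>A finalizer registered with runtime.SetFinalizer() runs only after the GC has found the object unreachable and before its memory is freed; it does not run just because the program exits, whether normally or with an error. The GC handles ordinary pointer cycles correctly, but if a cycle contains an object with a finalizer, there is no order in which the finalizers could run that respects their dependencies. Such a cycle is therefore not guaranteed to be collected and its finalizers are not guaranteed to run, so the memory can leak.</li>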
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Data <span class="keyword">struct</span> &#123;</span><br><span class="line">    o *Data <span class="comment">// lets two Data values point at each other</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">()</span></span> &#123;</span><br><span class="line">    <span class="keyword">var</span> a, b Data</span><br><span class="line">    a.o = &amp;b</span><br><span class="line">    b.o = &amp;a</span><br><span class="line"></span><br><span class="line">    <span class="comment">// a and b form a pointer cycle, so neither finalizer is guaranteed to run</span></span><br><span class="line">    runtime.SetFinalizer(&amp;a, <span class="function"><span class="keyword">func</span><span class="params">(d *Data)</span></span> &#123;</span><br><span class="line">        fmt.Printf(<span class="string">"a %p final.\n"</span>, d)</span><br><span class="line">    &#125;)</span><br><span class="line">    runtime.SetFinalizer(&amp;b, <span class="function"><span class="keyword">func</span><span class="params">(d *Data)</span></span> &#123;</span><br><span class="line">        fmt.Printf(<span class="string">"b %p final.\n"</span>, d)</span><br><span class="line">    &#125;)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        foo() <span class="comment">// each call may leak another a/b cycle</span></span><br><span class="line">        time.Sleep(time.Millisecond)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
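<p>Because finalizers are not guaranteed to run for cyclic structures, or at all before the program exits, a more predictable pattern is to release resources explicitly at a known point and break the cycle yourself. A minimal sketch of that approach, offered as one option rather than the guide's own fix; the <code>Node</code> type and its <code>Close</code> method are hypothetical stand-ins for the <code>Data</code> type above:</p>

```go
package main

import "fmt"

// Node mirrors the cyclic Data type: two nodes point at each other.
type Node struct {
	name   string
	peer   *Node
	closed bool
}

// Close releases the node deterministically and breaks the cycle by
// clearing the peer pointer. Calling it more than once is harmless.
func (n *Node) Close() {
	if n.closed {
		return
	}
	n.closed = true
	n.peer = nil
	fmt.Printf("%s released\n", n.name)
}

func foo() {
	a := &Node{name: "a"}
	b := &Node{name: "b"}
	a.peer, b.peer = b, a
	// Cleanup happens when foo returns instead of depending on the GC.
	// Deferred calls run in reverse order: b first, then a.
	defer a.Close()
	defer b.Close()
}

func main() {
	foo() // prints "b released", then "a released"
}
```

<p>Once the peer pointers are cleared, the two nodes are ordinary garbage and the GC can reclaim them like any other unreachable value.</p>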
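<h4 id="1-1-6【必须】禁止重复释放channel"><a href="#1-1-6【必须】禁止重复释放channel" class="headerlink" title="1.1.6【必须】禁止重复释放channel"></a>1.1.6 [Required] Never close a channel twice</h4><ul>
<li>Double closes usually hide in error-handling paths. If an attacker can craft input that drives the program down such a path so that a channel is closed twice, the runtime panics ("close of closed channel"), resulting in a denial of service (DoS).</li>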
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">(c <span class="keyword">chan</span> <span class="keyword">int</span>)</span></span> &#123;</span><br><span class="line">    <span class="keyword">defer</span> <span class="built_in">close</span>(c)</span><br><span class="line">    err := processBusiness()</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        c &lt;- <span class="number">0</span></span><br><span class="line">        <span class="built_in">close</span>(c) <span class="comment">// first close; the deferred close(c) then panics: "close of closed channel"</span></span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    c &lt;- <span class="number">1</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">(c <span class="keyword">chan</span> <span class="keyword">int</span>)</span></span> &#123;</span><br><span class="line">    <span class="keyword">defer</span> <span class="built_in">close</span>(c) <span class="comment">// the channel is closed exactly once, via defer</span></span><br><span class="line">    err := processBusiness()</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        c &lt;- <span class="number">0</span></span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    c &lt;- <span class="number">1</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
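<h4 id="1-1-7【必须】确保每个协程都能退出"><a href="#1-1-7【必须】确保每个协程都能退出" class="headerlink" title="1.1.7【必须】确保每个协程都能退出"></a>1.1.7 [Required] Make sure every goroutine can exit</h4><ul>
<li>Every goroutine gets its own stack when it starts. If the program keeps running and a goroutine has no exit condition, the goroutine is effectively out of control: the resources it holds can never be reclaimed, which can cause memory leaks.</li>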
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: the goroutine has no exit condition</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">doWaiter</span><span class="params">(name <span class="keyword">string</span>, second <span class="keyword">int</span>)</span></span> &#123;</span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        time.Sleep(time.Duration(second) * time.Second)</span><br><span class="line">        fmt.Println(name, <span class="string">"is ready!"</span>)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
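<h4 id="1-1-8【推荐】不使用unsafe包"><a href="#1-1-8【推荐】不使用unsafe包" class="headerlink" title="1.1.8【推荐】不使用unsafe包"></a>1.1.8 [Recommended] Avoid the unsafe package</h4><ul>
<li>The unsafe package bypasses Go's memory-safety guarantees, so using it is generally unsafe and can corrupt memory; avoid it whenever possible. If you must manipulate pointers through unsafe, validate them rigorously.</li>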
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: raw pointer arithmetic through unsafe</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">unsafePointer</span><span class="params">()</span></span> &#123;</span><br><span class="line">    b := <span class="built_in">make</span>([]<span class="keyword">byte</span>, <span class="number">1</span>)</span><br><span class="line">    foo := (*<span class="keyword">int</span>)(unsafe.Pointer(<span class="keyword">uintptr</span>(unsafe.Pointer(&amp;b[<span class="number">0</span>])) + <span class="keyword">uintptr</span>(<span class="number">0xfffffffe</span>))) <span class="comment">// address far outside b</span></span><br><span class="line">    fmt.Print(*foo + <span class="number">1</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// fails at runtime with: [signal SIGSEGV: segmentation violation code=0x1 addr=0xc100068f55 pc=0x49142b]</span></span><br></pre></td></tr></table></figure>
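<p>Reinterpreting raw bytes as an integer is a frequent reason to reach for <code>unsafe</code>. The standard <code>encoding/binary</code> package does the same job through ordinary, bounds-checked slice access, so a bad offset cannot read memory outside the slice, and an explicit range check turns it into a normal error. A minimal sketch, assuming little-endian data; the <code>readUint32</code> helper is hypothetical:</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readUint32 decodes a little-endian uint32 starting at offset off in b.
// The range check guarantees b[off:off+4] exists, so no memory outside b
// is ever touched.
func readUint32(b []byte, off int) (uint32, error) {
	if off < 0 || off > len(b)-4 {
		return 0, errors.New("offset out of range")
	}
	return binary.LittleEndian.Uint32(b[off:]), nil
}

func main() {
	b := []byte{1, 0, 0, 0, 0xff}
	v, err := readUint32(b, 0)
	fmt.Println(v, err == nil) // 1 true

	_, err = readUint32(b, 100)
	fmt.Println(err) // offset out of range
}
```

<p>The check <code>off &gt; len(b)-4</code> also rejects slices shorter than four bytes, because <code>len(b)-4</code> is then negative.</p>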
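<h4 id="1-1-9【推荐】不使用slice作为函数入参"><a href="#1-1-9【推荐】不使用slice作为函数入参" class="headerlink" title="1.1.9【推荐】不使用slice作为函数入参"></a>1.1.9 [Recommended] Do not use slices as function parameters</h4><ul>
<li>A slice behaves like a reference: when it is passed to a function, the slice header is copied, but the copy still points at the same underlying array, so modifying the slice's elements inside the function also modifies the caller's data.</li>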
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: a slice argument shares its underlying array with the caller</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">modify</span><span class="params">(array []<span class="keyword">int</span>)</span></span> &#123;</span><br><span class="line"> array[<span class="number">0</span>] = <span class="number">10</span> <span class="comment">// modifying an element of the slice argument changes the caller's data</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> array := []<span class="keyword">int</span>&#123;<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>&#125;</span><br><span class="line"></span><br><span class="line"> modify(array)</span><br><span class="line"> fmt.Println(array) <span class="comment">// output: [10 2 3 4 5]</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: the function takes an array instead of a slice, and arrays are copied when passed</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">modify</span><span class="params">(array [5]<span class="keyword">int</span>)</span></span> &#123;</span><br><span class="line"> array[<span class="number">0</span>] = <span class="number">10</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="comment">// pass an array; note the difference between arrays and slices</span></span><br><span class="line"> array := [<span class="number">5</span>]<span class="keyword">int</span>&#123;<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>&#125;</span><br><span class="line"></span><br><span class="line"> modify(array)</span><br><span class="line"> fmt.Println(array) <span class="comment">// output: [1 2 3 4 5]</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
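<p>If a function really does need a slice, another option is to hand it an independent copy, so that the callee's writes land in a new backing array. This is a sketch that goes beyond the original guideline. <code>cloneInts</code> is a hypothetical helper name; on Go 1.21+ the standard <code>slices.Clone</code> serves the same purpose.</p>

```go
package main

import "fmt"

// modify overwrites the first element of whatever slice it receives.
func modify(s []int) {
	s[0] = 10
}

// cloneInts (hypothetical helper) returns a copy of s backed by a new array,
// so writes to the copy never reach the caller's data.
func cloneInts(s []int) []int {
	c := make([]int, len(s))
	copy(c, s)
	return c
}

func main() {
	data := []int{1, 2, 3, 4, 5}
	modify(cloneInts(data))
	fmt.Println(data) // [1 2 3 4 5]
}
```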
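<h3 id="1-2-文件操作"><a href="#1-2-文件操作" class="headerlink" title="1.2 文件操作"></a>1.2 File operations</h3><h4 id="1-2-1【必须】-路径穿越检查"><a href="#1-2-1【必须】-路径穿越检查" class="headerlink" title="1.2.1【必须】 路径穿越检查"></a>1.2.1 [Required] Path traversal check</h4><ul>
<li>When a program performs file operations without restricting externally supplied file names, an attacker may be able to read or write arbitrary files. In severe cases this can lead to code execution.</li>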
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: arbitrary file read</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">handler</span><span class="params">(w http.ResponseWriter, r *http.Request)</span></span> &#123;</span><br><span class="line"> path := r.URL.Query().Get(<span class="string">"path"</span>)</span><br><span class="line"></span><br><span class="line"> <span class="comment">// the file path is not filtered, which may allow arbitrary file reads</span></span><br><span class="line"> data, _ := ioutil.ReadFile(path)</span><br><span class="line"> w.Write(data)</span><br><span class="line"></span><br><span class="line"> <span class="comment">// joining onto a base directory is not enough: an external file name must also be checked for ../ and similar traversal sequences</span></span><br><span class="line"> data, _ = ioutil.ReadFile(filepath.Join(<span class="string">"/home/user/"</span>, path))</span><br><span class="line"> w.Write(data)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad: arbitrary file write</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">unzip</span><span class="params">(f <span class="keyword">string</span>)</span></span> &#123;</span><br><span class="line"> r, _ := zip.OpenReader(f)</span><br><span class="line"> <span class="keyword">for</span> _, f := <span class="keyword">range</span> r.File &#123;</span><br><span class="line"> p, _ := filepath.Abs(f.Name)</span><br><span class="line"> <span class="comment">// the archive entry name is not validated, so ../ sequences allow writes to arbitrary paths</span></span><br><span class="line"> ioutil.WriteFile(p, []<span class="keyword">byte</span>(<span class="string">"present"</span>), <span class="number">0640</span>)</span><br><span class="line"> &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: reject archive entry names containing the ".." traversal sequence to prevent arbitrary writes</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">unzipGood</span><span class="params">(f <span class="keyword">string</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line"> r, err := zip.OpenReader(f)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"> fmt.Println(<span class="string">"read zip file fail"</span>)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">defer</span> r.Close()</span><br><span class="line"> <span class="keyword">for</span> _, f := <span class="keyword">range</span> r.File &#123;</span><br><span class="line"> <span class="keyword">if</span> !strings.Contains(f.Name, <span class="string">".."</span>) &#123;</span><br><span class="line"> p, _ := filepath.Abs(f.Name)</span><br><span class="line"> ioutil.WriteFile(p, []<span class="keyword">byte</span>(<span class="string">"present"</span>), <span class="number">0640</span>)</span><br><span class="line"> &#125; <span class="keyword">else</span> &#123;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line"> &#125;</span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
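<h4 id="1-2-2【必须】-文件访问权限"><a href="#1-2-2【必须】-文件访问权限" class="headerlink" title="1.2.2【必须】 文件访问权限"></a>1.2.2 [Required] File access permissions</h4><ul>
<li>Set access permissions according to how sensitive the created file is, so that sensitive data cannot be read by arbitrary users. For example, set the file permissions to <code>-rw-r-----</code> (octal <code>0640</code>).</li>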
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ioutil.WriteFile(p, []<span class="keyword">byte</span>(<span class="string">"present"</span>), <span class="number">0640</span>)</span><br></pre></td></tr></table></figure>
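<h3 id="1-3-系统接口"><a href="#1-3-系统接口" class="headerlink" title="1.3 系统接口"></a>1.3 System interfaces</h3><h4 id="1-3-1【必须】命令执行检查"><a href="#1-3-1【必须】命令执行检查" class="headerlink" title="1.3.1【必须】命令执行检查"></a>1.3.1 [Required] Command execution check</h4>
<ul>
<li>When using functions such as <code>exec.Command</code>, <code>exec.CommandContext</code>, <code>syscall.StartProcess</code>, or <code>os.StartProcess</code>, and the first argument (path) is taken directly from external input, restrict the executable commands with a whitelist. Do not allow shells such as <code>bash</code>, <code>cmd</code>, or <code>sh</code>.</li>
<li>When using <code>exec.Command</code>, <code>exec.CommandContext</code>, or similar functions to start a shell such as <code>bash</code>, <code>cmd</code>, or <code>sh</code>, and the argument after <code>-c</code> (arg) is built by concatenating external input, filter out potentially malicious characters such as <code>\n $ &amp; ; | ' " ( ) `</code>.</li>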
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">()</span></span> &#123;</span><br><span class="line"> userInputedVal := <span class="string">"&amp;&amp; echo 'hello'"</span> <span class="comment">// assume this value comes from external input</span></span><br><span class="line"> cmdName := <span class="string">"ping "</span> + userInputedVal</span><br><span class="line"></span><br><span class="line"> <span class="comment">// external input is not checked for command-injection characters; passed to sh, it enables command injection</span></span><br><span class="line"> cmd := exec.Command(<span class="string">"sh"</span>, <span class="string">"-c"</span>, cmdName)</span><br><span class="line"> output, _ := cmd.CombinedOutput()</span><br><span class="line"> fmt.Println(<span class="keyword">string</span>(output))</span><br><span class="line"></span><br><span class="line"> cmdName = <span class="string">"ls"</span></span><br><span class="line"> <span class="comment">// external input is not checked against the expected commands</span></span><br><span class="line"> cmd = exec.Command(cmdName)</span><br><span class="line"> output, _ = cmd.CombinedOutput()</span><br><span class="line"> fmt.Println(<span class="keyword">string</span>(output))</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">checkIllegal</span><span class="params">(cmdName <span class="keyword">string</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line"> <span class="keyword">if</span> strings.Contains(cmdName, <span class="string">"&amp;"</span>) || strings.Contains(cmdName, <span class="string">"|"</span>) || strings.Contains(cmdName, <span class="string">";"</span>) ||</span><br><span class="line"> strings.Contains(cmdName, <span class="string">"$"</span>) || strings.Contains(cmdName, <span class="string">"'"</span>) || strings.Contains(cmdName, <span class="string">"`"</span>) ||</span><br><span class="line"> strings.Contains(cmdName, <span class="string">"("</span>) || strings.Contains(cmdName, <span class="string">")"</span>) || strings.Contains(cmdName, <span class="string">"\""</span>) ||</span><br><span class="line"> strings.Contains(cmdName, <span class="string">"\n"</span>) &#123;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> userInputedVal := <span class="string">"&amp;&amp; echo 'hello'"</span></span><br><span class="line"> cmdName := <span class="string">"ping "</span> + userInputedVal</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> checkIllegal(cmdName) &#123; <span class="comment">// check the command passed to sh for special characters</span></span><br><span class="line"> <span class="keyword">return</span> <span class="comment">// return immediately if any are found</span></span><br><span class="line"> &#125;</span><br><span class="line"></span><br><span class="line"> cmd := exec.Command(<span class="string">"sh"</span>, <span class="string">"-c"</span>, cmdName)</span><br><span class="line"> output, _ := cmd.CombinedOutput()</span><br><span class="line"> fmt.Println(<span class="keyword">string</span>(output))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
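<p>Where possible, avoiding the shell altogether removes the need for a character blacklist. <code>exec.Command</code> passes each argument directly to the program as a separate argv entry, so <code>&amp;&amp;</code> or <code>;</code> inside an argument is never parsed as shell syntax. The sketch combines this with the command whitelist described above. <code>allowedCommands</code> and <code>buildCommand</code> are hypothetical names, and the <code>-c 1</code> flag assumes a Unix-style <code>ping</code>.</p>

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
)

// allowedCommands is a hypothetical whitelist of executables this service may start.
var allowedCommands = map[string]bool{"ping": true, "ls": true}

// buildCommand returns an *exec.Cmd for a whitelisted executable. No shell is
// involved: each argument becomes one argv entry, so "&&", ";" or "|" inside
// an argument are passed through literally instead of being interpreted.
func buildCommand(name string, args ...string) (*exec.Cmd, error) {
	if !allowedCommands[name] {
		return nil, fmt.Errorf("command %q is not allowed", name)
	}
	return exec.Command(name, args...), nil
}

func main() {
	host := "127.0.0.1" // assume this value comes from external input
	if net.ParseIP(host) == nil { // validate the argument's format too
		fmt.Println("invalid host")
		return
	}
	cmd, err := buildCommand("ping", "-c", "1", host)
	if err != nil {
		fmt.Println(err)
		return
	}
	output, _ := cmd.CombinedOutput()
	fmt.Println(string(output))

	_, err = buildCommand("sh", "-c", "ls")
	fmt.Println(err) // command "sh" is not allowed
}
```

<p>Arguments still need format validation (here <code>net.ParseIP</code>). Otherwise a value starting with <code>-</code> could be read as an option by the target program.</p>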
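<h3 id="1-4-通信安全"><a href="#1-4-通信安全" class="headerlink" title="1.4 通信安全"></a>1.4 Communication security</h3><h4 id="1-4-1【必须】网络通信采用TLS方式"><a href="#1-4-1【必须】网络通信采用TLS方式" class="headerlink" title="1.4.1【必须】网络通信采用TLS方式"></a>1.4.1 [Required] Use TLS for network communication</h4><ul>
<li>Plaintext communication protocols carry significant, well-documented risks: traffic hijacked by a man-in-the-middle can be read or tampered with. Network communication must therefore be protected with at least TLS. For example, gRPC and WebSocket connections should run over TLS, preferably TLS 1.3.</li>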
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> http.HandleFunc(<span class="string">"/"</span>, <span class="function"><span class="keyword">func</span><span class="params">(w http.ResponseWriter, req *http.Request)</span></span> &#123;</span><br><span class="line"> w.Header().Add(<span class="string">"Strict-Transport-Security"</span>, <span class="string">"max-age=63072000; includeSubDomains"</span>)</span><br><span class="line"> w.Write([]<span class="keyword">byte</span>(<span class="string">"This is an example server.\n"</span>))</span><br><span class="line"> &#125;)</span><br><span class="line"></span><br><span class="line"> <span class="comment">// configure the server's certificate and private key</span></span><br><span class="line"> log.Fatal(http.ListenAndServeTLS(<span class="string">":443"</span>, <span class="string">"yourCert.pem"</span>, <span class="string">"yourKey.pem"</span>, <span class="literal">nil</span>))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
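<h4 id="1-4-2【推荐】TLS启用证书验证"><a href="#1-4-2【推荐】TLS启用证书验证" class="headerlink" title="1.4.2【推荐】TLS启用证书验证"></a>1.4.2 [Recommended] Enable TLS certificate verification</h4><ul>
<li>TLS certificates should be valid, unexpired, and issued for the correct domain name. In production, certificate verification must stay enabled, which means never setting <code>InsecureSkipVerify: true</code>.</li>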
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"crypto/tls"</span></span><br><span class="line"> <span class="string">"net/http"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">doAuthReq</span><span class="params">(authReq *http.Request)</span> *<span class="title">http</span>.<span class="title">Response</span></span> &#123;</span><br><span class="line"> tr := &amp;http.Transport&#123;</span><br><span class="line"> TLSClientConfig: &amp;tls.Config&#123;InsecureSkipVerify: <span class="literal">true</span>&#125;,</span><br><span class="line"> &#125;</span><br><span class="line"> client := &amp;http.Client&#123;Transport: tr&#125;</span><br><span class="line"> res, _ := client.Do(authReq)</span><br><span class="line"> 
<span class="keyword">return</span> res</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"crypto/tls"</span></span><br><span class="line"> <span class="string">"net/http"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">doAuthReq</span><span class="params">(authReq *http.Request)</span> *<span class="title">http</span>.<span class="title">Response</span></span> &#123;</span><br><span class="line"> tr := &amp;http.Transport&#123;</span><br><span class="line"> TLSClientConfig: &amp;tls.Config&#123;InsecureSkipVerify: <span class="literal">false</span>&#125;,</span><br><span class="line"> &#125;</span><br><span class="line"> client := &amp;http.Client&#123;Transport: tr&#125;</span><br><span class="line"> res, _ := client.Do(authReq)</span><br><span class="line"> <span class="keyword">return</span> res</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
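<p>A frequent reason for setting <code>InsecureSkipVerify</code> is a server whose certificate was issued by a private CA. A safer approach is to trust that CA explicitly through <code>RootCAs</code>. Note that <code>RootCAs</code> replaces the system root pool for this client. This is a sketch; <code>newClientWithCA</code> and the <code>ca.pem</code> path are hypothetical.</p>

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// newClientWithCA (hypothetical helper) returns a client that verifies server
// certificates against the given PEM-encoded CA certificates only.
func newClientWithCA(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in PEM data")
	}
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{
			RootCAs:    pool, // verification stays on; InsecureSkipVerify is left false
			MinVersion: tls.VersionTLS12,
		},
	}
	return &http.Client{Transport: tr}, nil
}

func main() {
	caPEM, err := os.ReadFile("ca.pem") // hypothetical path to the private CA certificate
	if err != nil {
		fmt.Println("read CA:", err)
		return
	}
	client, err := newClientWithCA(caPEM)
	if err != nil {
		fmt.Println(err)
		return
	}
	_ = client // use client.Do(authReq) as in doAuthReq above
}
```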
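<h3 id="1-5-敏感数据保护"><a href="#1-5-敏感数据保护" class="headerlink" title="1.5 敏感数据保护"></a>1.5 Sensitive data protection</h3><h4 id="1-5-1【必须】敏感信息访问"><a href="#1-5-1【必须】敏感信息访问" class="headerlink" title="1.5.1【必须】敏感信息访问"></a>1.5.1 [Required] Access to sensitive information</h4><ul>
<li>Never hard-code sensitive information in the program. Doing so may expose it to attackers, and it also makes the code harder to manage and maintain.</li>
<li>Use a centralized configuration system to manage keys and other sensitive information.</li>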
</ul>
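<h4 id="1-5-2【必须】敏感数据输出"><a href="#1-5-2【必须】敏感数据输出" class="headerlink" title="1.5.2【必须】敏感数据输出"></a>1.5.2 [Required] Output of sensitive data</h4><ul>
<li>Output only the minimum data set that is needed, so that extra fields do not leak sensitive information</li>
<li>Never write passwords (plaintext or encrypted), keys, or other sensitive information to logs</li>
<li>Sensitive information that must be output has to be masked appropriately</li>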
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">serve</span><span class="params">()</span></span> &#123;</span><br><span class="line"> http.HandleFunc(<span class="string">"/register"</span>, <span class="function"><span class="keyword">func</span><span class="params">(w http.ResponseWriter, r *http.Request)</span></span> &#123;</span><br><span class="line"> r.ParseForm()</span><br><span class="line"> user := r.Form.Get(<span class="string">"user"</span>)</span><br><span class="line"> pw := r.Form.Get(<span class="string">"password"</span>)</span><br><span class="line"></span><br><span class="line"> log.Printf(<span class="string">"Registering new user %s with password %s.\n"</span>, user, pw)</span><br><span class="line"> &#125;)</span><br><span class="line"> http.ListenAndServe(<span class="string">":80"</span>, <span class="literal">nil</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span 
class="comment">// good</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">serve1</span><span class="params">()</span></span> &#123;</span><br><span class="line"> http.HandleFunc(<span class="string">"/register"</span>, <span class="function"><span class="keyword">func</span><span class="params">(w http.ResponseWriter, r *http.Request)</span></span> &#123;</span><br><span class="line"> r.ParseForm()</span><br><span class="line"> user := r.Form.Get(<span class="string">"user"</span>)</span><br><span class="line"> pw := r.Form.Get(<span class="string">"password"</span>)</span><br><span class="line"></span><br><span class="line"> log.Printf(<span class="string">"Registering new user %s.\n"</span>, user)</span><br><span class="line"></span><br><span class="line"> <span class="comment">// ...</span></span><br><span class="line"> use(pw)</span><br><span class="line"> &#125;)</span><br><span class="line"> http.ListenAndServe(<span class="string">":80"</span>, <span class="literal">nil</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
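<p>For values that must be displayed, masking can be as simple as keeping a few leading and trailing characters. The helper <code>maskMiddle</code> and the 3/4 split used for the hypothetical phone number are illustrative choices, not a prescribed format.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// maskMiddle (hypothetical helper) keeps the first keepPrefix and last
// keepSuffix runes of s and replaces the rest with '*'. Values too short to
// keep anything are masked completely. keepPrefix and keepSuffix must be >= 0.
func maskMiddle(s string, keepPrefix, keepSuffix int) string {
	r := []rune(s)
	if len(r) <= keepPrefix+keepSuffix {
		return strings.Repeat("*", len(r))
	}
	for i := keepPrefix; i < len(r)-keepSuffix; i++ {
		r[i] = '*'
	}
	return string(r)
}

func main() {
	fmt.Println(maskMiddle("13812345678", 3, 4)) // 138****5678
}
```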
<ul>
<li>Avoid leaking sensitive information through GET parameters, code comments, autofill, caching, and similar channels</li>
</ul>
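<h4 id="1-5-3【必须】敏感数据存储"><a href="#1-5-3【必须】敏感数据存储" class="headerlink" title="1.5.3【必须】敏感数据存储"></a>1.5.3 [Required] Storage of sensitive data</h4><ul>
<li>Sensitive data should be protected with algorithms such as SHA-2 or RSA before it is stored. SHA-2 is a one-way hash, suitable for values that only need to be verified; RSA encrypts values that must be recovered later</li>
<li>Sensitive data should be kept in a separate storage layer, with access control enabled at the access layer</li>
<li>Temporary files or caches containing sensitive information should be deleted as soon as they are no longer needed</li>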
</ul>
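<h4 id="1-5-4【必须】异常处理和日志记录"><a href="#1-5-4【必须】异常处理和日志记录" class="headerlink" title="1.5.4【必须】异常处理和日志记录"></a>1.5.4 [Required] Exception handling and logging</h4><ul>
<li>Use <code>panic</code>, <code>recover</code>, and <code>defer</code> appropriately to handle runtime errors, and do not send error details to the front end</li>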
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">defer</span> <span class="function"><span class="keyword">func</span> <span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">if</span> r := <span class="built_in">recover</span>(); r != <span class="literal">nil</span> &#123;</span><br><span class="line"> fmt.Println(<span class="string">"Recovered in start()"</span>)</span><br><span class="line"> &#125;</span><br><span class="line">&#125;()</span><br></pre></td></tr></table></figure>
<ul>
<li>Never enable debug mode in externally reachable environments, and never send the program's runtime logs to the front end</li>
</ul>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># bad: a headless Delve server listening on :2345 exposes a remote debugging API</span></span><br><span class="line">dlv --listen=:2345 --headless=<span class="literal">true</span> --api-version=2 debug test.go</span><br><span class="line"><span class="comment"># good: local, interactive debugging only</span></span><br><span class="line">dlv debug test.go</span><br></pre></td></tr></table></figure>
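<h3 id="1-6-加密解密"><a href="#1-6-加密解密" class="headerlink" title="1.6 加密解密"></a>1.6 Encryption and decryption</h3><h4 id="1-6-1【必须】不得硬编码密码-密钥"><a href="#1-6-1【必须】不得硬编码密码-密钥" class="headerlink" title="1.6.1【必须】不得硬编码密码/密钥"></a>1.6.1 [Required] Do not hard-code passwords or keys</h4><ul>
<li>For user login, encryption and decryption, and similar operations, never hard-code keys or passwords in the code. Supply them through configuration, or protect them with a transformation algorithm.</li>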
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="keyword">const</span> (</span><br><span class="line"> user = <span class="string">"dbuser"</span></span><br><span class="line"> password = <span class="string">"s3cretp4ssword"</span> <span class="comment">// hard-coded password</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">connect</span><span class="params">()</span> *<span class="title">sql</span>.<span class="title">DB</span></span> &#123;</span><br><span class="line"> connStr := fmt.Sprintf(<span class="string">"postgres://%s:%s@localhost/pqgotest"</span>, user, password)</span><br><span class="line"> db, err := sql.Open(<span class="string">"postgres"</span>, connStr)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> db</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad</span></span><br><span class="line"><span class="keyword">var</span> (</span><br><span class="line"> commonkey = []<span class="keyword">byte</span>(<span class="string">"0123456789abcdef"</span>) <span class="comment">// hard-coded AES key</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">AesEncrypt</span><span class="params">(plaintext <span class="keyword">string</span>)</span> <span class="params">(<span class="keyword">string</span>, error)</span></span> &#123;</span><br><span class="line"> block, err := aes.NewCipher(commonkey)</span><br><span class="line"> <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line"> <span class="keyword">return</span> <span class="string">""</span>, err</span><br><span class="line"> &#125;</span><br><span class="line"> _ = block <span class="comment">// encrypting plaintext with block is omitted here</span></span><br><span class="line"> <span class="keyword">return</span> <span class="string">""</span>, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
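<h4 id="1-6-2【必须】密钥存储安全"><a href="#1-6-2【必须】密钥存储安全" class="headerlink" title="1.6.2【必须】密钥存储安全"></a>1.6.2 [Required] Secure key storage</h4><ul>
<li>When using symmetric ciphers, the encryption key must be protected. If sensitive or business data is involved, negotiate the encryption key with an asymmetric algorithm. For less sensitive data, the key can be protected by means such as a transformation algorithm.</li>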
</ul>
<h4 id="1-6-3【推荐】不使用弱密码算法"><a href="#1-6-3【推荐】不使用弱密码算法" class="headerlink" title="1.6.3【推荐】不使用弱密码算法"></a>1.6.3 [Recommended] Do not use weak cryptographic algorithms</h4><ul>
<li>Avoid cryptographic algorithms with weak security strength.</li>
</ul>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">// bad</span><br><span class="line">crypto/des, crypto/md5, crypto/sha1, crypto/rc4, etc.</span><br><span class="line"></span><br><span class="line">// good</span><br><span class="line">crypto/rsa, crypto/aes, etc.</span><br></pre></td></tr></table></figure>
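<p>As an example of swapping a weak hash for a stronger one, <code>crypto/sha256</code> (a SHA-2 function) can replace <code>crypto/md5</code> or <code>crypto/sha1</code> with the same one-call style. <code>digest</code> is a hypothetical helper. The printed value is the standard SHA-256 test vector for the input "abc".</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest (hypothetical helper) returns the hex-encoded SHA-256 hash of data,
// used here in place of md5.Sum or sha1.Sum.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(digest([]byte("abc")))
	// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
}
```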
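<h3 id="1-7-正则表达式"><a href="#1-7-正则表达式" class="headerlink" title="1.7 正则表达式"></a>1.7 Regular expressions</h3><h4 id="1-7-1【推荐】使用regexp进行正则表达式匹配"><a href="#1-7-1【推荐】使用regexp进行正则表达式匹配" class="headerlink" title="1.7.1【推荐】使用regexp进行正则表达式匹配"></a>1.7.1 [Recommended] Use regexp for regular-expression matching</h4><ul>
<li>Poorly written regular expressions can be exploited for denial-of-service attacks that make a service unavailable, so use the <code>regexp</code> package for matching. <code>regexp</code> guarantees linear-time matching and fails gracefully: the parser, compiler, and execution engine all have memory limits. However, <code>regexp</code> does not support the following features. If your business logic depends on them, <code>regexp</code> is not suitable.<ul>
<li><a href="https://www.regular-expressions.info/backref.html" target="_blank" rel="external">Backreferences</a></li>
<li><a href="https://www.regular-expressions.info/lookaround.html" target="_blank" rel="external">Lookaround</a></li>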
</ul>
</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// good</span></span><br><span class="line">matched, err := regexp.MatchString(<span class="string">`a.b`</span>, <span class="string">"aaxbb"</span>)</span><br><span class="line">fmt.Println(matched) <span class="comment">// true</span></span><br><span class="line">fmt.Println(err) <span class="comment">// nil</span></span><br></pre></td></tr></table></figure>
<h1 id="后台类"><a href="#后台类" class="headerlink" title="后台类"></a>Backend</h1><h2 id="1-代码实现类-1"><a href="#1-代码实现类-1" class="headerlink" title="1 代码实现类"></a>1 Code implementation</h2><h3 id="1-1-输入校验"><a href="#1-1-输入校验" class="headerlink" title="1.1 输入校验"></a>1.1 Input validation</h3><h4 id="1-1-1【必须】按类型进行数据校验"><a href="#1-1-1【必须】按类型进行数据校验" class="headerlink" title="1.1.1【必须】按类型进行数据校验"></a>1.1.1 [Required] Validate data by type</h4><ul>
<li>All externally supplied parameters should be validated against a whitelist with <code>validator</code>. Checks should include, but are not limited to, length, range, type, and format, and input that fails validation must be rejected.</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"fmt"</span></span><br><span class="line"> <span class="string">"github.com/go-playground/validator/v10"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> validate *validator.Validate</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">validateVariable</span><span class="params">()</span></span> &#123;</span><br><span class="line"> myEmail := <span class="string">"user@example.com"</span> <span class="comment">// hypothetical external input</span></span><br><span class="line"> errs := validate.Var(myEmail, <span class="string">"required,email"</span>)</span><br><span class="line"> <span class="keyword">if</span> errs != <span class="literal">nil</span> &#123;</span><br><span class="line"> fmt.Println(errs)</span><br><span class="line"> <span class="keyword">return</span> <span class="comment">// validation failed: stop here</span></span><br><span class="line"> &#125;</span><br><span class="line"> <span class="comment">// validation passed; continue processing</span></span><br><span class="line"> <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> validate = validator.New()</span><br><span class="line"> validateVariable()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<ul>
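<li>Input that cannot be validated against a whitelist should be filtered or encoded with <code>html.EscapeString</code>, <code>text/template</code>, or <code>bluemonday</code>, so that characters such as <code>&lt;, &gt;, &amp;, &#39;, &quot;</code> are neutralized</li>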
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>package main

import (
	"fmt"
	"text/template"
)

// escapeHTML escapes the HTML special characters &lt; &gt; &amp; ' " in untrusted input.
func escapeHTML(inputValue string) string {
	return template.HTMLEscapeString(inputValue)
}

func main() {
	fmt.Println(escapeHTML(`&lt;b&gt;"Tom" &amp; 'Jerry'&lt;/b&gt;`))
}</pre></td></tr></table></figure>
<h3 id="1-2-SQL操作"><a href="#1-2-SQL操作" class="headerlink" title="1.2 SQL操作"></a>1.2 SQL Operations</h3><h4 id="1-2-1【必须】SQL语句默认使用预编译并绑定变量"><a href="#1-2-1【必须】SQL语句默认使用预编译并绑定变量" class="headerlink" title="1.2.1【必须】SQL语句默认使用预编译并绑定变量"></a>1.2.1 [Required] Use prepared statements with bound variables for SQL by default</h4><ul>
<li>Execute SQL through the <code>Prepare</code> / <code>Query</code> methods of <code>database/sql</code> with placeholders, or through an ORM such as GORM.</li>
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>package main

import (
	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/sqlite"
)

type Product struct {
	gorm.Model
	Code  string
	Price uint
}

func main() {
	db, err := gorm.Open("sqlite3", "test.db")
	if err != nil {
		panic("failed to connect database")
	}
	defer db.Close()
	db.AutoMigrate(&amp;Product{})

	var product Product
	// GORM generates a parameterized query: the primary key 1 is bound, not concatenated
	db.First(&amp;product, 1)
	// conditions built from input also use ? placeholders
	db.Where("code = ?", "D42").First(&amp;product)
}</pre></td></tr></table></figure>
<ul>
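<li>Use parameterized queries and never build SQL statements by string concatenation. Input used as an ORDER BY column or a table name cannot be bound as a parameter, so it must be validated, for example against a whitelist.</li>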
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>// bad
import (
	"database/sql"
	"fmt"
	"net/http"
)

func handler(db *sql.DB, req *http.Request) {
	// user input is spliced into the SQL text, so a crafted category can rewrite the query
	q := fmt.Sprintf("SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY='%s' ORDER BY PRICE",
		req.URL.Query().Get("category"))
	db.Query(q)
}

// good
func handlerGood(db *sql.DB, req *http.Request) {
	// use an unquoted ? placeholder ('?' in quotes would be a string literal);
	// the placeholder syntax depends on the driver, e.g. $1 for PostgreSQL
	q := "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY=? ORDER BY PRICE"
	rows, err := db.Query(q, req.URL.Query().Get("category"))
	if err != nil {
		return
	}
	defer rows.Close()
	// ... iterate over rows
}</pre></td></tr></table></figure>
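<p>Identifiers such as ORDER BY columns cannot be bound as <code>?</code> parameters, so the rule above requires validating them instead. The following is a minimal sketch of that check. It maps user-facing sort keys to trusted column names through a hypothetical whitelist <code>sortColumns</code>, and never copies the input itself into the SQL text.</p>

```go
package main

import (
	"fmt"
)

// sortColumns maps user-facing sort keys to trusted column names (hypothetical whitelist).
var sortColumns = map[string]string{
	"price": "PRICE",
	"item":  "ITEM",
}

// buildQuery picks the ORDER BY column and direction from fixed values only;
// the category value is still passed separately as a bound ? parameter.
func buildQuery(sortKey string, desc bool) (string, error) {
	col, ok := sortColumns[sortKey]
	if !ok {
		return "", fmt.Errorf("unsupported sort key %q", sortKey)
	}
	dir := "ASC"
	if desc {
		dir = "DESC"
	}
	return "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY=? ORDER BY " + col + " " + dir, nil
}

func main() {
	q, err := buildQuery("price", true)
	fmt.Println(q, err)
	_, err = buildQuery("PRICE; DROP TABLE PRODUCT", false)
	fmt.Println(err)
}
```

<p>The resulting string is then used as <code>db.Query(q, category)</code>, so both the identifier and the value are under control.</p>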
<h3 id="1-3-网络请求"><a href="#1-3-网络请求" class="headerlink" title="1.3 网络请求"></a>1.3 Network Requests</h3><h4 id="1-3-1【必须】资源请求过滤验证"><a href="#1-3-1【必须】资源请求过滤验证" class="headerlink" title="1.3.1【必须】资源请求过滤验证"></a>1.3.1 [Required] Filter and validate resource requests</h4><ul>
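<li><p>When calling the <code>&quot;net/http&quot;</code> functions <code>http.Get(url)</code>, <code>http.Post(url, contentType, body)</code>, <code>http.Head(url)</code>, <code>http.PostForm(url, data)</code>, or a client's <code>Do(req)</code> method (e.g. <code>http.DefaultClient.Do(req)</code>), strictly validate the request target whenever it is externally controllable, that is, taken dynamically from request parameters.</p>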
</li>
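<li><p>If the requested domains belong to a fixed set, for example only <code>a.qq.com</code> and <code>b.qq.com</code>, enforce a whitelist. Where a whitelist is not feasible, the recommended validation steps are:</p>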
<ul>
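<li><p>Step 1: allow only the HTTP and HTTPS schemes.</p>
</li>
<li><p>Step 2: parse the target URL and extract its host.</p>
</li>
<li><p>Step 3: resolve the host and convert each IP address it points to into an integer (Long) for range comparison.</p>
</li>
<li><p>Step 4: check whether the IP address is internal. Blocked ranges include:</p>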
<figure class="highlight plain"><table><tr><td class="code"><pre>// Private (RFC 1918), loopback and link-local ranges as examples; add any custom private ranges to the deny list as well.
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
127.0.0.0/8      (loopback)
169.254.0.0/16   (link-local, includes cloud metadata addresses such as 169.254.169.254)</pre></td></tr></table></figure>
</li>
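<li><p>Step 5: request the URL.</p>
</li>
<li><p>Step 6: if the response is a redirect, run the checks again from step 1 on the redirect target. Otherwise, send the request with the validated IP bound to the domain, so that a second DNS lookup cannot resolve to a different, internal address.</p>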
</li>
</ul>
</li>
<li><p>The standard library <code>encoding/xml</code> does not support external entity references, so using it avoids XXE vulnerabilities.</p>
</li>
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

func main() {
	type Person struct {
		XMLName  xml.Name `xml:"person"`
		Id       int      `xml:"id,attr"`
		UserName string   `xml:"name&gt;first"`
		Comment  string   `xml:",comment"`
	}

	v := &amp;Person{Id: 13, UserName: "John"}
	v.Comment = " Need more details. "

	// encode v as indented XML to stdout
	enc := xml.NewEncoder(os.Stdout)
	enc.Indent("  ", "    ")
	if err := enc.Encode(v); err != nil {
		fmt.Printf("error: %v\n", err)
	}
}</pre></td></tr></table></figure>
<h3 id="1-4-服务器端渲染"><a href="#1-4-服务器端渲染" class="headerlink" title="1.4 服务器端渲染"></a>1.4 Server-Side Rendering</h3><h4 id="1-4-1【必须】模板渲染过滤验证"><a href="#1-4-1【必须】模板渲染过滤验证" class="headerlink" title="1.4.1【必须】模板渲染过滤验证"></a>1.4.1 [Required] Filter and validate input used in template rendering</h4><ul>
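<li>When rendering with <code>text/template</code> or <code>html/template</code>, never insert external input into the template source itself. If that cannot be avoided, allow only whitelisted characters.</li>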
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>import (
	"fmt"
	"html/template"
	"net/http"

	"github.com/go-playground/validator/v10"
)

// bad
func handlerBad(w http.ResponseWriter, r *http.Request) {
	r.ParseForm()
	x := r.Form.Get("name")

	// x becomes part of the template source, so input such as &#123;&#123;.&#125;&#125; is executed as template code
	var tmpl = `&lt;!DOCTYPE html&gt;&lt;html&gt;&lt;body&gt;
&lt;form action="/" method="post"&gt;
 First name:&lt;br&gt;
&lt;input type="text" name="name" value=""&gt;
&lt;input type="submit" value="Submit"&gt;
&lt;/form&gt;&lt;p&gt;` + x + ` &lt;/p&gt;&lt;/body&gt;&lt;/html&gt;`

	t := template.New("main")
	t, _ = t.Parse(tmpl)
	t.Execute(w, "Hello")
}

// good
var validate = validator.New()

func validateVariable(val string) bool {
	// whitelist: 1 to 100 ASCII letters or digits, so template delimiters cannot get through
	errs := validate.Var(val, "required,alphanum,max=100")
	if errs != nil {
		fmt.Println(errs)
		return false
	}
	return true
}

func handlerGood(w http.ResponseWriter, r *http.Request) {
	r.ParseForm()
	x := r.Form.Get("name")

	if validateVariable(x) {
		var tmpl = `&lt;!DOCTYPE html&gt;&lt;html&gt;&lt;body&gt;
&lt;form action="/" method="post"&gt;
 First name:&lt;br&gt;
&lt;input type="text" name="name" value=""&gt;
&lt;input type="submit" value="Submit"&gt;
&lt;/form&gt;&lt;p&gt;` + x + ` &lt;/p&gt;&lt;/body&gt;&lt;/html&gt;`
		t := template.New("main")
		t, _ = t.Parse(tmpl)
		t.Execute(w, "Hello")
	} else {
		http.Error(w, "invalid name", http.StatusBadRequest)
	}
}</pre></td></tr></table></figure>
<h3 id="1-5-Web跨域"><a href="#1-5-Web跨域" class="headerlink" title="1.5 Web跨域"></a>1.5 Web Cross-Origin Access</h3><h4 id="1-5-1【必须】跨域资源共享CORS限制请求来源"><a href="#1-5-1【必须】跨域资源共享CORS限制请求来源" class="headerlink" title="1.5.1【必须】跨域资源共享CORS限制请求来源"></a>1.5.1 [Required] Restrict request origins for Cross-Origin Resource Sharing (CORS)</h4><ul>
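<li>Poorly protected CORS endpoints can leak sensitive information. Set <code>Access-Control-Allow-Origin</code> strictly to an explicit list of trusted origins, so that the same-origin policy still protects the endpoint from every other origin.</li>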
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>// good
import "github.com/rs/cors" // assumed: the options below match the rs/cors package

c := cors.New(cors.Options{
	AllowedOrigins:   []string{"http://qq.com", "https://qq.com"},
	AllowCredentials: true,
	Debug:            false,
})

// wrap the existing handler with the CORS middleware
handler = c.Handler(handler)</pre></td></tr></table></figure>
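<p>Without a CORS dependency, the same allowlist logic can be sketched with <code>net/http</code> alone. This is a minimal sketch, not a full CORS implementation (preflight <code>OPTIONS</code> handling is omitted), and the helper names are hypothetical. The origin is echoed back only when it exactly matches an allowlist entry, so look-alike hosts such as <code>https://qq.com.evil.example</code> receive no CORS headers.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// allowedOrigins is an explicit allowlist (values taken from the example above).
var allowedOrigins = map[string]bool{
	"http://qq.com":  true,
	"https://qq.com": true,
}

// withCORS echoes the Origin back only when it is on the allowlist; any other
// origin gets no CORS headers, so the browser's same-origin policy still applies.
func withCORS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("Vary", "Origin") // responses differ per Origin; keep caches correct
		if origin := r.Header.Get("Origin"); allowedOrigins[origin] {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Credentials", "true")
		}
		next.ServeHTTP(w, r)
	})
}

// corsHeaderFor returns the Access-Control-Allow-Origin value sent for origin.
func corsHeaderFor(origin string) string {
	h := withCORS(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	req := httptest.NewRequest(http.MethodGet, "/api", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Printf("%q\n", corsHeaderFor("https://qq.com"))
	fmt.Printf("%q\n", corsHeaderFor("https://evil.example"))
}
```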
<h3 id="1-6-响应输出"><a href="#1-6-响应输出" class="headerlink" title="1.6 响应输出"></a>1.6 Response Output</h3><h4 id="1-6-1-【必须】设置正确的HTTP响应包类型"><a href="#1-6-1-【必须】设置正确的HTTP响应包类型" class="headerlink" title="1.6.1 【必须】设置正确的HTTP响应包类型"></a>1.6.1 [Required] Set the correct HTTP response Content-Type</h4><ul>
<li>The <code>Content-Type</code> response header must match the actual response body. For example, if an API returns JSON, use <code>application/json</code>; if it returns XML, use <code>text/xml</code>.</li>
</ul>
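<h4 id="1-6-2-【必须】添加安全响应头"><a href="#1-6-2-【必须】添加安全响应头" class="headerlink" title="1.6.2 【必须】添加安全响应头"></a>1.6.2 [Required] Add security response headers</h4><ul>
<li>Add the response header <code>X-Content-Type-Options: nosniff</code> to every API endpoint and page.</li>
<li>Add the <code>X-Frame-Options</code> response header to every API endpoint and page, choosing the narrowest value that fits: <code>DENY</code>, <code>SAMEORIGIN</code>, or <code>ALLOW-FROM origin</code>. <code>ALLOW-FROM</code> is obsolete and ignored by current browsers; use the CSP <code>frame-ancestors</code> directive instead. For usage, see the <a href="https://developer.mozilla.org/zh-CN/docs/Web/HTTP/X-Frame-Options" target="_blank" rel="external">MDN documentation</a>.</li>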
</ul>
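<h4 id="1-6-3【必须】外部输入拼接到HTTP响应头中需进行过滤"><a href="#1-6-3【必须】外部输入拼接到HTTP响应头中需进行过滤" class="headerlink" title="1.6.3【必须】外部输入拼接到HTTP响应头中需进行过滤"></a>1.6.3 [Required] Filter external input before adding it to HTTP response headers</h4><ul>
<li>Avoid placing externally controllable values in HTTP response headers. Where the business requires it, strip line-break characters such as <code>\r</code> and <code>\n</code>, or reject any input that contains them.</li>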
</ul>
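<p>The following minimal sketch shows the "reject" option. The helper <code>safeHeaderValue</code> and the header name <code>X-Download-Name</code> are hypothetical. A value containing CR or LF could otherwise end the header line early and inject extra headers or a body (HTTP response splitting).</p>

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

var errLineBreak = errors.New("header value contains a line break")

// safeHeaderValue rejects external input that contains CR or LF.
func safeHeaderValue(v string) (string, error) {
	if strings.ContainsAny(v, "\r\n") {
		return "", errLineBreak
	}
	return v, nil
}

// setDownloadName is a hypothetical use: a user-chosen name copied into a header.
func setDownloadName(w http.ResponseWriter, name string) error {
	v, err := safeHeaderValue(name)
	if err != nil {
		return err
	}
	w.Header().Set("X-Download-Name", v)
	return nil
}

func main() {
	_, err := safeHeaderValue("report.pdf\r\nSet-Cookie: session=attacker")
	fmt.Println(err)
}
```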
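<h4 id="1-6-4【必须】外部输入拼接到response页面前进行编码处理"><a href="#1-6-4【必须】外部输入拼接到response页面前进行编码处理" class="headerlink" title="1.6.4【必须】外部输入拼接到response页面前进行编码处理"></a>1.6.4 [Required] Encode external input before writing it into response pages</h4><ul>
<li>For HTML pages written out directly or generated from templates, prefer <code>html/template</code>, which encodes output automatically according to context. Otherwise, encode characters such as <code>&lt;, &gt;, &amp;, &#39;,&quot;</code> with <code>html.EscapeString</code> or <code>template.HTMLEscapeString</code>. Note that <code>text/template</code> does not escape output automatically.</li>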
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>import (
	"html/template"
	"net/http"
)

func outtemplate(w http.ResponseWriter, r *http.Request) {
	param1 := r.URL.Query().Get("param1")
	tmpl := template.New("hello")
	tmpl, _ = tmpl.Parse(`&#123;&#123;define "T"&#125;&#125;&#123;&#123;.&#125;&#125;&#123;&#123;end&#125;&#125;`)
	// html/template HTML-escapes param1 before writing it, so markup in it is rendered as text
	tmpl.ExecuteTemplate(w, "T", param1)
}</pre></td></tr></table></figure>
<h3 id="1-7-会话管理"><a href="#1-7-会话管理" class="headerlink" title="1.7 会话管理"></a>1.7 Session Management</h3><h4 id="1-7-1【必须】安全维护session信息"><a href="#1-7-1【必须】安全维护session信息" class="headerlink" title="1.7.1【必须】安全维护session信息"></a>1.7.1 [Required] Maintain session information securely</h4><ul>
<li>Regenerate the session when a user logs in, and clear it when the user logs out.</li>
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>import (
	"net/http"
	"time"
)

// create the cookie
func setToken(res http.ResponseWriter, req *http.Request) {
	expireToken := time.Now().Add(time.Minute * 30).Unix()
	expireCookie := time.Now().Add(time.Minute * 30)

	//... (token signing elided; it uses expireToken and yields signedToken)

	cookie := http.Cookie{
		Name:     "Auth",
		Value:    signedToken,
		Expires:  expireCookie, // expires after 30 minutes
		HttpOnly: true,         // not readable from JavaScript
		Path:     "/",
		Domain:   "127.0.0.1",
		Secure:   true, // sent over HTTPS only
	}

	http.SetCookie(res, &amp;cookie)
	http.Redirect(res, req, "/profile", http.StatusTemporaryRedirect)
}

// delete the cookie
func logout(res http.ResponseWriter, req *http.Request) {
	deleteCookie := http.Cookie{
		Name:     "Auth",
		Value:    "none",
		Path:     "/", // Path and Domain must match the cookie being deleted
		Domain:   "127.0.0.1",
		Expires:  time.Unix(0, 0),
		MaxAge:   -1, // net/http sends Max-Age=0: delete immediately
		HttpOnly: true,
		Secure:   true,
	}
	http.SetCookie(res, &amp;deleteCookie)
	// also invalidate the token or session on the server side if it is tracked there
}</pre></td></tr></table></figure>
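<p>The cookie code above sets and clears the client side. The rule also asks for server-side regeneration at login and cleanup at logout. The following minimal sketch uses a hypothetical in-memory <code>sessionStore</code>. Issuing a fresh random ID at login and discarding the pre-login one means that an ID planted before authentication (session fixation) is useless afterwards.</p>

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// sessionStore is a hypothetical in-memory store mapping session IDs to user IDs.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[string]string
}

func newSessionStore() *sessionStore {
	return &sessionStore{sessions: make(map[string]string)}
}

// newID returns 32 bytes from crypto/rand, hex-encoded (64 characters).
func newID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// login discards the pre-login session ID and issues a fresh one.
func (s *sessionStore) login(oldID, userID string) (string, error) {
	id, err := newID()
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.sessions, oldID)
	s.sessions[id] = userID
	return id, nil
}

// logout removes the session on the server, not just the cookie.
func (s *sessionStore) logout(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.sessions, id)
}

func (s *sessionStore) user(id string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	u, ok := s.sessions[id]
	return u, ok
}

func main() {
	store := newSessionStore()
	id, _ := store.login("attacker-chosen-id", "alice")
	fmt.Println(store.user(id))
	store.logout(id)
	fmt.Println(store.user(id))
}
```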
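<h4 id="1-7-2【必须】CSRF防护"><a href="#1-7-2【必须】CSRF防护" class="headerlink" title="1.7.2【必须】CSRF防护"></a>1.7.2 [Required] CSRF protection</h4><ul>
<li>Endpoints that perform sensitive operations or can read sensitive information must validate the <code>Referer</code> header or require a <code>csrf_token</code>.</li>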
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>// good
package main

import (
	"net/http"

	"github.com/gorilla/csrf"
	"github.com/gorilla/mux"
)

func main() {
	r := mux.NewRouter()
	// ShowSignupForm and SubmitSignupForm are the application's own handlers (not shown)
	r.HandleFunc("/signup", ShowSignupForm)
	r.HandleFunc("/signup/post", SubmitSignupForm)
	// validate the csrf_token on every unsafe request; replace the placeholder
	// key with a secret 32-byte random key loaded from configuration
	http.ListenAndServe(":8000",
		csrf.Protect([]byte("32-byte-long-auth-key"))(r))
}</pre></td></tr></table></figure>
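<h3 id="1-8-访问控制"><a href="#1-8-访问控制" class="headerlink" title="1.8 访问控制"></a>1.8 Access Control</h3><h4 id="1-8-1【必须】默认鉴权"><a href="#1-8-1【必须】默认鉴权" class="headerlink" title="1.8.1【必须】默认鉴权"></a>1.8.1 [Required] Authenticate by default</h4><ul>
<li><p>Unless a resource is fully public, the system must require authentication by default and use a whitelist to exempt the endpoints or pages that need no authentication.</p>
</li>
<li><p>Following the principle of least privilege, assign permission levels according to the sensitivity of the resource and the user's role, such as fully public, readable when logged in, writable when logged in, readable by specific users, and writable by specific users.</p>
</li>
<li><p>Reads and writes of a user's own data must verify the identity and permissions of the logged-in user, to prevent unauthorized access to other users' data.</p>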
<figure class="highlight sql"><table><tr><td class="code"><pre>-- pseudocode: the ownership check is part of the query itself
select id from table where id=:id and userid=session.userid</pre></td></tr></table></figure>
</li>
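<li><p>Internet-facing services without their own account system should use <code>QQ</code> or <code>WeChat</code> login. Intranet services should use the <code>unified login service</code>. Any other service that uses username and password login must add a second verification step, such as a CAPTCHA.</p>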
</li>
</ul>
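<h3 id="1-9-并发保护"><a href="#1-9-并发保护" class="headerlink" title="1.9 并发保护"></a>1.9 Concurrency Protection</h3><h4 id="1-9-1【必须】禁止在闭包中直接调用循环变量"><a href="#1-9-1【必须】禁止在闭包中直接调用循环变量" class="headerlink" title="1.9.1【必须】禁止在闭包中直接调用循环变量"></a>1.9.1 [Required] Do not use loop variables directly inside closures</h4><ul>
<li>When goroutines started in a loop use the loop index, all of them share the same variable, which causes a data race and unexpected results. This applies to Go versions before 1.22. From Go 1.22 on, each iteration of a <code>for</code> loop has its own variable, but passing the value explicitly still states the intent clearly.</li>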
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>package main

import (
	"fmt"
	"runtime"
	"sync"
)

// bad
func badLoop() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	var group sync.WaitGroup

	for i := 0; i &lt; 5; i++ {
		group.Add(1)
		go func() {
			defer group.Done()
			fmt.Printf("%-2d", i) // before Go 1.22 every goroutine reads the same shared i
		}()
	}
	group.Wait()
}

// good
func goodLoop() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	var group sync.WaitGroup

	for i := 0; i &lt; 5; i++ {
		group.Add(1)
		go func(j int) {
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("Recovered in goroutine:", r)
				}
				group.Done()
			}()
			fmt.Printf("%-2d", j) // the closure uses its own local copy
		}(i) // pass the loop variable to the goroutine explicitly
	}
	group.Wait()
}

func main() {
	badLoop()
	fmt.Println()
	goodLoop()
}</pre></td></tr></table></figure>
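<h4 id="1-9-2【必须】禁止并发写map"><a href="#1-9-2【必须】禁止并发写map" class="headerlink" title="1.9.2【必须】禁止并发写map"></a>1.9.2 [Required] Do not write to maps concurrently</h4><ul>
<li>Concurrent map writes, or a write concurrent with reads, make the Go runtime abort the program with a fatal error that <code>recover</code> cannot catch. Protect shared maps with a lock.</li>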
</ul>
<figure class="highlight go"><table><tr><td class="code"><pre>// bad
package main

func main() {
	m := make(map[int]int)
	// concurrent read and write without synchronization
	go func() {
		for {
			_ = m[1]
		}
	}()
	go func() {
		for {
			m[2] = 1
		}
	}()
	select {} // the runtime typically aborts with "fatal error: concurrent map read and map write"
}</pre></td></tr></table></figure>
<h4 id="1-9-3【必须】确保并发安全"><a href="#1-9-3【必须】确保并发安全" class="headerlink" title="1.9.3【必须】确保并发安全"></a>1.9.3【必须】确保并发安全</h4><p>敏感操作如果未作并发安全限制,可导致数据读写异常,造成业务逻辑限制被绕过。可通过同步锁或者原子操作进行防护。</p>
<p>Protect shared memory with a mutex</p>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="keyword">var</span> count <span class="keyword">int</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">Count</span><span class="params">(lock *sync.Mutex)</span></span> &#123;</span><br><span class="line"> lock.Lock() <span class="comment">// 加写锁</span></span><br><span class="line"> count++</span><br><span class="line"> fmt.Println(count)</span><br><span class="line"> lock.Unlock() <span class="comment">// 解写锁,任何一个Lock()或RLock()均需要保证对应有Unlock()或RUnlock()</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> lock := &amp;sync.Mutex&#123;&#125;</span><br><span class="line"> <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; <span class="number">10</span>; i++ 
&#123;</span><br><span class="line"> <span class="keyword">go</span> Count(lock) <span class="comment">// 传递指针是为了防止函数内的锁和调用锁不一致</span></span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">for</span> &#123;</span><br><span class="line"> lock.Lock()</span><br><span class="line"> c := count</span><br><span class="line"> lock.Unlock()</span><br><span class="line"> runtime.Gosched() <span class="comment">// 交出时间片给协程</span></span><br><span class="line"> <span class="keyword">if</span> c &gt; <span class="number">10</span> &#123;</span><br><span class="line"> <span class="keyword">break</span></span><br><span class="line"> &#125;</span><br><span class="line"> &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<ul>
<li>Use <code>sync/atomic</code> to perform atomic operations</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// good</span></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"sync"</span></span><br><span class="line"> <span class="string">"sync/atomic"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">type</span> Map <span class="keyword">map</span>[<span class="keyword">string</span>]<span class="keyword">string</span></span><br><span class="line"> <span class="keyword">var</span> m atomic.Value</span><br><span class="line"> m.Store(<span class="built_in">make</span>(Map))</span><br><span class="line"> <span class="keyword">var</span> mu sync.Mutex <span class="comment">// used only by writers</span></span><br><span class="line"> read := <span class="function"><span 
class="keyword">func</span><span class="params">(key <span class="keyword">string</span>)</span> <span class="params">(val <span class="keyword">string</span>)</span></span> &#123;</span><br><span class="line"> m1 := m.Load().(Map)</span><br><span class="line"> <span class="keyword">return</span> m1[key]</span><br><span class="line"> &#125;</span><br><span class="line"> insert := <span class="function"><span class="keyword">func</span><span class="params">(key, val <span class="keyword">string</span>)</span></span> &#123;</span><br><span class="line"> mu.Lock() <span class="comment">// 与潜在写入同步</span></span><br><span class="line"> <span class="keyword">defer</span> mu.Unlock()</span><br><span class="line"> m1 := m.Load().(Map) <span class="comment">// 导入struct当前数据</span></span><br><span class="line"> m2 := <span class="built_in">make</span>(Map) <span class="comment">// 创建新值</span></span><br><span class="line"> <span class="keyword">for</span> k, v := <span class="keyword">range</span> m1 &#123;</span><br><span class="line"> m2[k] = v</span><br><span class="line"> &#125;</span><br><span class="line"> m2[key] = val</span><br><span class="line"> m.Store(m2) <span class="comment">// 用新的替代当前对象</span></span><br><span class="line"> &#125;</span><br><span class="line"> _, _ = read, insert</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<p><img src="/images/wx_dyh.png" alt="微信订阅号"></p>
<blockquote>
<p>Source: <a href="https://github.com/Tencent/secguide" target="_blank" rel="external">https://github.com/Tencent/secguide</a></p>
</blockquote>
</content>
<summary type="html">
<h1 id="通用类"><a href="#通用类" class="headerlink" title="通用类"></a>通用类</h1><h2 id="1-代码实现类"><a href="#1-代码实现类" class="headerlink" title="1. 代码实现类"></a>1. 代码实现类</h2><h3 id="1-1-内存管理"><a href="#1-1-内存管理" class="headerlink" title="1.1 内存管理"></a>1.1 内存管理</h3><h4 id="1-1-1【必须】切片长度校验"><a href="#1-1-1【必须】切片长度校验" class="headerlink" title="1.1.1【必须】切片长度校验"></a>1.1.1【必须】切片长度校验</h4><ul>
<li>在对slice进行操作时,必须判断长度是否合法,防止程序panic</li>
</ul>
<figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// bad: 未判断data的长度,可导致 index out of range</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">decode</span><span class="params">(data []<span class="keyword">byte</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line"> <span class="keyword">if</span> data[<span class="number">0</span>] == <span class="string">'F'</span> &amp;&amp; data[<span class="number">1</span>] == <span class="string">'U'</span> &amp;&amp; data[<span class="number">2</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">3</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">4</span>] == <span class="string">'E'</span> &amp;&amp; data[<span class="number">5</span>] == <span class="string">'R'</span> &#123;</span><br><span class="line"> fmt.Println(<span class="string">"Bad"</span>)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line"> &#125;</span><br><span 
class="line"> <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// bad: slice bounds out of range</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">foo</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">var</span> slice = []<span class="keyword">int</span>&#123;<span class="number">0</span>, <span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>, <span class="number">6</span>&#125;</span><br><span class="line"> fmt.Println(slice[:<span class="number">10</span>])</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// good: 使用data前应判断长度是否合法</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">decode</span><span class="params">(data []<span class="keyword">byte</span>)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line"> <span class="keyword">if</span> <span class="built_in">len</span>(data) == <span class="number">6</span> &#123;</span><br><span class="line"> <span class="keyword">if</span> data[<span class="number">0</span>] == <span class="string">'F'</span> &amp;&amp; data[<span class="number">1</span>] == <span class="string">'U'</span> &amp;&amp; data[<span class="number">2</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">3</span>] == <span class="string">'Z'</span> &amp;&amp; data[<span class="number">4</span>] == <span class="string">'E'</span> &amp;&amp; data[<span class="number">5</span>] == <span class="string">'R'</span> &#123;</span><br><span class="line"> fmt.Println(<span class="string">"Good"</span>)</span><br><span 
class="line"> <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line"> &#125;</span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
</summary>
<category term="Golang" scheme="http://team.jiunile.com/categories/Golang/"/>
<category term="代码安全" scheme="http://team.jiunile.com/categories/Golang/%E4%BB%A3%E7%A0%81%E5%AE%89%E5%85%A8/"/>
<category term="go" scheme="http://team.jiunile.com/tags/go/"/>
<category term="代码安全" scheme="http://team.jiunile.com/tags/%E4%BB%A3%E7%A0%81%E5%AE%89%E5%85%A8/"/>
<category term="security" scheme="http://team.jiunile.com/tags/security/"/>
</entry>
<entry>
<title>50 diagrams: mastering graceful, zero-downtime deployments in Kubernetes</title>
<link href="http://team.jiunile.com//blog/2021/02/k8s-graceful-shutdown.html"/>
<id>http://team.jiunile.com//blog/2021/02/k8s-graceful-shutdown.html</id>
<published>2021-02-02T14:00:00.000Z</published>
<updated>2021-02-02T06:24:19.000Z</updated>
<content type="html"><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h2><p>在本文中,您将了解如何在Pod启动或关闭时防止连接异常,并将学习如何以优雅的方式关闭长时间运行的任务。<br><img src="/images/k8s/g_shutdown_1.png" alt="graceful shutdown"></p>
<a id="more"></a>
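<p>In Kubernetes, creating and deleting Pods is one of the most common tasks.</p>
<p>Pods are created when you perform a rolling update, scale a Deployment, ship a new release, or run a Job or CronJob.</p>
<p>But Pods are also deleted and recreated when they are evicted from a node, for example when you drain the node (which marks it as unschedulable and evicts its Pods).</p>
<p>If these Pods are so short-lived, what happens when a Pod is told to shut down while it is in the middle of responding to a request?</p>
<p>Is the request completed before the shutdown?</p>
<p>What about subsequent requests?</p>
<p>Before discussing what happens when a Pod is deleted, it is worth discussing what happens when a Pod is created.</p>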
<p>Suppose you want to create the following Pod in your cluster:<br><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> v1</span><br><span class="line"><span class="attr">kind:</span> Pod</span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line"><span class="attr">  name:</span> my-pod</span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line"><span class="attr">  containers:</span></span><br><span class="line"><span class="attr">    - name:</span> web</span><br><span class="line"><span class="attr">      image:</span> nginx</span><br><span class="line"><span class="attr">      ports:</span></span><br><span class="line"><span class="attr">        - name:</span> web</span><br><span class="line"><span class="attr">          containerPort:</span> <span class="number">80</span></span><br></pre></td></tr></table></figure></p>
<p>You can submit the YAML definition to the cluster with:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kubectl apply <span class="_">-f</span> pod.yaml</span><br></pre></td></tr></table></figure></p>
<p>As soon as you enter the command, kubectl submits the Pod definition to the Kubernetes API.</p>
<h2 id="在数据库中保存集群的状态"><a href="#在数据库中保存集群的状态" class="headerlink" title="在数据库中保存集群的状态"></a>在数据库中保存集群的状态</h2><p>API 接收和检查 Pod 定义,然后将其存储在数据库 etcd 中。</p>
<p>Pod 也将添加到<a href="https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#scheduling-cycle-binding-cycle" target="_blank" rel="external">调度程序的队列</a>中。</p>
<p>调度程序:</p>
<ol>
<li>检查定义</li>
<li>收集有关工作负载的详细信息,例如 CPU 和内存请求,然后</li>
<li>确定哪个节点最适合运行它。(<a href="https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points" target="_blank" rel="external">通过Filters 和 Predicates</a>)。</li>
</ol>
<p>在过程结束时:</p>
<ul>
<li>在 etcd 中将 Pod 标记为 Scheduled。</li>
<li>为 Pod 分配了一个节点。</li>
<li>Pod 的状态存储在 etcd 中。</li>
</ul>
<p><strong>但是Pod仍然不存在。</strong></p>
<ol>
<li>When you submit a Pod with <code>kubectl apply -f</code>, the YAML is sent to the Kubernetes API.<br><img src="/images/k8s/g_sd_2.png" alt="graceful shutdown"></li>
<li>The API saves the Pod in the database, etcd.<br><img src="/images/k8s/g_sd_3.png" alt="graceful shutdown"></li>
<li>The scheduler assigns the best node to that Pod, and the Pod's status changes to Pending. The Pod exists only in etcd.<br><img src="/images/k8s/g_sd_4.png" alt="graceful shutdown"></li>
</ol>
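<p>The previous tasks happened in the control plane, and the state is stored in the database.</p>
<p>So who creates the Pod on your node?</p>
<h2 id="Kubelet-—-Kubernetes-代理"><a href="#Kubelet-—-Kubernetes-代理" class="headerlink" title="Kubelet — Kubernetes 代理"></a>Kubelet, the Kubernetes agent</h2><p><strong>The kubelet's job is to poll the control plane for updates.</strong></p>
<p>You can imagine the kubelet relentlessly asking the control plane: "I look after worker node 1. Is there any new Pod for me?"</p>
<p>When there is a Pod, the kubelet creates it.</p>
<p>Well, sort of.</p>
<p>The kubelet doesn't create the Pod by itself. Instead, it delegates the work to three other components:</p>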
<ol>
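<li><strong>The Container Runtime Interface (CRI)</strong>: the component that creates the containers for the Pod.</li>
<li><strong>The Container Network Interface (CNI)</strong>: the component that connects the containers to the cluster network and assigns IP addresses.</li>
<li><strong>The Container Storage Interface (CSI)</strong>: the component that mounts volumes in your containers.</li>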
</ol>
<p>In most cases, the job of the Container Runtime Interface (CRI) is similar to:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker run <span class="_">-d</span> &lt;my-container-image&gt;</span><br></pre></td></tr></table></figure></p>
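<p>The Container Network Interface (CNI) is a bit more interesting, because it is in charge of:</p>
<ol>
<li>Generating a valid IP address for the Pod.</li>
<li>Connecting the container to the rest of the network.</li>
</ol>
<p>As you can imagine, there are several ways to connect the container to the network and assign a valid IP address (you could choose between IPv4 and IPv6, or assign multiple IP addresses).</p>
<p>For example, <a href="https://archive.shivam.dev/docker-networking-explained/" target="_blank" rel="external">Docker creates virtual Ethernet pairs and attaches them to a bridge</a>, whereas <a href="https://itnext.io/kubernetes-is-hard-why-eks-makes-it-easier-for-network-and-security-architects-ea6d8b2ca965" target="_blank" rel="external">the AWS CNI connects the Pods directly to the Virtual Private Cloud (VPC)</a>.</p>
<p>When the Container Network Interface finishes its job, the Pod is connected to the network and has a valid IP address assigned.</p>
<p>There is just one problem.</p>
<p><strong>The kubelet knows the IP address (because it invoked the Container Network Interface), but the control plane does not.</strong></p>
<p>No one has told the control plane that the Pod has an IP address assigned and is ready to receive traffic.</p>
<p>As far as the control plane is concerned, the Pod is still being created.</p>
<p><strong>It is the kubelet's job to collect all the details of the Pod, such as its IP address, and report them back to the control plane.</strong></p>
<p>You can imagine that inspecting etcd would then reveal not just where the Pod is running but also its IP address.</p>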
<ol>
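<li>The kubelet polls the control plane for updates.<br><img src="/images/k8s/g_sd_5.png" alt="graceful shutdown"></li>
<li>When a new Pod is assigned to its node, the kubelet retrieves the details.<br><img src="/images/k8s/g_sd_6.png" alt="graceful shutdown"></li>
<li>The kubelet doesn't create the Pod itself. It relies on three components: the Container Runtime Interface, the Container Network Interface, and the Container Storage Interface.<br><img src="/images/k8s/g_sd_7.png" alt="graceful shutdown"></li>
<li>Once all three components have completed successfully, the Pod is running on your node and has an IP address assigned.<br><img src="/images/k8s/g_sd_8.png" alt="graceful shutdown"></li>
<li>The kubelet reports the IP address back to the control plane.<br><img src="/images/k8s/g_sd_9.png" alt="graceful shutdown"></li>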
</ol>
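<p>If the Pod isn't part of any Service, this is the end of the job.</p>
<p>The Pod is created and ready to use.</p>
<p>If the Pod is part of a Service, there are a few more steps.</p>
<h2 id="Pods-和-Services"><a href="#Pods-和-Services" class="headerlink" title="Pods 和 Services"></a>Pods and Services</h2><p>When you create a Service, there are usually two pieces of information you should pay attention to:</p>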
<ol>
<li><code>selector</code>: specifies the Pods that will receive the traffic.</li>
<li><code>targetPort</code>: the port the Pods use to receive the traffic.</li>
</ol>
<p>A typical YAML definition for a Service looks like this:<br><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> v1</span><br><span class="line"><span class="attr">kind:</span> Service</span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line"><span class="attr">  name:</span> my-service</span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line"><span class="attr">  ports:</span></span><br><span class="line"><span class="attr">  - port:</span> <span class="number">80</span></span><br><span class="line"><span class="attr">    targetPort:</span> <span class="number">3000</span></span><br><span class="line"><span class="attr">  selector:</span></span><br><span class="line"><span class="attr">    name:</span> app</span><br></pre></td></tr></table></figure></p>
<p>When you submit the Service to the cluster with <code>kubectl apply</code>, Kubernetes finds all the Pods that have the same label as the selector (<code>name: app</code>) and collects their IP addresses, but only if they have passed the <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-tcp-liveness-probe" target="_blank" rel="external">Readiness probe</a>.</p>
<p>Then, for each IP address, it concatenates the IP address and the port.</p>
<p>If the IP address is <code>10.0.0.3</code> and the <code>targetPort</code> is <code>3000</code>, Kubernetes concatenates the two values and calls the result an endpoint.<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">IP address + port = endpoint</span><br><span class="line">---------------------------------</span><br><span class="line">10.0.0.3   + 3000 = 10.0.0.3:3000</span><br></pre></td></tr></table></figure></p>
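<p>The endpoint is stored in etcd in another object called Endpoint.</p>
<p><em>Confused?</em></p>
<p>In Kubernetes:</p>
<ul>
<li>an endpoint (lowercase e) is an IP address + port pair (<code>10.0.0.3:3000</code>).</li>
<li>an Endpoint (uppercase E) is a collection of endpoints.</li>
</ul>
<p>The Endpoint object is a real object in Kubernetes, and for every Service, Kubernetes automatically creates an Endpoint object.</p>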
<p>You can verify that with:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ kubectl get services,endpoints</span><br><span class="line">NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)</span><br><span class="line">service/my-service-1   ClusterIP   10.105.17.65   &lt;none&gt;        80/TCP</span><br><span class="line">service/my-service-2   ClusterIP   10.96.0.1      &lt;none&gt;        443/TCP</span><br><span class="line"></span><br><span class="line">NAME                     ENDPOINTS</span><br><span class="line">endpoints/my-service-1   172.17.0.6:80,172.17.0.7:80</span><br><span class="line">endpoints/my-service-2   192.168.99.100:8443</span><br></pre></td></tr></table></figure></p>
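<p>The Endpoint object collects all the IP addresses and ports from the Pods.</p>
<p>But not just once.</p>
<p>The Endpoint object is refreshed with a new list of endpoints when:</p>
<ol>
<li>A Pod is created.</li>
<li>A Pod is deleted.</li>
<li>A label is modified on a Pod.</li>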
</ol>
<p>So you can imagine that every time you create a Pod, once the kubelet has posted its IP address to the control plane, Kubernetes updates all the Endpoint objects to reflect the change:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">$ kubectl get services,endpoints</span><br><span class="line">NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)</span><br><span class="line">service/my-service-1   ClusterIP   10.105.17.65   &lt;none&gt;        80/TCP</span><br><span class="line">service/my-service-2   ClusterIP   10.96.0.1      &lt;none&gt;        443/TCP</span><br><span class="line"></span><br><span class="line">NAME                     ENDPOINTS</span><br><span class="line">endpoints/my-service-1   172.17.0.6:80,172.17.0.7:80,172.17.0.8:80</span><br><span class="line">endpoints/my-service-2   192.168.99.100:8443</span><br></pre></td></tr></table></figure></p>
<p>Great: the endpoint is stored in the control plane, and the Endpoint object has been updated.</p>
<ol>
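<li>In this diagram, there is one Pod deployed in your cluster. The Pod belongs to a Service. If you were to inspect etcd, you would find the Pod's details as well as the Service.<br><img src="/images/k8s/g_sd_10.png" alt="graceful shutdown"></li>
<li>What happens when a new Pod is deployed?<br><img src="/images/k8s/g_sd_11.png" alt="graceful shutdown"></li>
<li>Kubernetes has to keep track of the Pod and its IP address. The Service should route traffic to the new endpoint, so the IP address and port should be propagated.<br><img src="/images/k8s/g_sd_12.png" alt="graceful shutdown"></li>
<li>What happens when another Pod is deployed?<br><img src="/images/k8s/g_sd_13.png" alt="graceful shutdown"></li>
<li>Exactly the same process: a new "record" for the Pod is created in the database, and the endpoint is propagated.<br><img src="/images/k8s/g_sd_14.png" alt="graceful shutdown"></li>
<li>But what happens when a Pod is deleted?<br><img src="/images/k8s/g_sd_15.png" alt="graceful shutdown"></li>
<li>The Service immediately removes the endpoint, and eventually the Pod is removed from the database too.<br><img src="/images/k8s/g_sd_16.png" alt="graceful shutdown"></li>
<li>Kubernetes reacts to every small change in your cluster.<br><img src="/images/k8s/g_sd_17.png" alt="graceful shutdown"></li>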
</ol>
<p><em>Are you ready to start using your Pod?</em></p>
<h2 id="在-Kubernetes-中使用-Endpoint"><a href="#在-Kubernetes-中使用-Endpoint" class="headerlink" title="在 Kubernetes 中使用 Endpoint"></a>Consuming endpoints in Kubernetes</h2><p><strong>Endpoints are used by several components in Kubernetes.</strong></p>
<p>Kube-proxy uses the endpoints to set up iptables rules on the nodes.</p>
<p>So every time an Endpoint object changes, kube-proxy retrieves the new list of IP addresses and ports and writes new iptables rules.</p>
<ol>
<li>Consider a three-node cluster with two Pods and no Service. The state of the Pods is stored in etcd.<br><img src="/images/k8s/g_sd_18.png" alt="graceful shutdown"></li>
<li>What happens when you create a Service?<br><img src="/images/k8s/g_sd_19.png" alt="graceful shutdown"></li>
<li>Kubernetes creates an Endpoint object and collects all the endpoints (IP address and port pairs) from the Pods.<br><img src="/images/k8s/g_sd_20.png" alt="graceful shutdown"></li>
<li>The kube-proxy daemon subscribes to changes to Endpoints.<br><img src="/images/k8s/g_sd_21.png" alt="graceful shutdown"></li>
<li>When an endpoint is added, removed, or updated, kube-proxy retrieves the new list of endpoints.<br><img src="/images/k8s/g_sd_22.png" alt="graceful shutdown"></li>
<li>Kube-proxy uses the endpoints to create iptables rules on each node of your cluster.<br><img src="/images/k8s/g_sd_23.png" alt="graceful shutdown"></li>
</ol>
<p>The Ingress controller uses the same list of endpoints.</p>
<p>The Ingress controller is the component that routes external traffic into the cluster.</p>
<p>When you set up an Ingress manifest, you usually specify the Service as the destination:<br><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> networking.k8s.io/v1beta1</span><br><span class="line"><span class="attr">kind:</span> Ingress</span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line"><span class="attr">  name:</span> my-ingress</span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line"><span class="attr">  rules:</span></span><br><span class="line"><span class="attr">  - http:</span></span><br><span class="line"><span class="attr">      paths:</span></span><br><span class="line"><span class="attr">      - backend:</span></span><br><span class="line"><span class="attr">          serviceName:</span> my-service</span><br><span class="line"><span class="attr">          servicePort:</span> <span class="number">80</span></span><br><span class="line"><span class="attr">        path:</span> /</span><br></pre></td></tr></table></figure></p>
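<p><em>In reality, the traffic is not routed to the Service.</em></p>
<p>Instead, the Ingress controller sets up a subscription so that it is notified every time the endpoints for that Service change.</p>
<p><strong>The Ingress routes the traffic directly to the Pods, skipping the Service</strong>.</p>
<p>As you can imagine, every time an Endpoint object changes, the Ingress retrieves the new list of IP addresses and ports and reconfigures the controller to include the new Pods.</p>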
<ol>
<li>In this picture, there is an Ingress controller, plus a Deployment with two replicas and a Service.<br><img src="/images/k8s/g_sd_24.png" alt="graceful shutdown"></li>
<li>If you want to route external traffic to the Pods through the Ingress, you should create an Ingress manifest (a YAML file).<br><img src="/images/k8s/g_sd_25.png" alt="graceful shutdown"></li>
<li>As soon as you run <code>kubectl apply -f ingress.yaml</code>, the Ingress controller retrieves the file from the control plane.<br><img src="/images/k8s/g_sd_26.png" alt="graceful shutdown"></li>
<li>The Ingress YAML has a <code>serviceName</code> property that describes which Service it should use.<br><img src="/images/k8s/g_sd_27.png" alt="graceful shutdown"></li>
<li>The Ingress controller retrieves the list of endpoints from the Service and skips the Service itself. The traffic flows directly to the endpoints (Pods).<br><img src="/images/k8s/g_sd_28.png" alt="graceful shutdown"></li>
<li>What happens when a new Pod is created?<br><img src="/images/k8s/g_sd_29.png" alt="graceful shutdown"></li>
<li>You already know how Kubernetes creates the Pod and propagates the endpoint.<br><img src="/images/k8s/g_sd_30.png" alt="graceful shutdown"></li>
<li>The Ingress controller is subscribed to changes to the endpoints. Since there is an incoming change, it retrieves the new list of endpoints.<br><img src="/images/k8s/g_sd_31.png" alt="graceful shutdown"></li>
<li>The Ingress controller routes traffic to the new Pod.<br><img src="/images/k8s/g_sd_32.png" alt="graceful shutdown"></li>
</ol>
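<p>Conceptually, the controller keeps its own copy of the endpoint list and swaps it out whenever a change notification arrives. A minimal sketch of that idea, using a hypothetical <code>Pool</code> type that round-robins over the current endpoints; real controllers such as ingress-nginx do considerably more (health checks, retries, configuration reloads).</p>

```go
package main

import (
	"fmt"
	"sync"
)

// Pool is a hypothetical backend pool that routes straight to Pod endpoints.
type Pool struct {
	mu        sync.Mutex
	endpoints []string
	next      int
}

// Update replaces the endpoint list, as a controller would after an
// Endpoint change notification.
func (p *Pool) Update(eps []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.endpoints = append([]string(nil), eps...) // copy so callers can't mutate our state
	p.next = 0
}

// Pick returns the next endpoint in round-robin order, or false if there are none.
func (p *Pool) Pick() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.endpoints) == 0 {
		return "", false
	}
	ep := p.endpoints[p.next]
	p.next = (p.next + 1) % len(p.endpoints)
	return ep, true
}

func main() {
	var pool Pool
	pool.Update([]string{"10.0.0.3:3000", "10.0.0.4:3000"})
	a, _ := pool.Pick()
	b, _ := pool.Pick()
	fmt.Println(a, b) // 10.0.0.3:3000 10.0.0.4:3000

	// A Pod was deleted; until this Update arrives, Pick can still return it.
	pool.Update([]string{"10.0.0.4:3000"})
	c, _ := pool.Pick()
	fmt.Println(c) // 10.0.0.4:3000
}
```

<p>Between the moment a Pod is deleted and the moment <code>Update</code> is called, <code>Pick</code> can still hand out the old address. That window is the race discussed in the sections on deleting a Pod.</p>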
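<p>More Kubernetes components subscribe to changes to endpoints.</p>
<p>CoreDNS, the DNS component in your cluster, is another example.</p>
<p>If you use <a href="https://kubernetes.io/docs/concepts/services-networking/service/#headless-services" target="_blank" rel="external">Services of type Headless</a>, CoreDNS has to subscribe to changes to the endpoints and reconfigure itself every time an endpoint is added or removed.</p>
<p>The same endpoints are consumed by service meshes such as Istio or Linkerd, and <a href="https://thebsdbox.co.uk/2020/03/18/Creating-a-Kubernetes-cloud-doesn-t-required-boiling-the-ocean/" target="_blank" rel="external">by cloud providers to create</a> Services of <code>type: LoadBalancer</code>.</p>
<p>Keep in mind that several components subscribe to changes to endpoints, and they might receive notifications about endpoint updates at different times.</p>
<p><em>Is that enough, or does something else happen after you create a Pod?</em></p>
<p><strong>This time you're done!</strong></p>
<p>A quick recap of what happens when you create a Pod:</p>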
<ol>
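<li>The Pod is stored in etcd.</li>
<li>The scheduler assigns a node and writes the node to etcd.</li>
<li>The kubelet is notified about a new, scheduled Pod.</li>
<li>The kubelet delegates creating the container to the Container Runtime Interface (CRI).</li>
<li>The kubelet delegates attaching the container to the network to the Container Network Interface (CNI).</li>
<li>The kubelet delegates mounting volumes in the container to the Container Storage Interface (CSI).</li>
<li>The Container Network Interface assigns an IP address.</li>
<li>The kubelet reports the IP address to the control plane.</li>
<li>The IP address is stored in etcd.</li>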
</ol>
<p>And if your Pod belongs to a Service:</p>
<ol>
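<li>The kubelet waits for a successful Readiness probe.</li>
<li>All relevant Endpoint objects are notified of the change.</li>
<li>The Endpoint objects add the new endpoint (IP address + port pair) to their lists.</li>
<li>Kube-proxy is notified of the Endpoint change and updates the iptables rules on every node.</li>
<li>The Ingress controller is notified of the Endpoint change and routes traffic to the new IP address.</li>
<li>CoreDNS is notified of the Endpoint change. If the Service is of type Headless, the DNS entry is updated.</li>
<li>The cloud provider is notified of the Endpoint change. If the Service is <code>type: LoadBalancer</code>, the new endpoint is configured as part of the load balancer pool.</li>
<li>Any service mesh installed in the cluster is notified of the Endpoint change.</li>
<li>Any other operator subscribed to Endpoint changes is notified too.</li>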
</ol>
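<p>That is a surprisingly long list for what is a common task: creating a Pod.</p>
<p>The Pod is running. It is time to discuss what happens when you delete it.</p>
<h2 id="删除-pod"><a href="#删除-pod" class="headerlink" title="删除 pod"></a>Deleting a Pod</h2><p>You might have guessed it already: when the Pod is deleted, the same steps have to be followed, but in reverse.</p>
<p>First, the endpoint should be removed from the Endpoint object.</p>
<p>This time the Readiness probe is ignored, and the endpoint is removed from the control plane immediately.</p>
<p>That, in turn, fires off events to kube-proxy, the Ingress controller, the DNS, the service mesh, and so on.</p>
<p>Those components update their internal state and stop routing traffic to the IP address.</p>
<p>Since the components might be busy doing something else, <strong>there is no guarantee about how long it will take to remove the IP address from their internal state</strong>.</p>
<p>For some, it could take less than a second; for others, it could take longer.</p>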
<ol>
<li>If you delete a Pod with <code>kubectl delete pod</code>, the command first reaches the Kubernetes API.<br><img src="/images/k8s/g_sd_33.png" alt="graceful shutdown"></li>
<li>The message is intercepted by a specific controller in the control plane: the Endpoint controller.<br><img src="/images/k8s/g_sd_34.png" alt="graceful shutdown"></li>
<li>The Endpoint controller issues a command to the API to remove the IP address and port from the Endpoint object.<br><img src="/images/k8s/g_sd_35.png" alt="graceful shutdown"></li>
<li>Who listens for Endpoint changes? Kube-proxy, the Ingress controller, CoreDNS, etc. are notified about the change.<br><img src="/images/k8s/g_sd_36.png" alt="graceful shutdown"></li>
<li>A few components, such as kube-proxy, might need some extra time to propagate the changes further.<br><img src="/images/k8s/g_sd_37.png" alt="graceful shutdown"></li>
</ol>
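<p>At the same time, the status of the Pod in etcd is changed to Terminating.</p>
<p>The kubelet is notified of the change and delegates:</p>
<ol>
<li>Unmounting any volumes from the container to the Container Storage Interface (CSI).</li>
<li>Detaching the container from the network and releasing the IP address to the Container Network Interface (CNI).</li>
<li>Destroying the container to the Container Runtime Interface (CRI).</li>
</ol>
<p>In other words, Kubernetes follows exactly the same steps as when creating a Pod, but in reverse.</p>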
<ol>
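<li>If you delete a Pod with <code>kubectl delete pod</code>, the command first reaches the Kubernetes API.<br><img src="/images/k8s/g_sd_38.png" alt="graceful shutdown"></li>
<li>When the kubelet polls the control plane for updates, it notices that the Pod was deleted.<br><img src="/images/k8s/g_sd_39.png" alt="graceful shutdown"></li>
<li>The kubelet delegates destroying the Pod to the Container Runtime Interface, the Container Network Interface, and the Container Storage Interface.<br><img src="/images/k8s/g_sd_40.png" alt="graceful shutdown"></li>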
</ol>
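<p>However, there is a subtle but essential difference.</p>
<p><strong>When you terminate a Pod, removing the endpoint and sending the signal to the kubelet happen at the same time.</strong></p>
<p>When you first create a Pod, Kubernetes waits for the kubelet to report the IP address and only then starts propagating the endpoint.</p>
<p><strong>But when you delete a Pod, the events start in parallel.</strong></p>
<p>And this can cause quite a few race conditions.</p>
<p><em>What if the Pod is deleted before the endpoint removal has propagated?</em></p>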
<ol>
<li>Deleting the endpoint and deleting the Pod happen at the same time.<br><img src="/images/k8s/g_sd_41.png" alt="graceful shutdown"></li>
<li>So you could end up deleting the Pod before kube-proxy has updated the iptables rules.<br><img src="/images/k8s/g_sd_42.png" alt="graceful shutdown"></li>
<li>Or, if you are luckier, the Pod is deleted only after the endpoint removal has fully propagated.<br><img src="/images/k8s/g_sd_43.png" alt="graceful shutdown"></li>
</ol>
<h2 id="正常关机(Graceful)"><a href="#正常关机(Graceful)" class="headerlink" title="正常关机(Graceful)"></a>正常关机(Graceful)</h2><p>当 Pod 从 kube-proxy 或 Ingress 控制器中删除之前终止时,您可能会遇到停机时间。</p>
<p>而且,如果您考虑一下,这是有道理的。</p>
<p>Kubernetes 仍将流量路由到 IP 地址,但 Pod 不再存在。</p>
<p>Ingress 控制器,kube-proxy,CoreDNS 等没有足够的时间从其内部状态中删除 IP 地址。</p>
<p>理想情况下,在删除 Pod 之前,Kubernetes 应该等待集群中的所有组件具有更新的 endpoint 列表。</p>
<p>但是 Kubernetes 不能那样工作。</p>
<p>Kubernetes 提供了健壮的机制来分布 endpoint(即 Endpoint 对象和更高级的抽象功能,例如 <a href="https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/" target="_blank" rel="external">Endpoint Slices</a>)。</p>
<p>但是,Kubernetes 不会验证订阅 endpoint 更改的组件是否是集群状态的最新信息。</p>
<p>那么,如何避免这种竞争情况并确保在通告 endpoint 之后删除 Pod?</p>
<p><strong>你应该等一下!</strong></p>
<p><strong>当 Pod 即将被删除时,它会收到 SIGTERM 信号。</strong></p>
<p>您的应用程序可以捕获该信号并开始关闭。</p>
<p>由于 endpoint 不太可能立即从 Kubernetes 中的所有组件中删除,因此您可以:</p>
<ol>
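<li>Wait a bit longer before exiting.</li>
<li>Keep processing incoming traffic, despite the SIGTERM.</li>
<li>Finally, close existing long-lived connections (perhaps a database connection or WebSockets).</li>
<li>Terminate the process.</li>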
</ol>
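<p>How long should you wait?</p>
<p><strong>By default, Kubernetes sends the SIGTERM signal and waits 30 seconds before forcefully killing the process.</strong></p>
<p>So you could keep operating normally for the first 15 seconds, just in case.</p>
<p>Hopefully, that interval is long enough for the endpoint removal to reach kube-proxy, the Ingress controller, CoreDNS, etc.</p>
<p>As a result, less and less traffic reaches your Pod until it stops entirely.</p>
<p>After 15 seconds, it is safe to close the connection to the database (or any persistent connection) and terminate the process.</p>
<p>If you think you need more time, you can stop the process at 20 or 25 seconds instead.</p>
<p>However, remember that Kubernetes forcefully kills the process after 30 seconds (<a href="https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-execution" target="_blank" rel="external">unless you change <code>terminationGracePeriodSeconds</code> in your Pod definition</a>).</p>
<p>What if you cannot change your code to wait longer?</p>
<p>You could invoke a script that waits for a fixed amount of time and then lets the application exit.</p>
<p>Before sending SIGTERM, Kubernetes invokes the Pod's <code>preStop</code> hook, if one is defined.</p>
<p>You can set the <code>preStop</code> hook to wait for 15 seconds.</p>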
<p>Let's look at an example:<br><figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> v1</span><br><span class="line"><span class="attr">kind:</span> Pod</span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line"><span class="attr">  name:</span> my-pod</span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line"><span class="attr">  containers:</span></span><br><span class="line"><span class="attr">    - name:</span> web</span><br><span class="line"><span class="attr">      image:</span> nginx</span><br><span class="line"><span class="attr">      ports:</span></span><br><span class="line"><span class="attr">        - name:</span> web</span><br><span class="line"><span class="attr">          containerPort:</span> <span class="number">80</span></span><br><span class="line"><span class="attr">      lifecycle:</span></span><br><span class="line"><span class="attr">        preStop:</span></span><br><span class="line"><span class="attr">          exec:</span></span><br><span class="line"><span class="attr">            command:</span> [<span class="string">"sleep"</span>, <span class="string">"15"</span>]</span><br></pre></td></tr></table></figure></p>
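<p>The <code>preStop</code> hook is one of the <a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/" target="_blank" rel="external">Pod lifecycle hooks</a>.</p>
<p><em>Is a 15-second delay the recommended amount?</em></p>
<p>It depends, but it is a sensible starting point for testing.</p>
<p>Here is a recap of the sequence and the options you have:</p>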
<ol>
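<li>As you already know, when a Pod is deleted, the kubelet is notified of the change.<br><img src="/images/k8s/g_sd_44.png" alt="graceful shutdown"></li>
<li>If the Pod has a <code>preStop</code> hook, it is invoked first.<br><img src="/images/k8s/g_sd_45.png" alt="graceful shutdown"></li>
<li>When the <code>preStop</code> hook completes, the kubelet sends the SIGTERM signal to the container. From then on, the container should close all long-lived connections and prepare to terminate.<br><img src="/images/k8s/g_sd_46.png" alt="graceful shutdown"></li>
<li>By default, the process has 30 seconds to exit, and that time includes the <code>preStop</code> hook. If the process has not exited by then, the kubelet sends the SIGKILL signal and forcefully kills it.<br><img src="/images/k8s/g_sd_47.png" alt="graceful shutdown"></li>
<li>The kubelet notifies the control plane that the Pod was deleted successfully.<br><img src="/images/k8s/g_sd_48.png" alt="graceful shutdown"></li>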
</ol>
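<h2 id="宽限时间(Grace-periods)和滚动更新"><a href="#宽限时间(Grace-periods)和滚动更新" class="headerlink" title="宽限时间(Grace periods)和滚动更新"></a>Grace periods and rolling updates</h2><p>Graceful shutdown applies to Pods that are being deleted.</p>
<p>But what if you don't delete the Pods yourself?</p>
<p>Even if you don't, Kubernetes deletes Pods all the time.</p>
<p>In particular, Kubernetes creates and deletes Pods every time you deploy a newer version of your application.</p>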
<p>When you change the image in a Deployment, Kubernetes rolls out the change incrementally.<br><figure class="highlight yaml"><table><tr><td class="code"><pre><span class="line"><span class="attr">apiVersion:</span> apps/v1</span><br><span class="line"><span class="attr">kind:</span> Deployment</span><br><span class="line"><span class="attr">metadata:</span></span><br><span class="line"><span class="attr">  name:</span> app</span><br><span class="line"><span class="attr">spec:</span></span><br><span class="line"><span class="attr">  replicas:</span> <span class="number">3</span></span><br><span class="line"><span class="attr">  selector:</span></span><br><span class="line"><span class="attr">    matchLabels:</span></span><br><span class="line"><span class="attr">      name:</span> app</span><br><span class="line"><span class="attr">  template:</span></span><br><span class="line"><span class="attr">    metadata:</span></span><br><span class="line"><span class="attr">      labels:</span></span><br><span class="line"><span class="attr">        name:</span> app</span><br><span class="line"><span class="attr">    spec:</span></span><br><span class="line"><span class="attr">      containers:</span></span><br><span class="line"><span class="attr">        - name:</span> app</span><br><span class="line">          <span class="comment"># image: nginx:1.18 OLD</span></span><br><span class="line"><span class="attr">          image:</span> nginx:<span class="number">1.19</span></span><br><span class="line"><span class="attr">          ports:</span></span><br><span class="line"><span class="attr">            - containerPort:</span> <span class="number">3000</span></span><br></pre></td></tr></table></figure></p>
<p>If you have three replicas, then as soon as you submit the new YAML resource, Kubernetes:</p>
<ul>
<li>Creates a Pod with the new container image.</li>
<li>Destroys an existing Pod.</li>
<li>Waits for the Pod to be ready.</li>
</ul>
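<p>It repeats these steps until all the Pods have been migrated to the newer version.</p>
<p>Kubernetes starts each new cycle only after the new Pod is ready to receive traffic (in other words, after it passes the Readiness check).</p>
<p>Does Kubernetes wait for the Pod to be deleted before moving on to the next one?</p>
<p><strong>No!</strong></p>
<p>Suppose you have 10 Pods, and each Pod takes 2 seconds to become ready and 20 seconds to shut down. This is what happens:</p>
<p>The first new Pod is created, and one of the previous Pods is terminated.</p>
<p>The new Pod takes 2 seconds to become ready. After that, Kubernetes creates the next one.</p>
<p>Meanwhile, the Pod being terminated stays in the terminating state for 20 seconds.</p>
<p>After 20 seconds, all the new Pods are live (10 Pods, each ready after 2 seconds). All 10 previous Pods are terminating, and the first of them is just about to exit.</p>
<p>In total, you briefly doubled the number of Pods (10 running, 10 terminating).<br><img src="/images/k8s/g_sd_49.png" alt="graceful shutdown"></p>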
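<p>The longer the grace period is compared with the Readiness probe, the more Pods you will have Running (and Terminating) at the same time.</p>
<p>Is that bad?</p>
<p>Not necessarily, as long as you are careful not to drop connections.</p>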
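<h2 id="终止长时间运行的任务"><a href="#终止长时间运行的任务" class="headerlink" title="终止长时间运行的任务"></a>Terminating long-running tasks</h2><p><em>What about long-running jobs?</em></p>
<p><em>If you are transcoding a large video, is there any way to delay stopping the Pod?</em></p>
<p>Imagine you have a Deployment with three replicas.</p>
<p>Each replica is assigned a video to transcode, and the task could take several hours to complete.</p>
<p>When you trigger a rolling update, the Pod has 30 seconds to complete the task before it is killed.</p>
<p>How can you keep the Pod from being killed before the task finishes?</p>
<p>You could increase <code>terminationGracePeriodSeconds</code> to a couple of hours.</p>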
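<p><strong>However, the Pod's endpoint is unreachable at that point.</strong><br><img src="/images/k8s/g_sd_50.png" alt="graceful shutdown"></p>
<p>If you expose metrics to monitor your Pods, your instrumentation won't be able to reach the Pod.</p>
<p>Why?</p>
<p><strong>Tools such as Prometheus rely on Endpoints to scrape Pods in your cluster</strong>.</p>
<p>However, as soon as you delete the Pod, the endpoint removal is propagated across the cluster, even to Prometheus!</p>
<p><strong>Instead of increasing the grace period, you should consider creating a new Deployment for every new release</strong>.</p>
<p>When you create a brand-new Deployment, the existing Deployment is left untouched.</p>
<p>The long-running jobs can continue processing the video as usual.</p>
<p>Once they are done, you can delete them manually.</p>
<p>If you want them deleted automatically, you could set up an autoscaler that scales the Deployment down to zero replicas when it runs out of tasks.</p>
<p>One example of such a Pod autoscaler is Osiris, <a href="https://github.com/deislabs/osiris" target="_blank" rel="external">a general-purpose, scale-to-zero component for Kubernetes</a>.</p>
<p>This technique is sometimes called a <strong>Rainbow deployment</strong>. It is useful whenever you need to keep the previous Pods running for longer than the grace period.</p>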
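<p><em>WebSockets are another excellent example.</em></p>
<p>If you are streaming real-time updates to your users, you might not want to terminate the WebSockets every time there is a release.</p>
<p>If you release several times a day, the real-time feed could be interrupted several times.</p>
<p><strong>Creating a new Deployment for every release is a less obvious but better choice.</strong></p>
<p>Existing users can keep streaming updates while the newest Deployment serves new users.</p>
<p>As users disconnect from the old Pods, you can gradually reduce their replicas and retire the old Deployments.</p>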
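<h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Be mindful of Pods being deleted from your cluster: their IP addresses might still be used to route traffic.</p>
<p>Instead of shutting down your Pods immediately, consider waiting a little longer in your application or setting up a <code>preStop</code> hook.</p>
<p>A Pod should be removed only after the endpoint removal has propagated across the cluster and the endpoint has been removed from kube-proxy, Ingress controllers, CoreDNS, etc.</p>
<p>If your Pods run long-lived tasks, such as transcoding videos or serving real-time updates over WebSockets, consider using Rainbow deployments.</p>
<p>In a Rainbow deployment, you create a new Deployment for every release and delete the previous one once its connections (or tasks) have drained.</p>
<p>You can manually remove the older Deployments as soon as the long-running tasks have completed.</p>
<p>Alternatively, you can automate the process by scaling Deployments down to zero replicas automatically.</p>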
<p><img src="/images/wx_dyh.png" alt="WeChat subscription account"></p>
<blockquote>
<p>Original article: <a href="https://learnk8s.io/graceful-shutdown" target="_blank" rel="external">https://learnk8s.io/graceful-shutdown</a></p>
</blockquote>
</content>
<summary type="html">
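<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Introduction</h2><p>In this article, you will learn how to prevent broken connections when a Pod starts up or shuts down, and how to shut down long-running tasks gracefully.<br><img src="/images/k8s/g_shutdown_1.png" alt="graceful shutdown"></p>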
</summary>
<category term="kubernetes" scheme="http://team.jiunile.com/categories/kubernetes/"/>
<category term="容器" scheme="http://team.jiunile.com/categories/kubernetes/%E5%AE%B9%E5%99%A8/"/>
<category term="k8s" scheme="http://team.jiunile.com/tags/k8s/"/>
<category term="容器" scheme="http://team.jiunile.com/tags/%E5%AE%B9%E5%99%A8/"/>
<category term="部署" scheme="http://team.jiunile.com/tags/%E9%83%A8%E7%BD%B2/"/>
</entry>
<entry>
<title>Understanding Allocations in Go: An Illustrated Guide</title>
<link href="http://team.jiunile.com//blog/2020/12/go-allocations.html"/>
<id>http://team.jiunile.com//blog/2020/12/go-allocations.html</id>
<published>2020-12-21T14:00:00.000Z</published>
<updated>2020-12-22T03:35:11.000Z</updated>
<content type="html"><h2 id="介绍"><a href="#介绍" class="headerlink" title="介绍"></a>Introduction</h2><p>Thanks to the efficient built-in memory management of the Go runtime, we can usually prioritize correctness and maintainability in our programs without thinking much about the details of how allocations happen. From time to time, though, we may find performance bottlenecks in our code and want to dig deeper.</p>
<p>Anyone who has run a benchmark with the <code>-benchmem</code> flag will have seen the <code>allocs/op</code> stat in the output. In this post, we'll look at what counts as an alloc and what we can do to influence this number.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">BenchmarkFunc-8 67836464 16.0 ns/op 8 B/op 1 allocs/op</span><br></pre></td></tr></table></figure></p>
<a id="more"></a>
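<h2 id="我们熟悉和喜爱的栈和堆"><a href="#我们熟悉和喜爱的栈和堆" class="headerlink" title="我们熟悉和喜爱的栈和堆"></a>The stack and heap we know and love</h2><p>To discuss the <code>allocs/op</code> stat in Go, we are interested in two areas of memory in a Go program: the stack and the heap.</p>
<p>In many popular programming environments, "the stack" usually refers to a thread's call stack. A call stack is a last-in, first-out (LIFO) stack data structure. It stores arguments, local variables and other data tracked as a thread executes functions. Each function call adds (pushes) a new frame to the stack, and each returning function removes (pops) its frame from the stack.</p>
<p>We must be able to safely free the memory of the most recent stack frame when it is popped. We therefore can't store anything on the stack that needs to be referenced elsewhere later.</p>
<p><img src="/images/go/allocs_1.png" alt="View of the call stack after calling println"></p>
<p>Threads are managed by the OS, so the amount of memory available to a thread stack is typically fixed, for example 8MB by default in many Linux environments. This means we also need to be mindful of how much data ends up on the stack, especially with deeply nested recursive functions. If the stack pointer in the diagram above passes the stack guard, the program crashes with a stack overflow error.</p>
<p>The heap is a more complex area of memory and has nothing to do with the data structure of the same name. We can use the heap on demand to store data our program needs. Memory allocated there can't simply be freed when a function returns. It must be managed carefully to avoid leaks and fragmentation. The heap is generally many times larger than any thread stack, and most optimization work is spent investigating heap usage.</p>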
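<h2 id="Go-栈和堆"><a href="#Go-栈和堆" class="headerlink" title="Go 栈和堆"></a>The Go stack and heap</h2><p>The Go runtime completely abstracts away the threads managed by the OS, and we work with a new abstraction instead: goroutines. Goroutines are conceptually very similar to threads, but they exist in user space. This means the runtime, not the OS, sets the rules for how stacks behave.</p>
<p><img src="/images/go/allocs_2.png" alt="Threads abstracted away"></p>
<p>Goroutine stacks have no hard limit set by the OS. Instead, they start with a small amount of memory (currently 2KB). Before each function call runs, a check in the function prologue verifies that a stack overflow won't occur. In the diagram below, the <code>convert()</code> function can run within the limits of the current stack size, as long as SP does not go past <code>stackguard0</code>.</p>
<p><img src="/images/go/allocs_3.png" alt="Close-up of a goroutine call stack"></p>
<p>Otherwise, the runtime copies the current stack to a new, larger contiguous region of memory before executing <code>convert()</code>. This means stacks in Go are dynamically sized and can usually keep growing as long as enough memory is available.</p>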
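<p>Conceptually, the Go heap again resembles the threaded model described above. All goroutines share a common heap, and anything that can't be stored on the stack ends up there. When a heap allocation happens in a function being benchmarked, the <code>allocs/op</code> stat goes up by one. The garbage collector's job is to free heap variables later, once they are no longer referenced.</p>
<p>For a detailed explanation of how memory management is handled in Go, see <a href="https://medium.com/@ankur_anand/a-visual-guide-to-golang-memory-allocator-from-ground-up-e132258453ed" target="_blank" rel="external">A visual guide to Golang memory allocator from ground up</a>.</p>
<h2 id="我们如何知道一个变量何时被分配给堆?"><a href="#我们如何知道一个变量何时被分配给堆?" class="headerlink" title="我们如何知道一个变量何时被分配给堆?"></a>How do we know when a variable is allocated to the heap?</h2><p>The official FAQ answers this question.</p>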
<blockquote>
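<p>When possible, the Go compilers will allocate variables that are local to a function in that function's stack frame. However, if the compiler cannot prove that the variable is not referenced after the function returns, then the compiler must allocate the variable on the garbage-collected heap to avoid dangling pointer errors. Also, if a local variable is very large, it might make more sense to store it on the heap rather than the stack.</p>
<p>In the current compilers, if a variable has its address taken, that variable is a candidate for allocation on the heap. However, a basic escape analysis recognizes some cases when such variables will not live past the return from the function and can reside on the stack.</p>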
</blockquote>
<p>Compiler implementations change over time, so <strong>there is no way to know which variables will be allocated to the heap just by reading Go code</strong>. You can, however, view the results of the escape analysis mentioned above in the compiler's output. To do this, pass the <code>gcflags</code> argument to <code>go build</code>. The full list of options is available via <code>go tool compile -help</code>.</p>
<p>For escape analysis results, use the <code>-m</code> option (<code>print optimization decisions</code>). Let's test this with a simple program that creates two stack frames, one for <code>main1</code> and one for <code>stackIt</code>.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main1</span><span class="params">()</span></span> &#123;</span><br><span class="line">    _ = stackIt()</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">//go:noinline</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">stackIt</span><span class="params">()</span> <span class="title">int</span></span> &#123;</span><br><span class="line">    y := <span class="number">2</span></span><br><span class="line">    <span class="keyword">return</span> y * <span class="number">2</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>We can't discuss stack behavior if the compiler removes the function call. So we use the <code>noinline</code> <a href="https://dave.cheney.net/2018/01/08/gos-hidden-pragmas" target="_blank" rel="external">pragma</a> to prevent inlining when compiling the code. Let's see what the compiler has to say about its optimization decisions. The <code>-l</code> flag disables inlining, so no inlining decisions appear in the output.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go build -gcflags <span class="string">'-m -l'</span></span><br><span class="line"><span class="comment"># github.com/Jimeux/go-samples/allocations</span></span><br></pre></td></tr></table></figure></p>
<p>The compiler reports no escape analysis decisions here. In other words, the variable <code>y</code> stays on the stack and doesn't trigger any heap allocation. We can verify this with a benchmark.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go <span class="built_in">test</span> -bench . -benchmem</span><br><span class="line">BenchmarkStackIt-8 680439016 1.52 ns/op 0 B/op 0 allocs/op</span><br></pre></td></tr></table></figure></p>
<p>As expected, the <code>allocs/op</code> stat is <code>0</code>. An important observation from this result is that <strong>copying variables can allow us to keep them on the stack</strong> and avoid allocating to the heap. Let's verify this by modifying the program to return a pointer instead of a copy.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main2</span><span class="params">()</span></span> &#123;</span><br><span class="line">    _ = stackIt2()</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">//go:noinline</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">stackIt2</span><span class="params">()</span> *<span class="title">int</span></span> &#123;</span><br><span class="line">    y := <span class="number">2</span></span><br><span class="line">    res := y * <span class="number">2</span></span><br><span class="line">    <span class="keyword">return</span> &amp;res</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>Let's look at the compiler output.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go build -gcflags <span class="string">'-m -l'</span></span><br><span class="line"><span class="comment"># github.com/Jimeux/go-samples/allocations</span></span><br><span class="line">./main.go:10:2: moved to heap: res</span><br></pre></td></tr></table></figure></p>
<p>The compiler tells us that it moved the variable <code>res</code> to the heap, which triggers a heap allocation. The benchmark below confirms this.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go <span class="built_in">test</span> -bench . -benchmem</span><br><span class="line">BenchmarkStackIt2-8 70922517 16.0 ns/op 8 B/op 1 allocs/op</span><br></pre></td></tr></table></figure></p>
<p>So does this mean pointers are guaranteed to cause allocations? Let's modify the program again, this time passing a pointer down the stack.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main3</span><span class="params">()</span></span> &#123;</span><br><span class="line">    y := <span class="number">2</span></span><br><span class="line">    _ = stackIt3(&amp;y) <span class="comment">// pass y down the stack as a pointer</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">//go:noinline</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">stackIt3</span><span class="params">(y *<span class="keyword">int</span>)</span> <span class="title">int</span></span> &#123;</span><br><span class="line">    res := *y * <span class="number">2</span></span><br><span class="line">    <span class="keyword">return</span> res</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>Yet the benchmark shows that nothing was allocated to the heap.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go <span class="built_in">test</span> -bench . -benchmem</span><br><span class="line">BenchmarkStackIt3-8 705347884 1.62 ns/op 0 B/op 0 allocs/op</span><br></pre></td></tr></table></figure></p>
<p>The compiler output says so explicitly.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go build -gcflags <span class="string">'-m -l'</span></span><br><span class="line"><span class="comment"># github.com/Jimeux/go-samples/allocations</span></span><br><span class="line">./main.go:10:14: y does not escape</span><br></pre></td></tr></table></figure></p>
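<p>Why the seemingly inconsistent behavior? <code>stackIt2</code> passes the address of <code>res</code> up the stack to <code>main2</code>, where it will be referenced after the stack frame of <code>stackIt2</code> has already been freed. The compiler can therefore tell that <code>res</code> must be moved to the heap to stay alive. Without that move, <code>main2</code> would be left holding a dangling pointer into freed stack memory.</p>
<p><code>stackIt3</code>, on the other hand, passes <code>y</code> down the stack, and <code>y</code> isn't referenced anywhere outside <code>main3</code>. The compiler can therefore tell that <code>y</code> can live on the stack alone and doesn't need a heap allocation. A reference to <code>y</code> can never end up dangling here.</p>
<p><strong>From this we can infer a general rule: sharing pointers up the stack results in allocations, whereas sharing pointers down the stack doesn't</strong>. This is not guaranteed, though, so you still need to verify with <code>gcflags</code> or benchmarks. What we can say for sure is that any attempt to reduce <code>allocs/op</code> will involve hunting for wayward pointers.</p>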
<h2 id="我们为什么要关心堆分配?"><a href="#我们为什么要关心堆分配?" class="headerlink" title="我们为什么要关心堆分配?"></a>Why do we care about heap allocations?</h2><p>We have learned a little about what the <code>alloc</code> in <code>allocs/op</code> means and how to check whether an allocation to the heap is triggered. But why should we care whether this stat is non-zero in the first place? The benchmarks we have already run can begin to answer this question.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">BenchmarkStackIt-8 680439016 1.52 ns/op 0 B/op 0 allocs/op</span><br><span class="line">BenchmarkStackIt2-8 70922517 16.0 ns/op 8 B/op 1 allocs/op</span><br><span class="line">BenchmarkStackIt3-8 705347884 1.62 ns/op 0 B/op 0 allocs/op</span><br></pre></td></tr></table></figure></p>
<p>The variables involved need almost the same amount of memory, yet the relative CPU overhead of <code>BenchmarkStackIt2</code> is pronounced. We can get a bit more insight by generating flame graphs of the CPU profiles of <code>stackIt</code> and <code>stackIt2</code>.<br><img src="/images/go/allocs_4.png" alt="stackIt CPU profile"></p>
<p><img src="/images/go/allocs_5.png" alt="stackIt2 CPU profile"></p>
<p><code>stackIt</code> has an unremarkable profile that runs predictably down the call stack to the <code>stackIt</code> function itself. <code>stackIt2</code>, on the other hand, makes heavy use of a large number of runtime functions that consume many additional CPU cycles. This shows how complex allocating to the heap is. It also gives a first look at where the roughly 14 extra nanoseconds per op (16.0 ns versus 1.52 ns) are going.</p>
<h2 id="那在现实世界中呢?"><a href="#那在现实世界中呢?" class="headerlink" title="那在现实世界中呢?"></a>What about in the real world?</h2><p>Many aspects of performance don't become apparent without production conditions. A single function may run efficiently in microbenchmarks, but what impact does it have on your application when it serves thousands of concurrent users?</p>
<p>We won't recreate an entire application in this post, but we will look at some more detailed performance diagnostics using the <a href="https://golang.org/cmd/trace/" target="_blank" rel="external">trace tool</a>. Let's begin by defining a (somewhat) big struct with nine fields.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="keyword">type</span> BigStruct <span class="keyword">struct</span> &#123;</span><br><span class="line">    A, B, C <span class="keyword">int</span></span><br><span class="line">    D, E, F <span class="keyword">string</span></span><br><span class="line">    G, H, I <span class="keyword">bool</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>Now let's define two functions. <code>CreateCopy</code> copies <code>BigStruct</code> instances between stack frames. <code>CreatePointer</code> shares <code>BigStruct</code> pointers up the stack, which avoids copying but results in heap allocations.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="comment">//go:noinline</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CreateCopy</span><span class="params">()</span> <span class="title">BigStruct</span></span> &#123;</span><br><span class="line">    <span class="keyword">return</span> BigStruct&#123;</span><br><span class="line">        A: <span class="number">123</span>, B: <span class="number">456</span>, C: <span class="number">789</span>,</span><br><span class="line">        D: <span class="string">"ABC"</span>, E: <span class="string">"DEF"</span>, F: <span class="string">"HIJ"</span>,</span><br><span class="line">        G: <span class="literal">true</span>, H: <span class="literal">true</span>, I: <span class="literal">true</span>,</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">//go:noinline</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CreatePointer</span><span class="params">()</span> *<span class="title">BigStruct</span></span> &#123;</span><br><span class="line">    <span class="keyword">return</span> &amp;BigStruct&#123;</span><br><span class="line">        A: <span class="number">123</span>, B: <span class="number">456</span>, C: <span class="number">789</span>,</span><br><span class="line">        D: <span class="string">"ABC"</span>, E: <span class="string">"DEF"</span>, F: <span class="string">"HIJ"</span>,</span><br><span class="line">        G: <span class="literal">true</span>, H: <span class="literal">true</span>, I: <span class="literal">true</span>,</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>We can verify the explanation above with the techniques used so far.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go build -gcflags <span class="string">'-m -l'</span></span><br><span class="line">./main.go:67:9: &amp;BigStruct literal escapes to heap</span><br><span class="line"></span><br><span class="line">$ go <span class="built_in">test</span> -bench . -benchmem</span><br><span class="line">BenchmarkCopyIt-8 211907048 5.20 ns/op 0 B/op 0 allocs/op</span><br><span class="line">BenchmarkPointerIt-8 20393278 52.6 ns/op 80 B/op 1 allocs/op</span><br></pre></td></tr></table></figure></p>
<p>Here are the tests we'll use with the trace tool. Each one creates 20,000,000 <code>BigStruct</code> instances with its respective <code>Create</code> function.<br><figure class="highlight go"><table><tr><td class="code"><pre><span class="line"><span class="keyword">const</span> creations = <span class="number">20</span>_000_000</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">TestCopyIt</span><span class="params">(t *testing.T)</span></span> &#123;</span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; creations; i++ &#123;</span><br><span class="line">        _ = CreateCopy()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">TestPointerIt</span><span class="params">(t *testing.T)</span></span> &#123;</span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; creations; i++ &#123;</span><br><span class="line">        _ = CreatePointer()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>Next, we save the trace output for <code>CreateCopy</code> to the file <code>copy_trace.out</code> and open it in the browser with the trace tool.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go <span class="built_in">test</span> -run TestCopyIt -trace=copy_trace.out</span><br><span class="line">PASS</span><br><span class="line">ok github.com/Jimeux/go-samples/allocations 0.281s</span><br><span class="line"></span><br><span class="line">$ go tool trace copy_trace.out</span><br><span class="line">Parsing trace...</span><br><span class="line">Splitting trace...</span><br><span class="line">Opening browser. Trace viewer is listening on http://127.0.0.1:57530</span><br></pre></td></tr></table></figure></p>
<p>Choosing <code>View trace</code> from the menu shows the following. It is almost as unremarkable as the flame graph for our <code>stackIt</code> function. Only two of the eight potential logical cores (Procs) are used, and goroutine G19 spends nearly all of its time running our test loop, which is exactly what we want.<br><img src="/images/go/allocs_6.png" alt="Trace for 20,000,000 CreateCopy calls"></p>
<p>Let's generate trace data for the <code>CreatePointer</code> code.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ go <span class="built_in">test</span> -run TestPointerIt -trace=pointer_trace.out</span><br><span class="line">PASS</span><br><span class="line">ok github.com/Jimeux/go-samples/allocations 2.224s</span><br><span class="line"></span><br><span class="line">$ go tool trace pointer_trace.out</span><br><span class="line">Parsing trace...</span><br><span class="line">Splitting trace...</span><br><span class="line">Opening browser. Trace viewer is listening on http://127.0.0.1:57784</span><br></pre></td></tr></table></figure></p>
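<p>You may have noticed that this test took 2.224 seconds, compared with 0.281 seconds for <code>CreateCopy</code>. Selecting <code>View trace</code> also shows something far more colorful and busy this time. All logical cores were used, and there appears to be much more heap activity and many more threads and goroutines than last time.<br><img src="/images/go/allocs_7.png" alt="Trace for 20,000,000 CreatePointer calls"></p>
<p>Zooming in to a span of a millisecond or so in the trace shows many goroutines performing operations related to garbage collection. The FAQ quoted earlier uses the phrase "garbage-collected heap" because the garbage collector's job is to clean up anything on the heap that is no longer referenced.<br><img src="/images/go/allocs_8.png" alt="Close-up of GC activity in the CreatePointer trace"></p>
<p>Go's garbage collector keeps getting more efficient, but the process is not free. The trace output above shows that the test code stopped completely at times. That was not the case for <code>CreateCopy</code>: all of its <code>BigStruct</code> instances stayed on the stack, and the GC had very little to do.</p>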
<p>Comparing the goroutine analysis of the two traces gives more insight into this. <code>CreatePointer</code> (bottom) spent over 15% of its execution time sweeping or pausing (GC) and scheduling goroutines.<br><img src="/images/go/allocs_9.png" alt="Top-level goroutine analysis for CreateCopy"></p>
<p><img src="/images/go/allocs_10.png" alt="Top-level goroutine analysis for CreatePointer"></p>
<p>Some of the stats available elsewhere in the trace data illustrate the cost of heap allocation further. The number of goroutines generated differs sharply, and the <code>CreatePointer</code> test has nearly 400 STW (stop-the-world) events.<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">+------------+------+---------+</span><br><span class="line">| | Copy | Pointer |</span><br><span class="line">+------------+------+---------+</span><br><span class="line">| Goroutines | 41 | 406965 |</span><br><span class="line">| Heap | 10 | 197549 |</span><br><span class="line">| Threads | 15 | 12943 |</span><br><span class="line">| bgsweep | 0 | 193094 |</span><br><span class="line">| STW | 0 | 397 |</span><br><span class="line">+------------+------+---------+</span><br></pre></td></tr></table></figure></p>
<p>Keep in mind, though, that despite this section's title, the conditions of the <code>CreateCopy</code> test are very unrealistic for a typical program. It is normal for the GC to use a steady amount of CPU, and pointers are a feature of any real program. Still, together with the earlier flame graphs, this gives some insight into why we might want to track the <code>allocs/op</code> stat and avoid unnecessary heap allocations where possible.</p>
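<h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>Hopefully this post has given you some insight into the difference between the stack and the heap in a Go program, the meaning of the <code>allocs/op</code> stat, and some of the ways we can investigate memory usage.</p>
<p>The correctness and maintainability of our code usually matter more than tricky techniques to reduce pointer usage and sidestep GC activity. By now, everyone knows the line about premature optimization, and coding in Go is no exception.</p>
<p>If, however, we do have strict performance requirements, or we have otherwise identified a bottleneck in a program, the concepts and tools introduced here should be a useful starting point for the necessary optimizations.</p>
<p>If you want to play around with the simple code examples in this post, check out the source code and README on <a href="https://github.com/Jimeux/go-samples/tree/master/allocations" target="_blank" rel="external">GitHub</a>.</p>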
<p><img src="/images/wx_dyh.png" alt="WeChat subscription account"></p>
<blockquote>
<p>Translated from: <a href="https://medium.com/eureka-engineering/understanding-allocations-in-go-stack-heap-memory-9a2631b5035d" target="_blank" rel="external">https://medium.com/eureka-engineering/understanding-allocations-in-go-stack-heap-memory-9a2631b5035d</a></p>
</blockquote>
</content>
<summary type="html">
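<h2 id="介绍"><a href="#介绍" class="headerlink" title="介绍"></a>Introduction</h2><p>Thanks to the Go runtime's efficient built-in memory management, we can usually prioritize correctness and maintainability in our programs without worrying much about how allocations happen. Sometimes, though, we may find a performance bottleneck in our code and want to dig deeper.</p>
<p>Anyone who has run benchmarks with the <code>-benchmem</code> flag has seen the <code>allocs/op</code> statistic in the output. In this article we look at what counts as an alloc and what we can do to influence that number.<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">BenchmarkFunc-8 67836464 16.0 ns/op 8 B/op 1 allocs/op</span><br></pre></td></tr></table></figure></p>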
</summary>
<category term="Golang" scheme="http://team.jiunile.com/categories/Golang/"/>
<category term="Allocations" scheme="http://team.jiunile.com/categories/Golang/Allocations/"/>
<category term="go" scheme="http://team.jiunile.com/tags/go/"/>
<category term="allocations" scheme="http://team.jiunile.com/tags/allocations/"/>
</entry>
<entry>
<title>How to Write Bug-Free Goroutines in Go?</title>
<link href="http://team.jiunile.com//blog/2020/12/go-nobug-gorotuine.html"/>
<id>http://team.jiunile.com//blog/2020/12/go-nobug-gorotuine.html</id>
<published>2020-12-15T14:00:00.000Z</published>
<updated>2020-12-15T13:27:02.000Z</updated>
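<content type="html"><h2 id="GO-并发"><a href="#GO-并发" class="headerlink" title="GO 并发"></a>Go Concurrency</h2><p>Go is famous, and much loved, for its concurrency. The Go runtime manages lightweight threads called goroutines, which are quick and simple to write.</p>
<p>You just put <code>go</code> in front of a function call you want to run asynchronously, and it runs concurrently in a new goroutine.</p>
<p><strong>Sounds simple?</strong></p>
<p>Goroutines are Go's way of writing asynchronous code.</p>
<p>It is important to understand how goroutines and concurrency work. Go provides ways to coordinate goroutines that make them easier to manage and reason about in complex programs.</p>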
<blockquote>
<p>Because goroutines are so easy to use, they are also easy to misuse.</p>
</blockquote>
<a id="more"></a>
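<h2 id="1-在异步例程中不要对执行顺序进行假设。"><a href="#1-在异步例程中不要对执行顺序进行假设。" class="headerlink" title="1 在异步例程中不要对执行顺序进行假设。"></a>1 Don't make assumptions about execution order in asynchronous routines</h2><p>When scheduling concurrent tasks in Go, keep in mind that asynchronous tasks are unpredictable.</p>
<p>You can mix asynchronous and synchronous computation, as long as the synchronous tasks make no assumptions about the asynchronous ones.</p>
<p>A common beginner mistake is to start a goroutine and then continue with synchronous work that depends on that goroutine's result. A typical case is a goroutine that writes to a variable outside its scope while the synchronous code then uses that variable.</p>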
<h3 id="假设执行顺序"><a href="#假设执行顺序" class="headerlink" title="假设执行顺序"></a>Assuming execution order</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"time"</span></span><br><span class="line"> <span class="string">"fmt"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">var</span> numbers []<span class="keyword">int</span> <span class="comment">// nil</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// start a goroutine to initialise the slice</span></span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span> <span class="params">()</span></span> &#123;</span><br><span class="line"> numbers = <span class="built_in">make</span>([]<span class="keyword">int</span>, <span class="number">2</span>)</span><br><span class="line"> &#125;()</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// do something synchronous</span></span><br><span class="line"> <span class="keyword">if</span> numbers == <span class="literal">nil</span> &#123;</span><br><span class="line"> time.Sleep(time.Second)</span><br><span class="line"> &#125;</span><br><span class="line"> numbers[<span class="number">0</span>] = <span class="number">1</span> <span class="comment">// will sometimes panic here</span></span><br><span class="line"> fmt.Println(numbers[<span class="number">0</span>])</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
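<p><strong>This pattern leads to unpredictable behavior</strong>. It makes the outcome depend on factors we cannot control: the Go runtime, and more specifically how it schedules goroutines. It is also a data race, because the goroutine writes <code>numbers</code> while <code>main</code> reads it without any synchronization.</p>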
<blockquote>
<p>Writing code like this assumes that the goroutine will finish its work before the result is needed.</p>
</blockquote>
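<p><strong>First</strong>, without some coordination technique (which we will discuss), whether interleaved asynchronous and synchronous code works depends on CPU availability.</p>
<p>This means that if CPU-intensive processes run at the same time as the goroutines, the timing of execution will vary.</p>
<p><strong>Second</strong>, different Go versions and toolchains (compilers and their runtimes) may schedule goroutines differently. It is therefore safest never to assume that a goroutine will finish while a synchronous task is running.</p>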
<p><strong>How do you make sure a goroutine has finished?</strong></p>
<blockquote>
<p>Use a channel</p>
</blockquote>
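<p><strong>Use a channel to signal when an asynchronous task completes</strong></p>
<p>Channels are the idiomatic way to receive values from asynchronous tasks such as goroutines.</p>
<p>With an <strong>unbuffered</strong> channel, a send blocks until another goroutine receives the value, so every send is handed directly to one receive (one in, one out). Use it when the sender should wait until its value has actually been picked up.</p>
<p>With a <strong>buffered</strong> channel, a send blocks only once the buffer is full, so senders can run ahead of the receiver by up to the buffer's capacity.</p>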
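<p>To make the difference between the two kinds of channel concrete, the sketch below attempts a send without blocking, using a <code>select</code> with a <code>default</code> case. <code>trySend</code> is a hypothetical helper name.</p>

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // the send would have blocked
	}
}

func main() {
	unbuffered := make(chan int)  // capacity 0: a send needs a ready receiver
	buffered := make(chan int, 1) // capacity 1: one send can complete right away

	fmt.Println(trySend(unbuffered, 1)) // false: nobody is receiving
	fmt.Println(trySend(buffered, 1))   // true: the value is stored in the buffer
	fmt.Println(trySend(buffered, 2))   // false: the buffer is already full
}
```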
<p>In this example, the channel guarantees that the main task waits until the asynchronous task has finished. When the goroutine completes its work, it sends a value on the <code>done</code> channel, and that value is read before the <code>numbers</code> slice is used.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"fmt"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">var</span> numbers []<span class="keyword">int</span> <span class="comment">// nil</span></span><br><span class="line"> done := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line"> <span class="comment">// start a goroutine to initialise the slice</span></span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span> <span class="params">()</span></span> &#123;</span><br><span class="line"> numbers = <span class="built_in">make</span>([]<span class="keyword">int</span>, <span class="number">2</span>)</span><br><span class="line"> done &lt;- <span class="keyword">struct</span>&#123;&#125;&#123;&#125;</span><br><span class="line"> &#125;()</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// do something synchronous</span></span><br><span class="line"> &lt;-done <span class="comment">// wait until the goroutine signals completion</span></span><br><span class="line"> numbers[<span class="number">0</span>] = <span class="number">1</span> <span class="comment">// will not panic anymore</span></span><br><span class="line"> fmt.Println(numbers[<span class="number">0</span>]) <span class="comment">// 1</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
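<p>Although this is a contrived example, you can see where it would be useful: when the main goroutine does complex work in parallel with another goroutine. The two tasks can run at the same time, and the <code>panic</code> can no longer happen.</p>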
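<h2 id="2-避免跨并发线程访问可变数据"><a href="#2-避免跨并发线程访问可变数据" class="headerlink" title="2 避免跨并发线程访问可变数据"></a>2 Avoid accessing mutable data across concurrent goroutines</h2><p>Accessing mutable data from multiple goroutines is a “great way” to introduce data races into a program.</p>
<p>A data race occurs when two or more threads (here, goroutines) <strong>access the same memory location concurrently</strong>, at least one of the accesses is a write, and nothing synchronizes them.</p>
<p>As a result, accessing the same variable from several goroutines can produce unpredictable values. If two goroutines update the same variable at the same time, there are two possible outcomes:</p>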
<ul>
<li>Both goroutines end up with the same value (<strong>incorrect</strong>).</li>
<li>The slower/later goroutine sees a different, already-updated value (<strong>correct</strong>).</li>
</ul>
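<p>If the slower/later goroutine reads a value that the faster/earlier goroutine has already updated, it operates on the updated value. <strong>This is the expected behavior</strong>.</p>
<p>Otherwise, as happens in a data race, both goroutines produce the same value, because both operate on the unchanged value.</p>
<p><strong>1000 possible data races</strong></p>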
<p>In this example, we use a <code>sync.WaitGroup</code> to keep the program running until all goroutines have finished, but we do nothing to control access to the variable inside each goroutine.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"fmt"</span></span><br><span class="line"> <span class="string">"sync"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> a := <span class="number">0</span> <span class="comment">// data race</span></span><br><span class="line"> <span class="keyword">var</span> wg sync.WaitGroup</span><br><span class="line"> wg.Add(<span class="number">1000</span>)</span><br><span class="line"> <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; <span class="number">1000</span>; i++ &#123;</span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">defer</span> wg.Done()</span><br><span class="line"> a += <span class="number">1</span></span><br><span class="line"> &#125;()</span><br><span class="line"> &#125;</span><br><span class="line"> wg.Wait()</span><br><span class="line"> fmt.Println(a) <span class="comment">// could theoretically be any number up to 1000 (most likely above 900)</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
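<p>Depending on how many updates are lost to data races, this code can print any number up to 1000. Running it with <code>go run -race</code> reports the race.</p>
<p>This happens because each goroutine performs two operations on the same variable, a read and a write. For two goroutines, that is 2 reads + 2 writes in total.</p>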
<blockquote>
<p>For both goroutines to produce the same value, both reads must happen before either write.</p>
</blockquote>
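<p>The interleaving described above can be replayed deterministically on a single goroutine. This is only a model of one ordering the scheduler might produce. <code>lostUpdate</code> and <code>serialUpdate</code> are illustrative names.</p>

```go
package main

import "fmt"

// lostUpdate replays the ordering in which two goroutines both read a
// before either of them writes it.
func lostUpdate() int {
	a := 0
	r1 := a    // goroutine 1 reads 0
	r2 := a    // goroutine 2 reads 0 (goroutine 1 has not written yet)
	a = r1 + 1 // goroutine 1 writes 1
	a = r2 + 1 // goroutine 2 also writes 1, overwriting goroutine 1's update
	return a
}

// serialUpdate replays the ordering in which goroutine 2 reads only after
// goroutine 1 has written.
func serialUpdate() int {
	a := 0
	r1 := a
	a = r1 + 1
	r2 := a
	a = r2 + 1
	return a
}

func main() {
	fmt.Println(lostUpdate(), serialUpdate()) // 1 2
}
```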
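<p><strong>Use a mutex to share memory between goroutines</strong></p>
<p>To prevent data races between goroutines, we need to synchronize access to shared memory. A mutex does this by ensuring that only one goroutine at a time reads or writes the guarded value.</p>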
<p>In essence, it <strong>temporarily locks access to a variable</strong>.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> main</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> (</span><br><span class="line"> <span class="string">"fmt"</span></span><br><span class="line"> <span class="string">"sync"</span></span><br><span class="line">)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> a := <span class="number">0</span></span><br><span class="line"> <span class="keyword">var</span> wg sync.WaitGroup</span><br><span class="line"></span><br><span class="line"> <span class="keyword">var</span> mu sync.Mutex <span class="comment">// guards access to a</span></span><br><span class="line"></span><br><span class="line"> wg.Add(<span class="number">1000</span>)</span><br><span class="line"> <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; <span class="number">1000</span>; i++ &#123;</span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">defer</span> wg.Done()</span><br><span class="line"> mu.Lock()</span><br><span class="line"> <span class="keyword">defer</span> mu.Unlock()</span><br><span class="line"> a += <span class="number">1</span></span><br><span class="line"> &#125;()</span><br><span class="line"> &#125;</span><br><span class="line"> wg.Wait()</span><br><span class="line"> fmt.Println(a) <span class="comment">// will always be 1000</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>It's that simple.</p>
<p>This code always prints 1000, because each operation on the variable works on the value left by the previous one.</p>
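<p>A common refinement is to keep the lock next to the data it protects by putting both in one struct, so callers cannot forget to lock. The sketch below is a variation on the example above, not code from the original article. <code>Counter</code> and <code>countConcurrently</code> are hypothetical names.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// Counter bundles a value with the mutex that guards it.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc increments the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts n goroutines that each increment the counter once
// and returns the final value after all of them have finished.
func countConcurrently(n int) int {
	var c Counter
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(1000)) // 1000
}
```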
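<h2 id="3-不要写应该同步的异步任务"><a href="#3-不要写应该同步的异步任务" class="headerlink" title="3 不要写应该同步的异步任务"></a>3 Don't write asynchronous tasks that should be synchronous</h2><p>Goroutines are often thought of as background tasks. They are seen as small jobs that run alongside the main program and are handed off to run concurrently in a goroutine.</p>
<p>When learning Go, it is tempting to use goroutines to minimize blocking operations or to make programs faster.</p>
<p>But because goroutines look so simple, it is easy to get into the habit of turning everything into a goroutine “just in case”.</p>
<p><strong>If a task is inherently synchronous but you run it asynchronously, you create problems</strong>.</p>
<p><strong>Not every task should be a goroutine</strong>.</p>
<p>Some tasks need ordering. In many processes, the next step depends on the result of the previous one. Running such sequential steps concurrently will make your program misbehave, so those parts have to stay synchronous.</p>
<p>So in some cases you are better off forgetting about goroutines and staying synchronous from the start.</p>
<p><strong>Wasting CPU with an infinite loop</strong></p>
<p>In this contrived example, the program delegates everything to goroutines and uses a for loop to keep itself running.</p>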
<p>This is an example of how not to control the flow of a Go program.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">go</span> doSomething()</span><br><span class="line"> <span class="keyword">go</span> doSomethingElse()</span><br><span class="line"> </span><br><span class="line"> <span class="comment">// execute everything as a goroutine</span></span><br><span class="line"> </span><br><span class="line"> <span class="keyword">for</span> &#123; <span class="comment">// this keeps the program running (and keeps a CPU core busy)</span></span><br><span class="line"> </span><br><span class="line"> &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
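<p>It is better to keep things simple. You can avoid this kind of bad practice by thinking of your program as a main goroutine plus additional goroutines. Let the main goroutine run synchronously, and hand tasks off to goroutines only where needed.</p>
<p>There are better ways to control program flow, <strong>such as <a href="https://gobyexample.com/waitgroups" target="_blank" rel="external">WaitGroups</a> or <a href="https://gobyexample.com/channels" target="_blank" rel="external">Channels</a></strong>.</p>
<p><strong>Controlling flow with a WaitGroup</strong></p>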
<p>Instead of wasting precious CPU, use a WaitGroup to block until n tasks have finished before the program exits. This way the CPU is not left spinning in an infinite loop.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">doSomething</span><span class="params">(wg *sync.WaitGroup)</span></span> &#123;</span><br><span class="line"> <span class="keyword">defer</span> wg.Done() <span class="comment">// tell the WaitGroup this task is finished when we return</span></span><br><span class="line"> <span class="comment">// do something here</span></span><br><span class="line"> fmt.Println(<span class="string">"Done"</span>)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">var</span> wg sync.WaitGroup</span><br><span class="line"> wg.Add(<span class="number">1</span>)</span><br><span class="line"> <span class="keyword">defer</span> wg.Wait()</span><br><span class="line"> <span class="keyword">go</span> doSomething(&amp;wg)</span><br><span class="line"> doSomethingElseSync() <span class="comment">// runs synchronously on the main goroutine</span></span><br><span class="line"> </span><br><span class="line"> <span class="comment">// program will wait until doSomething &amp; doSomethingElseSync is complete</span></span><br><span class="line"></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
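<p>First, pass the number of tasks to wait for as the argument to <code>wg.Add()</code>.</p>
<p>Where you place <code>wg.Wait()</code> matters: that is where execution pauses until all tasks have finished. Here it is deferred, so <code>main</code> waits just before it returns.</p>
<p>When a task finishes, it calls <code>wg.Done()</code> to report completion.</p>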
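<p>The three steps above (<code>Add</code>, <code>Done</code>, <code>Wait</code>) fit into a small runnable sketch that waits for n tasks. <code>squareAll</code> is a hypothetical example function. Each goroutine writes only its own slice element, so no mutex is needed.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll runs one goroutine per input and waits for all of them.
func squareAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	wg.Add(len(inputs)) // number of tasks to wait for
	for i, v := range inputs {
		go func(i, v int) {
			defer wg.Done() // report completion when this task returns
			results[i] = v * v
		}(i, v)
	}
	wg.Wait() // block until every task has called wg.Done()
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```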
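<h2 id="4-不要让-goroutines-挂起"><a href="#4-不要让-goroutines-挂起" class="headerlink" title="4 不要让 goroutines 挂起"></a>4 Don't leave goroutines hanging</h2><p>Make sure goroutines you no longer need can exit. A goroutine that never finishes stays blocked forever and keeps holding its memory and everything it references. A goroutine stuck in a busy loop also burns CPU.</p>
<p>One way this happens is when a goroutine tries to send a value on a channel that nobody is reading. The send blocks, and the goroutine stays stuck there forever.</p>
<p><strong>9 hanging goroutines</strong></p>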
<p>In this example, the channel is read only once, so 9 goroutines are left waiting to send a value on it.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">sendToChan</span><span class="params">()</span> <span class="title">int</span></span> &#123;</span><br><span class="line"> channel := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">int</span>)</span><br><span class="line"> <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; <span class="number">10</span>; i++ &#123;</span><br><span class="line"> i := i</span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"> channel &lt;- i <span class="comment">// 9 hanging goroutines</span></span><br><span class="line"> &#125;()</span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> &lt;-channel</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
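<p>To avoid this, make sure goroutines you no longer need are able to finish, so that the resources they hold are released.</p>
<p><strong>Make the channel buffered</strong></p>
<p>A buffered channel gives the channel room to store additional values, so senders do not have to wait for a receiver.</p>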
<p>In the current example, a capacity of 9 is enough. Between the single receive and the 9 buffer slots, all 10 sends can complete, so no goroutine is left blocked.<br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">sendToChan</span><span class="params">()</span> <span class="title">int</span></span> &#123;</span><br><span class="line"> channel := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">int</span>, <span class="number">9</span>)</span><br><span class="line"> <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; <span class="number">10</span>; i++ &#123;</span><br><span class="line"> i := i</span><br><span class="line"> <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"> channel &lt;- i <span class="comment">// all goroutines executed successfully</span></span><br><span class="line"> &#125;()</span><br><span class="line"> &#125;</span><br><span class="line"> <span class="keyword">return</span> &lt;-channel</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
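<p>A buffered channel works here because the number of senders is known in advance. The sketch below shows another option, assuming the caller only needs the first value: give every sender a way out through a <code>done</code> channel that is closed once the result has arrived. <code>firstValue</code> is a hypothetical name. n must be at least 1, or the receive blocks forever.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// firstValue starts n senders but only needs one result. Closing done lets
// the remaining senders give up instead of blocking forever.
func firstValue(n int) int {
	ch := make(chan int) // unbuffered, as in the original example
	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(v int) {
			defer wg.Done()
			select {
			case ch <- v: // delivered to the single receiver
			case <-done: // the receiver is gone: exit instead of hanging
			}
		}(i)
	}
	v := <-ch
	close(done) // wake every sender still blocked in select
	wg.Wait()   // returns only because no sender is left blocked
	return v
}

func main() {
	fmt.Println(firstValue(10)) // some value from 0 to 9
}
```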
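<p><strong>Never start a goroutine without knowing when it will stop.</strong></p>
<p>Starting a goroutine without knowing when it will stop leads to the problems described above: goroutines that stay blocked forever or waste CPU.</p>
<p>You should always know when a goroutine will stop and when it is no longer needed.</p>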
<p><strong>You can do this with a <code>select</code> statement and a <code>channel</code></strong><br><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">done := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">bool</span>)</span><br><span class="line"><span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line"> <span class="keyword">for</span> &#123;</span><br><span class="line"> <span class="keyword">select</span> &#123;</span><br><span class="line"> <span class="keyword">case</span> &lt;-done:</span><br><span class="line"> <span class="keyword">return</span></span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> &#125;</span><br><span class="line"> &#125;</span><br><span class="line">&#125;()</span><br><span class="line">done &lt;- <span class="literal">true</span></span><br></pre></td></tr></table></figure></p>
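<p>This is essentially an asynchronous for loop with an exit condition.</p>
<p>The important logic goes in the <code>default</code> case. If that case never blocks, the loop spins continuously and keeps a CPU core busy.</p>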
<p>When a value is sent on the <code>done</code> channel, as <code>done &lt;- true</code> does, the receive <code>&lt;-done</code> succeeds, the goroutine returns, and the loop stops.</p>
<blockquote>
<p>Translated from: <a href="https://itnext.io/how-to-write-bug-free-goroutines-in-go-golang-59042b1b63fb" target="_blank" rel="external">https://itnext.io/how-to-write-bug-free-goroutines-in-go-golang-59042b1b63fb</a></p>
</blockquote>
<p><img src="/images/wx_dyh.png" alt="WeChat subscription account"></p>
</content>
<summary type="html">
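<h2 id="GO-并发"><a href="#GO-并发" class="headerlink" title="GO 并发"></a>Go Concurrency</h2><p>Go is famous, and much loved, for its concurrency. The Go runtime manages lightweight threads called goroutines, which are quick and simple to write.</p>
<p>You just put <code>go</code> in front of a function call you want to run asynchronously, and it runs concurrently in a new goroutine.</p>
<p><strong>Sounds simple?</strong></p>
<p>Goroutines are Go's way of writing asynchronous code.</p>
<p>It is important to understand how goroutines and concurrency work. Go provides ways to coordinate goroutines that make them easier to manage and reason about in complex programs.</p>
<blockquote>
<p>Because goroutines are so easy to use, they are also easy to misuse.</p>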
</blockquote>
</summary>
<category term="golang" scheme="http://team.jiunile.com/categories/golang/"/>
<category term="goroutines" scheme="http://team.jiunile.com/categories/golang/goroutines/"/>
<category term="go" scheme="http://team.jiunile.com/tags/go/"/>
<category term="goroutines" scheme="http://team.jiunile.com/tags/goroutines/"/>
</entry>
<entry>
<title>Scaling K8s Services with eBPF</title>
<link href="http://team.jiunile.com//blog/2020/12/k8s-cilium-service-2019.html"/>
<id>http://team.jiunile.com//blog/2020/12/k8s-cilium-service-2019.html</id>
<published>2020-12-02T14:00:00.000Z</published>
<updated>2020-12-03T02:31:35.000Z</updated>
<content type="html"><h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>This article is a translation of a 2019 Linux Plumbers Conference talk by Daniel Borkmann and Martynas Pumputis: <a href="https://linuxplumbersconf.org/event/4/contributions/458/" target="_blank" rel="external">Making the Kubernetes Service Abstraction Scale using eBPF</a>. Material that is well known or already dated, such as the introduction to K8s and how Service was implemented in Cilium versions before 1.6, has been slightly trimmed in translation. See the original PDF if you need it.</p>
<p>A year later, Daniel and Martynas gave another talk at LPC that continues this one: <a href="http://team.jiunile.com/blog/2020/11/k8s-cilium-service.html">Cilium: K8s Service load balancing with BPF/XDP</a></p>
<p><strong>K8s currently relies heavily on iptables to implement the Service abstraction</strong>. For every Service and its backend pods, K8s generates many iptables rules. <strong>With 5K Services, for example, there are 25K iptables rules</strong>, which has the following consequences:</p>
<ul>
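<li><strong>High and unpredictable packet latency</strong>, because every packet has to traverse these rules until one of them matches;</li>
<li><strong>Rule updates are very slow</strong>: a single iptables rule cannot be updated on its own. All rules have to be read out, the whole set updated, and the new rule set written back to the host. This is especially noticeable in dynamic environments, where thousands of backend pods may be created and destroyed every hour.</li>
<li><strong>Reliability issues</strong>: iptables relies on Netfilter and the kernel's connection tracking module (conntrack), which runs into race conditions under heavy traffic. <strong>This is especially visible with UDP</strong>, leading to packet drops, higher application load, and similar problems.</li>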
</ul>
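<p>The latency argument comes down to lookup cost. A rule list is searched from top to bottom, while a hash table is looked up directly by key. The toy model below is not how kube-proxy, iptables or Cilium is implemented. <code>serviceKey</code>, <code>linearLookup</code> and the generated VIPs are assumptions made purely for illustration.</p>

```go
package main

import "fmt"

// serviceKey is a simplified match key: protocol, VIP and port.
type serviceKey struct {
	proto string
	vip   string
	port  int
}

// linearLookup walks a rule list top to bottom, like a chain of rules, and
// returns the index of the first match plus how many rules were examined.
func linearLookup(rules []serviceKey, k serviceKey) (int, int) {
	for i, r := range rules {
		if r == k {
			return i, i + 1
		}
	}
	return -1, len(rules)
}

func main() {
	const n = 5000
	rules := make([]serviceKey, n)
	index := make(map[serviceKey]int, n) // hash-table lookup: cost does not grow with n
	for i := 0; i < n; i++ {
		k := serviceKey{"tcp", fmt.Sprintf("10.96.%d.%d", i/256, i%256), 80}
		rules[i] = k
		index[k] = i
	}
	last := rules[n-1]
	_, examined := linearLookup(rules, last)
	fmt.Println("linear scan examined", examined, "rules") // 5000
	fmt.Println("map lookup found index", index[last])    // 4999
}
```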
<p>This article shows how Cilium/BPF solves these problems and lets K8s Services scale.<br><a id="more"></a></p>
<h2 id="1-K8s-Service-类型及默认基于-kube-proxy-的实现"><a href="#1-K8s-Service-类型及默认基于-kube-proxy-的实现" class="headerlink" title="1 K8s Service 类型及默认基于 kube-proxy 的实现"></a>1 K8s Service types and the default kube-proxy-based implementation</h2><p>K8s provides the Service abstraction, which groups multiple backend pods into a <strong>logical unit</strong>. K8s assigns this logical unit a <strong>virtual IP address</strong> (VIP), through which clients can reach the service provided by those pods.</p>
<p>The figure below shows a concrete example:<br><img src="/images/k8s/cim_k8s-service.png" alt="k8s-service"></p>
<ol>
<li>The yaml on the right defines a Service named <code>nginx</code> that serves on TCP port 80;<ul>
<li>Create: <code>kubectl create -f nginx-svc.yaml</code></li>
</ul>
</li>
<li>K8s assigns each Service a virtual IP; here <code>nginx</code> gets <code>3.3.3.3</code>;<ul>
<li>View: <code>kubectl get service nginx</code></li>
</ul>
</li>
<li>On the left are the two backend pods of the <code>nginx</code> Service (two endpoints in K8s terms). Here they run on the same node, and each pod has its own IP address;<ul>
<li>View: <code>kubectl get endpoints nginx</code></li>
</ul>
</li>
</ol>
<p>What we see above is a so-called <code>ClusterIP</code> Service. In fact, <strong>K8s has several different Service types</strong>:</p>
<ul>
<li>ClusterIP</li>
<li>NodePort</li>
<li>LoadBalancer</li>
<li>ExternalName</li>
</ul>
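<p>This article focuses mainly on the first two types.</p>
<p><strong>The K8s component that implements Services is kube-proxy</strong>. Its main job is to <strong>forward (and load-balance) requests addressed to the VIP to the corresponding backend pods</strong>. The iptables rules mentioned earlier are created and managed by kube-proxy.</p>
<p>kube-proxy is also an optional K8s component; if you don't need Service functionality, you don't have to run it.</p>
<h3 id="ClusterIP-Service"><a href="#ClusterIP-Service" class="headerlink" title="ClusterIP Service"></a>1.1 ClusterIP Service</h3><p>This is <strong>the default K8s Service type, which lets hosts and pods reach a Service through its VIP</strong>.</p>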
<ul>
<li>Virtual IP to any endpoint (pod)</li>
<li>Only in-cluster access</li>
</ul>
<p>kube-proxy implements this with the following iptables rules:<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">-t nat -A &#123;PREROUTING, OUTPUT&#125; -m conntrack --ctstate NEW -j KUBE-SERVICES</span><br><span class="line"></span><br><span class="line"><span class="comment"># Traffic to the nginx Service from outside the pod CIDR (e.g. from the host), matching all 4 conditions:</span></span><br><span class="line"><span class="comment"># 1. src_ip is not in the pod CIDR (1.1.0.0/16)</span></span><br><span class="line"><span class="comment"># 2. dst_ip=3.3.3.3/32 (ClusterIP)</span></span><br><span class="line"><span class="comment"># 3. proto=TCP</span></span><br><span class="line"><span class="comment"># 4. dport=80</span></span><br><span class="line"><span class="comment"># Matching packets jump to KUBE-MARK-MASQ, which only sets a mark and returns, so they still continue to the next rule.</span></span><br><span class="line"><span class="comment"># The mark gets these packets masqueraded on the way out, so their src_ip becomes the host IP.</span></span><br><span class="line">-A KUBE-SERVICES ! <span class="_">-s</span> 1.1.0.0/16 <span class="_">-d</span> 3.3.3.3/32 -p tcp -m tcp --dport 80 -j KUBE-MARK-MASQ</span><br><span class="line"><span class="comment"># All traffic to the nginx Service (from pods and from the host), matching 3 conditions:</span></span><br><span class="line"><span class="comment"># 1. dst_ip=3.3.3.3/32 (ClusterIP)</span></span><br><span class="line"><span class="comment"># 2. proto=TCP</span></span><br><span class="line"><span class="comment"># 3. dport=80</span></span><br><span class="line"><span class="comment"># jumps to the per-Service chain KUBE-SVC-NGINX</span></span><br><span class="line">-A KUBE-SERVICES <span class="_">-d</span> 3.3.3.3/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-NGINX</span><br><span class="line"></span><br><span class="line"><span class="comment"># Jump to KUBE-SEP-NGINX1 with 50% probability</span></span><br><span class="line">-A KUBE-SVC-NGINX -m statistic --mode random --probability 0.50 -j KUBE-SEP-NGINX1</span><br><span class="line"><span class="comment"># If the rule above did not match, jump to KUBE-SEP-NGINX2 unconditionally</span></span><br><span class="line">-A KUBE-SVC-NGINX -j KUBE-SEP-NGINX2</span><br><span class="line"></span><br><span class="line"><span class="comment"># If src_ip=1.1.1.1/32, the backend pod is accessing the Service and was load-balanced to itself (hairpin traffic).</span></span><br><span class="line"><span class="comment"># Mark it for masquerade (MASQUERADE is a dynamic form of SNAT) so the reply is routed back through the host,</span></span><br><span class="line"><span class="comment"># where conntrack reverses the NAT and the pod sees the reply coming from svc_ip, as it expects.</span></span><br><span class="line">-A KUBE-SEP-NGINX1 <span class="_">-s</span> 1.1.1.1/32 -j KUBE-MARK-MASQ</span><br><span class="line"><span class="comment"># KUBE-MARK-MASQ returns, so every new connection reaches this rule:</span></span><br><span class="line"><span class="comment"># DNAT svc_ip -&gt; pod1_ip</span></span><br><span class="line">-A KUBE-SEP-NGINX1 -p tcp -m tcp -j DNAT --to-destination 1.1.1.1:80</span><br><span class="line"><span class="comment"># Same for the second backend; see the comments on the two rules above</span></span><br><span class="line">-A KUBE-SEP-NGINX2 <span class="_">-s</span> 1.1.1.2/32 -j KUBE-MARK-MASQ</span><br><span class="line">-A KUBE-SEP-NGINX2 -p tcp -m tcp -j DNAT --to-destination 1.1.1.2:80</span><br></pre></td></tr></table></figure></p>
<ol>
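<li>A Service must be reachable both from the host and from pods (<strong>which live in different netns</strong>), so requests are intercepted at two hooks, <code>PREROUTING</code> and <code>OUTPUT</code>, and jump to the custom <code>KUBE-SERVICES</code> chain;</li>
<li>The <code>KUBE-SERVICES</code> chain <strong>performs the actual Service matching</strong>, based on protocol, destination IP, and destination port. Once a Service matches, the packet jumps to the chain created specifically for that Service, named <code>KUBE-SVC-&lt;Service&gt;</code>;</li>
<li>The <code>KUBE-SVC-&lt;Service&gt;</code> chain <strong>picks a backend pod according to per-rule probabilities</strong> and forwards the request to it. This is effectively a <strong>poor man's load balancer</strong> built on iptables. Once a pod is chosen, the packet jumps to that pod's chain, <code>KUBE-SEP-&lt;POD&gt;</code>;</li>
<li>The <code>KUBE-SEP-&lt;POD&gt;</code> chain <strong>performs DNAT</strong>, replacing the VIP with the PodIP.</li>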
</ol>
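<p>The 50% / 100% split above generalizes: for n endpoints, kube-proxy gives rule i (counting from 0) a probability of 1/(n-i) and leaves the last rule unconditional, so every endpoint ends up with the same 1/n chance. A small C model (the function names are illustrative, not kube-proxy code) checks the arithmetic:</p>

```c
#include <assert.h>

/* Probability that rule i (0-based) of an n-endpoint KUBE-SVC-* chain
 * matches, given that rules 0..i-1 did not. Rule i uses
 * `-m statistic --mode random --probability p` with p = 1/(n-i);
 * for the last rule (i = n-1) this is 1, i.e. an unconditional jump. */
double rule_probability(int i, int n)
{
    assert(n > 0 && i >= 0 && i < n);
    return 1.0 / (double)(n - i);
}

/* Overall chance that endpoint k is selected: every earlier rule must
 * miss, then rule k must hit. */
double endpoint_probability(int k, int n)
{
    double p = 1.0;
    for (int i = 0; i < k; i++)
        p *= 1.0 - rule_probability(i, n);
    return p * rule_probability(k, n);
}
```

<p>For n = 2 this reproduces the two rules above (0.50, then unconditional); for larger n the per-rule probabilities rise toward 1 while the overall distribution stays uniform.</p>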
<blockquote>
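<p>Translator's note: the explanation above is neither very detailed nor very intuitive, since it is not the focus of this article. For a deeper understanding of the iptables-based implementation, see other articles online, for example the blog post this figure comes from, <a href="https://www.stackrox.com/post/2020/01/kubernetes-networking-demystified/" target="_blank" rel="external">Kubernetes Networking Demystified: A Brief Guide</a>:<br><img src="/images/k8s/cim_k8s-net-demystified-svc-lb.png" alt="k8s-net-demystified-svc-lb"></p>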
</blockquote>
<h3 id="1-2-NodePort-Service"><a href="#1-2-NodePort-Service" class="headerlink" title="1.2 NodePort Service"></a>1.2 NodePort Service</h3><p>This type of Service can also be reached from the host and from pods, but unlike ClusterIP, <strong>it can also be reached from outside the cluster</strong>.</p>
<ul>
<li>External node IP + port in NodePort range to any endpoint (pod), e.g. 10.0.0.1:31000</li>
<li>Enables access from outside</li>
</ul>
<p>Under the hood, kube-apiserver <strong>allocates a port to the Service from a reserved port range</strong>, and <strong>kube-proxy on every node creates the following rules</strong>:<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">-t nat -A &#123;PREROUTING, OUTPUT&#125; -m conntrack --ctstate NEW -j KUBE-SERVICES</span><br><span class="line"></span><br><span class="line">-A KUBE-SERVICES ! <span class="_">-s</span> 1.1.0.0/16 <span class="_">-d</span> 3.3.3.3/32 -p tcp -m tcp --dport 80 -j KUBE-MARK-MASQ</span><br><span class="line">-A KUBE-SERVICES <span class="_">-d</span> 3.3.3.3/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-NGINX</span><br><span class="line"><span class="comment"># Not ClusterIP traffic (the rules above did not send it to a KUBE-SVC chain) and dst is a LOCAL address: jump to KUBE-NODEPORTS</span></span><br><span class="line">-A KUBE-SERVICES -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS</span><br><span class="line"></span><br><span class="line">-A KUBE-NODEPORTS -p tcp -m tcp --dport 31000 -j KUBE-MARK-MASQ</span><br><span class="line">-A KUBE-NODEPORTS -p tcp -m tcp --dport 31000 -j KUBE-SVC-NGINX</span><br><span class="line"></span><br><span class="line">-A KUBE-SVC-NGINX -m statistic --mode random --probability 0.50 -j KUBE-SEP-NGINX1</span><br><span class="line">-A KUBE-SVC-NGINX -j KUBE-SEP-NGINX2</span><br></pre></td></tr></table></figure></p>
<ol>
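<li>The first steps are the same as for ClusterIP; if no ClusterIP rule matches and the destination is a local address, the packet jumps to the <code>KUBE-NODEPORTS</code> chain;</li>
<li>The <code>KUBE-NODEPORTS</code> chain matches the Service, but <strong>this time only on protocol and destination port</strong>;</li>
<li>On a match, it jumps to the corresponding <code>KUBE-SVC-&lt;Service&gt;</code> chain; from there on, the process is the same as for ClusterIP.</li>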
</ol>
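<p>The fall-through order can be modeled in a few lines of C. This is a simplified sketch that assumes the single example Service above (ClusterIP 3.3.3.3:80, NodePort 31000, TCP); the struct, enum, and function names are hypothetical:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_IP   0x03030303u  /* 3.3.3.3 in host byte order */
#define CLUSTER_PORT 80
#define NODE_PORT    31000

enum verdict { NO_SERVICE, VIA_CLUSTERIP, VIA_NODEPORT };

struct pkt {
    uint32_t dst_ip;
    uint16_t dst_port;
    bool     tcp;
    bool     dst_is_local;  /* what `-m addrtype --dst-type LOCAL` checks */
};

/* Simplified walk of KUBE-SERVICES for one Service. */
enum verdict kube_services(const struct pkt *p)
{
    /* ClusterIP rule: protocol + destination IP + destination port */
    if (p->tcp && p->dst_ip == CLUSTER_IP && p->dst_port == CLUSTER_PORT)
        return VIA_CLUSTERIP;              /* -> KUBE-SVC-NGINX */

    /* Last rule: only packets addressed to this node fall through */
    if (p->dst_is_local) {
        /* KUBE-NODEPORTS: protocol + destination port only */
        if (p->tcp && p->dst_port == NODE_PORT)
            return VIA_NODEPORT;           /* -> KUBE-SVC-NGINX as well */
    }
    return NO_SERVICE;
}
```

<p>The NodePort branch ignores the destination IP: any address owned by the node is accepted, which is exactly what the <code>--dst-type LOCAL</code> match expresses.</p>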
<h3 id="1-3-小结"><a href="#1-3-小结" class="headerlink" title="1.3 小结"></a>1.3 Summary</h3><p>As shown above, each Service corresponds to multiple iptables rules.</p>
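<p>As the number of Services grows, <strong>the number of iptables rules grows even faster</strong>. Moreover, <strong>rules are evaluated one by one</strong>: the first packet of every new connection walks the list until it finally hits a matching rule (later packets reuse the NAT decision cached in conntrack). Traffic whose rule sits near the end of the list therefore sees <strong>much higher latency</strong> than other traffic.</p>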
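<p>The cost model behind this is a linear scan. As a rough sketch (simplified rules that match only destination IP and port; the names are illustrative), the number of rules inspected grows with the position of the matching rule:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One simplified rule: match on destination IP and port. */
struct rule { uint32_t dip; uint16_t dport; };

/* iptables-style evaluation: walk the rules in order until one matches.
 * Returns the index of the matching rule (or -1) and reports how many
 * rules were inspected, which is what drives the per-connection cost. */
int linear_match(const struct rule *rules, size_t n,
                 uint32_t dip, uint16_t dport, size_t *inspected)
{
    *inspected = 0;
    for (size_t i = 0; i < n; i++) {
        (*inspected)++;
        if (rules[i].dip == dip && rules[i].dport == dport)
            return (int)i;
    }
    return -1;
}
```

<p>A hash map keyed on the Service address, such as a BPF hash map, replaces this scan with a single lookup whose average cost does not depend on how many Services exist.</p>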
<p>With this background, let us look at how BPF/Cilium replaces kube-proxy, or in other words, reimplements kube-proxy's logic.</p>
<h2 id="2-用-Cilium-BPF-替换-kube-proxy"><a href="#2-用-Cilium-BPF-替换-kube-proxy" class="headerlink" title="2 用 Cilium/BPF 替换 kube-proxy"></a>2 Replacing kube-proxy with Cilium/BPF</h2><p>Starting with early Cilium releases, we gradually implemented Services in BPF, but some parts still relied on iptables. During that period, every node ran both cilium-agent and kube-proxy.</p>
<p>As of Cilium 1.6, the implementation is <strong>entirely BPF-based, with no dependency on iptables and no need for kube-proxy</strong>.<br><img src="/images/k8s/cim_cilium-cluster-ip.png" alt="cim_cilium-cluster-ip"></p>
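<p>One implementation choice: rather than translating Services at TC ingress, we prefer cgroup v2 hooks and <strong>perform the translation directly at the socket layer in BPF</strong> (this requires a recent kernel; where it is not supported, Cilium falls back to the TC ingress approach).</p>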
<h3 id="2-1-ClusterIP-Service"><a href="#2-1-ClusterIP-Service" class="headerlink" title="2.1 ClusterIP Service"></a>2.1 ClusterIP Service</h3><p>For ClusterIP, we <strong>intercept the socket <code>connect</code> and <code>send</code> system calls</strong> in BPF; these BPF programs run inside the syscall handlers, <strong>before the protocol stack is involved</strong>.</p>
<ul>
<li>Attach on the cgroupv2 root mount <code>BPF_PROG_TYPE_CGROUP_SOCK_ADDR</code></li>
<li><code>BPF_CGROUP_INET{4,6}_CONNECT</code> - TCP, connected UDP</li>
</ul>
<h4 id="TCP-amp-connected-UDP"><a href="#TCP-amp-connected-UDP" class="headerlink" title="TCP &amp; connected UDP"></a>TCP &amp; connected UDP</h4><p>For TCP and connected UDP, the following logic runs:<br><figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="comment">/* BPF_CGROUP_INET4_CONNECT hook: runs inside connect(2), before any packet exists */</span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">sock4_xlate</span><span class="params">(<span class="keyword">struct</span> bpf_sock_addr *ctx)</span> </span>&#123;</span><br><span class="line">  <span class="comment">/* Key: the (ClusterIP, port) the application asked to connect to */</span></span><br><span class="line">  <span class="keyword">struct</span> lb4_svc_key key = &#123; .dip = ctx-&gt;user_ip4, .dport = ctx-&gt;user_port &#125;;</span><br><span class="line">  <span class="comment">/* Service lookup in a BPF map; NULL if not a Service VIP (struct lb4_svc: illustrative type name) */</span></span><br><span class="line">  <span class="keyword">struct</span> lb4_svc *svc = lb4_lookup_svc(&amp;key);</span><br><span class="line">  <span class="keyword">if</span> (svc) &#123;</span><br><span class="line">    <span class="comment">/* Rewrite the socket's destination to the selected backend */</span></span><br><span class="line">    ctx-&gt;user_ip4 = svc-&gt;endpoint_addr;</span><br><span class="line">    ctx-&gt;user_port = svc-&gt;endpoint_port;</span><br><span class="line">  &#125;</span><br><span class="line">  <span class="keyword">return</span> <span class="number">1</span>; <span class="comment">/* 1 = allow connect() to proceed */</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></p>
<p>What it does: look up the Service in a BPF map, then translate the address. Compared with a TC ingress BPF implementation, the key points are:</p>
<ol>
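<li><strong>No connection tracking (conntrack) and no packet header rewriting</strong> (at this point there is no packet yet), so nothing gets mangled. This also means <strong>no packet checksums need to be recomputed</strong>;</li>
<li>For TCP and connected UDP, <strong>the load-balancing cost is paid once</strong>: the translation happens when the socket connects and is never repeated, so <strong>there is no per-packet translation</strong>;</li>
<li>The approach works both for sockets in the host netns and for sockets inside pod netns.</li>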
</ol>
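<p>To make the checksum point concrete: a per-packet NAT (for example at TC ingress) has to patch the IP and L4 checksums for every 16-bit address or port word it rewrites. Below is a minimal userspace sketch of the RFC 1624 incremental update such a NAT performs (in BPF this work goes through helpers such as <code>bpf_l3_csum_replace()</code> and <code>bpf_l4_csum_replace()</code>); socket-level translation skips it entirely:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over 16-bit words. */
uint16_t csum_full(const uint16_t *w, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += w[i];
    while (sum >> 16)                       /* fold end-around carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Incremental update when one 16-bit word changes from old_w to new_w
 * (RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')). A per-packet NAT repeats
 * this for every rewritten word, on every packet of the flow. */
uint16_t csum_replace16(uint16_t hc, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t)~hc;
    sum += (uint16_t)~old_w;
    sum += new_w;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

<p>Rewriting 3.3.3.3 to 1.1.1.1 touches two 16-bit words, so two incremental updates are needed per packet; the result matches a full recomputation over the rewritten words.</p>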
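<h4 id="某些-UDP-应用:存在的问题及解决方式"><a href="#某些-UDP-应用:存在的问题及解决方式" class="headerlink" title="某些 UDP 应用:存在的问题及解决方式"></a>Some UDP applications: the problem and the fix</h4><p>However, this approach <strong>does not work for some UDP applications</strong>, because they check the source address of the replies they read through the <code>recvmsg</code> system call.</p>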
<p>To solve this, we introduced new BPF attach types:</p>
<ul>
<li><code>BPF_CGROUP_UDP4_RECVMSG</code></li>
<li><code>BPF_CGROUP_UDP6_RECVMSG</code></li>
</ul>
<p>We also introduced a UDP map for NAT, the rev-NAT map:<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">                  BPF rev NAT map</span><br><span class="line">Cookie  EndpointIP  Port  =&gt;  ServiceID  IP        Port</span><br><span class="line">-------------------------------------------------------</span><br><span class="line">42      1.1.1.1     80    =&gt;  1          3.3.3.30  80</span><br></pre></td></tr></table></figure></p>
<ul>
<li>The socket cookie is obtained via <code>bpf_get_socket_cookie()</code>;</li>
<li>Besides going through a Service, some clients set up UDP connections directly to a PodIP; <strong>the cookie prevents rev-NAT from being applied to that kind of traffic</strong>;</li>
<li>The map is updated on <code>connect(2)</code> and <code>sendmsg(2)</code>;</li>
<li>rev-NAT is performed on <code>recvmsg(2)</code>.</li>
</ul>
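<p>A userspace model of this rev-NAT table may help. It is a sketch under simplifying assumptions: a fixed array stands in for the BPF LRU map, addresses are plain host-order integers, the ServiceID column is omitted, and all names are hypothetical. It shows how the cookie scopes an entry to one socket, so a different socket talking to the same PodIP directly is left untouched:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (cookie, backend IP, port) -> (Service IP, port) */
struct revnat_key { uint64_t cookie; uint32_t ip; uint16_t port; };
struct revnat_val { uint32_t ip; uint16_t port; };

#define REVNAT_SLOTS 64
static struct {
    bool used;
    struct revnat_key k;
    struct revnat_val v;
} revnat[REVNAT_SLOTS];

static bool key_eq(const struct revnat_key *a, const struct revnat_key *b)
{
    return a->cookie == b->cookie && a->ip == b->ip && a->port == b->port;
}

static int revnat_find(const struct revnat_key *k)
{
    for (int i = 0; i < REVNAT_SLOTS; i++)
        if (revnat[i].used && key_eq(&revnat[i].k, k))
            return i;
    return -1;
}

/* connect(2)/sendmsg(2) hook: after translating Service -> backend,
 * remember how to undo it for this socket. */
void revnat_update(uint64_t cookie, uint32_t be_ip, uint16_t be_port,
                   uint32_t svc_ip, uint16_t svc_port)
{
    struct revnat_key k = { cookie, be_ip, be_port };
    int slot = revnat_find(&k);
    for (int i = 0; slot < 0 && i < REVNAT_SLOTS; i++)
        if (!revnat[i].used)
            slot = i;
    if (slot < 0)
        return;  /* table full; a real LRU map would evict instead */
    revnat[slot].used = true;
    revnat[slot].k = k;
    revnat[slot].v = (struct revnat_val){ svc_ip, svc_port };
}

/* recvmsg(2) hook: rewrite the reported source address only if this
 * socket previously reached the backend through a Service. */
void revnat_recvmsg(uint64_t cookie, uint32_t *src_ip, uint16_t *src_port)
{
    struct revnat_key k = { cookie, *src_ip, *src_port };
    int slot = revnat_find(&k);
    if (slot >= 0) {
        *src_ip = revnat[slot].v.ip;
        *src_port = revnat[slot].v.port;
    }
}
```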
<h3 id="2-2-NodePort-Service"><a href="#2-2-NodePort-Service" class="headerlink" title="2.2 NodePort Service"></a>2.2 NodePort Service</h3><p>NodePort is more involved, so let us start with the simplest case.</p>
<h4 id="2-2-1-后端-pod-在本节点"><a href="#2-2-1-后端-pod-在本节点" class="headerlink" title="2.2.1 后端 pod 在本节点"></a>2.2.1 Backend pod on the local node</h4><p><img src="/images/k8s/cim_cilium-node-port.png" alt="cilium-node-port"></p>
<p>When the backend pod is on the local node, we only need to <strong>attach a tc ingress BPF program to the host's network device</strong>. The program:</p>
<ol>
<li>Looks up the Service;</li>
<li>Performs DNAT;</li>
<li>Redirects the packet to the container's lxc0.</li>
</ol>
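<p>For the reply, lxc0 performs rev-NAT and a FIB lookup (needed to fill in the L2 addresses; otherwise the packet would be dropped), then redirects the packet back toward the client.</p>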
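<h4 id="2-2-2-后端-pod-在其他节点"><a href="#2-2-2-后端-pod-在其他节点" class="headerlink" title="2.2.2 后端 pod 在其他节点"></a>2.2.2 Backend pod on another node</h4><p>When the backend pod is on another node, things get more complicated because the request has to be forwarded to that node. In this case <strong>BPF has to perform SNAT</strong>; otherwise the pod would reply directly to the client, and since conntrack state is not synchronized across nodes, that reply would be dropped once it leaves the pod.</p>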
<p>So <strong>the ingress node performs SNAT</strong> (<code>src_ip</code> rewritten from ClientIP to NodeIP), which forces the reply back through this node, where rev-SNAT is applied (<code>dst_ip</code> rewritten from NodeIP back to ClientIP).</p>
<p>Concretely, a BPF program at <strong>TC ingress</strong> performs, in order: Service lookup, DNAT, egress interface selection, SNAT, and FIB lookup, and finally sends the packet to the target node:<br><img src="/images/k8s/cim_cilium-node-port-2.png" alt="cilium-node-port-2"></p>
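<p>The reverse path is similar: the reply returns to this node, where the TC ingress BPF program performs rev-SNAT, then rev-DNAT, then a FIB lookup, and finally sends the packet back to the client:<br><img src="/images/k8s/cim_cilium-node-port-3.png" alt="cilium-node-port-3"></p>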
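<p>The four header rewrites along this path can be traced with a small C model. It is a sketch only: it keeps one NAT state entry per flow, assumes the node SNATs to the address the client connected to, and uses hypothetical names and addresses:</p>

```c
#include <assert.h>
#include <stdint.h>

struct tuple { uint32_t saddr, daddr; uint16_t sport, dport; };

/* State recorded on the ingress node for one flow. */
struct nat_state {
    struct tuple orig;    /* client -> NodeIP:NodePort, as received */
    uint16_t snat_port;   /* source port chosen on NodeIP for the SNAT */
};

/* Forward path: DNAT to the backend, then SNAT to the node's own
 * address so that the backend's reply comes back through this node. */
struct tuple nodeport_fwd(struct tuple in, uint32_t be_ip, uint16_t be_port,
                          uint16_t snat_port, struct nat_state *st)
{
    st->orig = in;
    st->snat_port = snat_port;
    struct tuple out = in;
    out.daddr = be_ip;        /* DNAT: NodeIP:NodePort -> PodIP:port */
    out.dport = be_port;
    out.saddr = in.daddr;     /* SNAT: ClientIP -> NodeIP */
    out.sport = snat_port;
    return out;
}

/* Reverse path: rev-SNAT (dst back to the client), then rev-DNAT
 * (src back to NodeIP:NodePort, which is what the client expects). */
struct tuple nodeport_rev(struct tuple reply, const struct nat_state *st)
{
    /* The reply must target the SNAT address/port chosen on the way in. */
    assert(reply.daddr == st->orig.daddr && reply.dport == st->snat_port);
    struct tuple out = reply;
    out.daddr = st->orig.saddr;   /* rev-SNAT */
    out.dport = st->orig.sport;
    out.saddr = st->orig.daddr;   /* rev-DNAT */
    out.sport = st->orig.dport;
    return out;
}
```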
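<p>Cross-node forwarding currently uses SNAT mode, but we plan to support <strong>DSR mode</strong> in the future (translator's note: supported since Cilium 1.8). The advantage of DSR is that <strong>backend pods reply directly to the client</strong>, so replies no longer pass back through the ingress node.</p>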
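<p>Also, Service handling currently happens at TC ingress, but <strong>this logic could also be implemented at the XDP layer</strong>, which would be another exciting step (translator's note: supported since Cilium 1.8, with a large performance gain).</p>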
<h4 id="SNAT"><a href="#SNAT" class="headerlink" title="SNAT"></a>SNAT</h4><p>The current BPF-based SNAT implementation uses an LRU BPF map to store the mapping state between Services and backend pods.</p>
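<p>Note that <strong>SNAT may rewrite <code>src_port</code> as well as <code>src_ip</code></strong>: different clients can use the same <code>src_port</code>, and if only <code>src_ip</code> were rewritten, the reverse translation of their replies would fail. So in that case <code>src_port</code> is translated too. The current approach is to pick a port by hashing first and, if the hashed port is already taken, call <code>prandom()</code> to choose a random one.</p>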
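<p>We also need to track flows that originate on the host (local flows), so Cilium <strong>implements a connection tracker in BPF</strong> that attaches to the host's main physical device. We also NAT the traffic of host applications: after NAT, pod traffic uses source ports on the host, and host applications draw from the same <code>src_port</code> space, so the two can collide; this is handled here.</p>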
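<p>The port selection described above (hash first, random on collision) could look roughly like the sketch below. The port range, hash function, and retry limit are assumptions for illustration; <code>rand()</code> stands in for the kernel's <code>prandom()</code>, and a single bitmap represents the source-port space that SNATed flows and host applications share:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical port range reserved for SNAT. */
#define PORT_MIN    32768u
#define PORT_MAX    60999u
#define PORT_SPAN   (PORT_MAX - PORT_MIN + 1u)
#define MAX_RETRIES 32

/* One flag per port; covers both SNATed flows and host applications,
 * since they share the node's source-port space. */
static bool port_used[PORT_SPAN];

static uint32_t flow_hash(uint32_t saddr, uint16_t sport)
{
    uint32_t h = saddr * 2654435761u ^ sport;  /* multiplicative mix */
    return h ^ (h >> 16);
}

/* Returns a free port, or 0 if none was found within MAX_RETRIES. */
uint16_t snat_alloc_port(uint32_t saddr, uint16_t sport)
{
    uint32_t cand = PORT_MIN + flow_hash(saddr, sport) % PORT_SPAN;
    for (int i = 0; i < MAX_RETRIES; i++) {
        if (!port_used[cand - PORT_MIN]) {
            port_used[cand - PORT_MIN] = true;
            return (uint16_t)cand;
        }
        /* Collision: fall back to a random candidate. */
        cand = PORT_MIN + (uint32_t)rand() % PORT_SPAN;
    }
    return 0;
}
```

<p>Two clients that both use source port 40000 end up with different SNAT ports, so their replies can still be told apart during the reverse translation.</p>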
<p>That is what we do in BPF when NodePort Service traffic arrives at a node.</p>
<h4 id="2-2-3-Client-pods-和-backend-pods-在同一节点"><a href="#2-2-3-Client-pods-和-backend-pods-在同一节点" class="headerlink" title="2.2.3 Client pods 和 backend pods 在同一节点"></a>2.2.3 Client pods and backend pods on the same node</h4><p>Another case: a pod on this node accesses a NodePort Service whose backend pods are also on this node.</p>
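<p>In this case traffic is forwarded to the backend pods via the loopback interface, going through routing and forwarding along the way. The whole process is transparent to the application, so we can <strong>change how the two sides communicate without the application noticing</strong>, as long as both ends accept the traffic correctly. We therefore <strong>reuse the ClusterIP mechanism with a small extension</strong>: if the Service address being connected to is a loopback address or any other local address, the connection is still forwarded correctly to the local pods.</p>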
<p>A nice property is that this implementation is cgroup-based and therefore independent of netns, so we do not need to enter each pod's netns to perform the translation.<br><img src="/images/k8s/cim_cilium-snat.png" alt="cilium-snat"></p>
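<p>The extension amounts to one extra check in the connect-time translation. A minimal sketch, assuming a single NodePort (31000) with one local backend and a hard-coded list of node addresses; all values and names are hypothetical:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_PORT    31000
#define BACKEND_IP   0x01010101u   /* local backend pod 1.1.1.1 */
#define BACKEND_PORT 80

/* Hypothetical node-local addresses (host byte order). */
static const uint32_t local_addrs[] = { 0x0a000001u /* 10.0.0.1 */ };

static bool is_local(uint32_t ip)
{
    if ((ip >> 24) == 127)                      /* 127.0.0.0/8 */
        return true;
    for (size_t i = 0; i < sizeof local_addrs / sizeof local_addrs[0]; i++)
        if (local_addrs[i] == ip)
            return true;
    return false;
}

/* connect(2)-time translation: a NodePort reached via loopback or any
 * local address is handled like a ClusterIP, so the socket is pointed
 * straight at the backend. Returns true if the address was rewritten. */
bool sock_xlate_nodeport(uint32_t *dip, uint16_t *dport)
{
    if (*dport == NODE_PORT && is_local(*dip)) {
        *dip = BACKEND_IP;
        *dport = BACKEND_PORT;
        return true;
    }
    return false;
}
```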
<h3 id="2-3-Service-规则的规模及请求延迟对比"><a href="#2-3-Service-规则的规模及请求延迟对比" class="headerlink" title="2.3 Service 规则的规模及请求延迟对比"></a>2.3 Service rule count and request latency</h3><p>With the features above, the per-Service iptables rules that kube-proxy needs are essentially gone; only a handful of iptables rules created by Kubernetes itself remain on each node:<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ iptables-save | grep <span class="string">'\-A KUBE'</span> | wc <span class="_">-l</span></span><br></pre></td></tr></table></figure></p>
<ul>
<li>With kube-proxy: 25401</li>
<li>With BPF: 4</li>
</ul>
<p>In the future we hope to eliminate even these rules and bypass the Netfilter framework entirely (translator's note: newer versions already do this).</p>
<p>We also ran some preliminary benchmarks, shown below:<br><img src="/images/k8s/cim_performance.png" alt="performance"></p>
<p>As the number of Services grows from 1 to 2000+, <strong>kube-proxy/iptables request latency nearly doubles</strong>, while Cilium/eBPF latency barely increases at all.</p>
<h2 id="3-相关的-Cilium-BPF-优化"><a href="#3-相关的-Cilium-BPF-优化" class="headerlink" title="3 相关的 Cilium/BPF 优化"></a>3 Related Cilium/BPF optimizations</h2><p>Next, some optimizations we made while implementing Services, and some things we may do in the future.</p>
<h3 id="3-1-BPF-UDP-recvmsg-hook"><a href="#3-1-BPF-UDP-recvmsg-hook" class="headerlink" title="3.1 BPF UDP recvmsg() hook"></a>3.1 BPF UDP <code>recvmsg()</code> hook</h3><p>When implementing socket-level UDP Service translation, we found that hooking only UDP <code>sendmsg</code> <strong>breaks applications such as DNS</strong>, with errors like the following:<br><img src="/images/k8s/cim_udp-recvmsg-before.png" alt="udp-recvmsg-before"></p>
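<p>Digging deeper, we found that <code>nslookup</code> and some other tools check whether <strong>the IP address used in <code>connect()</code> matches the IP address of the reply read via <code>recvmsg()</code></strong>. If they differ, the tools report the error above.</p>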
<p>Once the cause was clear, the fix was simple: we introduced a BPF hook that performs the reverse mapping on <code>recvmsg()</code>, which solved the problem:<br><img src="/images/k8s/cim_udp-recvmsg-after.png" alt="udp-recvmsg-after"></p>
<blockquote>
<p><a href="https://github.com/torvalds/linux/commit/983695fa6765" target="_blank" rel="external">983695fa6765</a> bpf: fix unconnected udp hooks.<br>With this patch, BPF ClusterIP can perform the reverse mapping without packet rewrite.</p>
</blockquote>
<h3 id="3-2-全局唯一-socket-cookie"><a href="#3-2-全局唯一-socket-cookie" class="headerlink" title="3.2 全局唯一 socket cookie"></a>3.2 Globally unique socket cookie</h3><p>BPF ClusterIP Service maintains an LRU reverse mapping table for UDP.</p>
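<p><strong>The socket cookie is part of this table's key, but the cookie was only unique within a netns</strong>. The implementation behind it was simple: the BPF cookie helper incremented a per-netns counter and stored the resulting value in the socket as its cookie. So cookies allocated in different netns could have the same value, causing collisions.</p>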
<p>To fix this, we made the cookie generator global; see the commit below.</p>
<blockquote>
<p><a href="https://github.com/torvalds/linux/commit/cd48bdda4fb8" target="_blank" rel="external">cd48bdda4fb8</a> sock: make cookie generation global instead of per netns.</p>
</blockquote>
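<p>The collision and its fix can be illustrated with a C11 model: per-namespace counters hand out the same first value in every netns, while one shared atomic counter never repeats a value. This is a simplified model of the idea behind the commit, with hypothetical names, not the kernel code:</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Before: every network namespace had its own generator. */
struct netns { uint64_t cookie_gen; };

uint64_t cookie_per_netns(struct netns *ns)
{
    return ++ns->cookie_gen;   /* 1, 2, 3, ... in EVERY netns */
}

/* After: one global generator, incremented atomically so that
 * concurrent callers on different CPUs still get distinct values. */
static _Atomic uint64_t global_cookie_gen;

uint64_t cookie_global(void)
{
    return atomic_fetch_add(&global_cookie_gen, 1) + 1;
}
```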
<h3 id="3-3-维护邻居表"><a href="#3-3-维护邻居表" class="headerlink" title="3.3 维护邻居表"></a>3.3 Maintaining the neighbor table</h3><p>When the Cilium agent receives Service events from the K8s apiserver, it updates the backend entries in the datapath's Service backend list.</p>
<p>As shown earlier, when a Service is of type NodePort and the backend is remote, the packet has to be forwarded to another node (TC ingress BPF <code>redirect()</code>).</p>
<p>We found that <strong>in some direct-routing setups the FIB lookup fails</strong> (<code>fib_lookup()</code>), because the system has no neighbor entry (IP-&gt;MAC mapping) for the backend, and <strong>no ARP probe is triggered</strong> afterwards.</p>
<blockquote>
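<p>In tunneling mode this problem can be ignored: the sending BPF program zeroes the src/dst MACs anyway, and on the receiving node another BPF program on the VxLAN device forwards the packet correctly, so this mode behaves more like L3.</p>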
</blockquote>
<p>We currently work around this in a somewhat ugly way: Cilium resolves the backends itself and inserts their neighbor entries into the neighbor table as permanent (<code>NUD_PERMANENT</code>).</p>
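<p>This works for now because the number of entries is fixed or controlled. Longer term, we would like the kernel to do this, since it can handle the problem best. One way would be a new <code>NUD_*</code> type where we pass only the L3 address, and the kernel resolves the L2 address and keeps it up to date. Cilium would then no longer have to deal with L2 addresses at all. As of today, though, I do not see a way to make this happen.</p>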
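<p>A similar problem exists for requests from outside the cluster to a NodePort Service, because sending the reply back to the client also needs a neighbor entry. Since this traffic is handled at pre-routing, we currently maintain our own small BPF LRU map (L3-&gt;L2 mapping in a BPF LRU map). Because this sits on the main forwarding path and traffic volume may be high, keeping the mapping in a BPF LRU map is the better fit, as it cannot overflow the neighbor table.</p>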
<h3 id="3-4-LRU-BPF-callback-on-entry-eviction"><a href="#3-4-LRU-BPF-callback-on-entry-eviction" class="headerlink" title="3.4 LRU BPF callback on entry eviction"></a>3.4 LRU BPF callback on entry eviction</h3><p>Another topic we want to discuss: it would be useful to have a callback whenever an LRU entry is evicted. Why?</p>
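<p>Cilium has a BPF conntrack table, and we support some very old kernels, such as 4.9. At startup Cilium checks the kernel version and prefers LRU maps, falling back to regular hash tables when LRU is unavailable. <strong>Hash tables need a continuously running GC process.</strong></p>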
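<p>We <strong>deliberately keep the NAT map separate from the CT map</strong>, because <strong>existing connections and traffic must not be affected while cilium-agent is upgraded or downgraded</strong>. If the two were coupled and the CT layout changed significantly, an upgrade would either have to drop all current connection state and start over, or cause a temporary service interruption while the old state is migrated to the new table afterwards; I think doing that easily and correctly is very hard. That is why they are separate. In practice, though, when GC reclaims a CT entry it also reclaims the corresponding NAT entry.</p>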
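<p>Another problem: <strong>every lookup of conntrack entries from userspace disturbs the LRU's normal eviction behavior</strong>, because it wrongly marks every entry it touches as recently used. The commit below fixed this, but to avoid such issues altogether <strong>it would be best to have a GC callback that cleans up evicted entries immediately</strong>, for example removing the NAT mapping right after its CT entry is evicted. This is what we are working on (translator's note: implemented in Cilium 1.9+).</p>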
<blockquote>
<p><a href="https://github.com/torvalds/linux/commit/50b045a8c0cc" target="_blank" rel="external">50b045a8c0cc</a> (“bpf, lru: avoid messing with eviction heuristics upon syscall lookup”) fixed map walking from user space</p>
</blockquote>
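<p>What such a callback would buy can be shown with a toy LRU in C. The sketch rests on stated assumptions (exact LRU over four slots, flows identified by small integers, a boolean array standing in for the NAT map, hypothetical names) and is not a kernel API; the point is that the NAT mapping disappears at the moment its CT entry is evicted, with no separate GC walk:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CT_SLOTS  4
#define MAX_FLOWS 64

/* Toy CT map with exact LRU replacement (the kernel's LRU is approximate). */
struct ct_entry { bool used; uint32_t flow; uint64_t last_use; };
static struct ct_entry ct[CT_SLOTS];
static uint64_t clock_tick;

/* Stand-in for the separate NAT map: true while a NAT mapping exists. */
static bool nat_alive[MAX_FLOWS];

/* Hypothetical eviction callback type: the hook the text wishes for. */
typedef void (*evict_cb)(uint32_t flow);

/* Callback that drops the NAT mapping belonging to an evicted CT entry. */
void drop_nat_on_evict(uint32_t flow)
{
    nat_alive[flow] = false;
}

/* A lookup refreshes recency, as lookups do in an LRU map. */
bool ct_lookup(uint32_t flow)
{
    for (int i = 0; i < CT_SLOTS; i++) {
        if (ct[i].used && ct[i].flow == flow) {
            ct[i].last_use = ++clock_tick;
            return true;
        }
    }
    return false;
}

/* Insert a flow (creating its NAT mapping too). If the map is full,
 * evict the least recently used entry and run the callback right away
 * instead of leaving its NAT entry for a later GC walk. */
void ct_insert(uint32_t flow, evict_cb on_evict)
{
    assert(flow < MAX_FLOWS);
    int victim = 0;
    for (int i = 0; i < CT_SLOTS; i++) {
        if (!ct[i].used) { victim = i; break; }
        if (ct[i].last_use < ct[victim].last_use) victim = i;
    }
    if (ct[victim].used && on_evict)
        on_evict(ct[victim].flow);
    ct[victim] = (struct ct_entry){ true, flow, ++clock_tick };
    nat_alive[flow] = true;
}
```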
<h3 id="3-5-LRU-BPF-eviction-zones"><a href="#3-5-LRU-BPF-eviction-zones" class="headerlink" title="3.5 LRU BPF eviction zones"></a>3.5 LRU BPF eviction zones</h3><p>Another interesting discussion related to the CT map: <strong>could LRU eviction in the future be split into different zones by traffic type</strong>? For example:</p>
<ul>
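<li>East-west traffic goes to zone1: ClusterIP service traffic, i.e. pod-to-{pod,host} traffic, which is comparatively large;</li>
<li>North-south traffic goes to zone2: NodePort and ExternalName service traffic, which is comparatively small.</li>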
</ul>
<p>The benefit: <strong>CT operations on north-south traffic would not affect the dominant east-west traffic</strong>.</p>
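<p>Ideally this isolation would be guaranteed, e.g. one could safely assume that cleaning up entries in zone1 has no effect on entries in zone2. Still, despite the multiple zones, there would be only one map globally.</p>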
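<p>A toy model of the zone idea, as a sketch under stated assumptions (two zones, a fixed per-zone capacity, exact LRU, hypothetical names): the table is a single array, but an insertion can only evict from its own zone, so churn in the north-south zone never pushes out east-west entries:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum zone { ZONE_EW = 0, ZONE_NS = 1, NZONES };
#define ZONE_CAP 3

struct entry { bool used; uint32_t key; uint64_t last_use; };
static struct entry map[NZONES][ZONE_CAP];   /* one table, partitioned */
static uint64_t tick;

/* Insert into zone z, evicting that zone's least recently used entry
 * when it is full. Returns the evicted key, or UINT32_MAX if none. */
uint32_t zone_insert(enum zone z, uint32_t key)
{
    struct entry *e = map[z];
    int victim = 0;
    for (int i = 0; i < ZONE_CAP; i++) {
        if (!e[i].used) { victim = i; break; }
        if (e[i].last_use < e[victim].last_use) victim = i;
    }
    uint32_t evicted = e[victim].used ? e[victim].key : UINT32_MAX;
    e[victim] = (struct entry){ true, key, ++tick };
    return evicted;
}

bool zone_contains(enum zone z, uint32_t key)
{
    for (int i = 0; i < ZONE_CAP; i++)
        if (map[z][i].used && map[z][i].key == key)
            return true;
    return false;
}
```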
<h3 id="3-6-BPF-原子操作"><a href="#3-6-BPF-原子操作" class="headerlink" title="3.6 BPF 原子操作"></a>3.6 BPF atomic operations</h3><p>Another topic to discuss is atomic operations.</p>
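<p>One use case is <strong>fast recycling of expired NAT entries</strong>. For example, combined with the GC process above: when a connection closes, instead of deleting its entry we set a flag marking it expired; if a new connection then happens to hit that entry, we simply mark it valid again and reuse (recycle) it, rather than creating a new one as before.</p>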
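<p>This can be implemented today with BPF spinlocks, but that is not optimal: with suitable atomic operations we would save two helper calls, and the spinlock would not have to be moved into the map value. Keeping the lock out of the map value struct also keeps each struct self-contained (decoupled), which helps avoid upgrade/downgrade issues.</p>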
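<p>The kernel currently only has the <code>BPF_XADD</code> instruction, which I think is mainly suited to counting, since unlike an atomic increment function it does not return a value. Beyond that, the kernel only offers spinlocks for maps.</p>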
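<p>I think <code>READ_ONCE/WRITE_ONCE</code> semantics would be very convenient; today's BPF code already contains some home-grown implementations of this. We would also need <code>BPF_XCHG</code> and <code>BPF_CMPXCHG</code> instructions, which would help a lot.</p>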
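<p>Fast recycling maps naturally onto a compare-and-exchange. The sketch below uses C11 <code>atomic_compare_exchange_strong</code> as a userspace stand-in for the <code>BPF_CMPXCHG</code> instruction the text asks for; the entry layout and state values are hypothetical:</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { NAT_ALIVE = 0, NAT_EXPIRED = 1 };

/* A NAT entry whose state word is flipped atomically instead of the
 * entry being deleted and re-created. */
struct nat_entry {
    _Atomic uint32_t state;
    uint32_t to_addr;
    uint16_t to_port;
};

/* Connection closed: mark the entry expired but keep it in the map. */
void nat_expire(struct nat_entry *e)
{
    atomic_store(&e->state, NAT_EXPIRED);
}

/* A new connection hits the same mapping: revive it with a single
 * compare-and-exchange. If several CPUs race, exactly one succeeds,
 * without taking a lock; the others find the entry already alive. */
bool nat_try_recycle(struct nat_entry *e)
{
    uint32_t expected = NAT_EXPIRED;
    return atomic_compare_exchange_strong(&e->state, &expected, NAT_ALIVE);
}
```

<p>With only <code>BPF_XADD</code>, the same check-then-set transition needs a spinlock around a read-check-write sequence, which is the extra helper-call cost mentioned above.</p>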
<h3 id="3-7-BPF-getpeername-hook"><a href="#3-7-BPF-getpeername-hook" class="headerlink" title="3.7 BPF getpeername hook"></a>3.7 BPF <code>getpeername</code> hook</h3><p>One more hook we have not covered yet is <code>getpeername()</code>. It <strong>applies to TCP and connected UDP</strong> and would be transparent to applications.</p>
<p>The idea is to always return the Service IP rather than the backend pod IP, so that the application sees a connection to the Service IP rather than to a particular backend pod.</p>
<p>Today it returns the backend IP instead of the Service IP, so from the application's point of view the peer is not the one it expects.</p>
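<p>One way to get that behavior is to remember the Service address per socket at <code>connect()</code> time and report it from <code>getpeername()</code>. A minimal C model of the idea, with hypothetical struct and function names (in the kernel, per-socket BPF storage could play the role of the extra fields):</p>

```c
#include <assert.h>
#include <stdint.h>

/* Per-socket state: the real peer plus the Service address the
 * application originally asked for. */
struct sock {
    uint32_t peer_ip;        /* what the kernel actually connected to */
    uint16_t peer_port;
    uint32_t orig_svc_ip;    /* 0 if the connect was not to a Service */
    uint16_t orig_svc_port;
};

/* connect(2): translate Service -> backend, but remember the Service. */
void on_connect(struct sock *sk, uint32_t svc_ip, uint16_t svc_port,
                uint32_t be_ip, uint16_t be_port)
{
    sk->orig_svc_ip = svc_ip;
    sk->orig_svc_port = svc_port;
    sk->peer_ip = be_ip;
    sk->peer_port = be_port;
}

/* getpeername(2): report the Service address when there is one, so the
 * application sees the peer it expects. */
void on_getpeername(const struct sock *sk, uint32_t *ip, uint16_t *port)
{
    if (sk->orig_svc_ip) {
        *ip = sk->orig_svc_ip;
        *port = sk->orig_svc_port;
    } else {
        *ip = sk->peer_ip;
        *port = sk->peer_port;
    }
}
```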
<h3 id="3-8-绕过内核最大-BPF-指令数的限制"><a href="#3-8-绕过内核最大-BPF-指令数的限制" class="headerlink" title="3.8 绕过内核最大 BPF 指令数的限制"></a>3.8 Working around the kernel's maximum BPF instruction count</h3><p>Finally, a few non-kernel changes.</p>
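<p>The kernel used to <strong>limit BPF programs to 4K instructions</strong>; this limit has since been raised to <strong>1M</strong> (one million) instructions (this requires a 5.2+ kernel, or a slightly older kernel with the corresponding patch).</p>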
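<p>Our BPF programs include a NAT engine, so they certainly exceed the old limit. On the Cilium side, however, we do not use the new limit yet; instead we "outsource" parts of the program into sub-programs and jump to them via tail calls, which works around the 4K limit.</p>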
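<p>Also, we currently use BPF tail calls rather than BPF-to-BPF calls, because <strong>the two cannot be used together</strong>. A better approach would be for the Cilium agent to check at startup whether the kernel supports the 1M BPF insns/complexity limit plus bounded loops (which we would use to optimize NAT mapping lookups), use these new features if so, and otherwise fall back to tail calls.</p>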
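<p>The startup decision could be sketched as follows. This is a simplified, hypothetical policy that keys off the kernel release string; the version thresholds in it are assumptions to verify, and a real agent would rather probe the verifier directly, since distributions backport BPF features:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum datapath_mode { MODE_TAIL_CALLS, MODE_LARGE_PROG };

/* Parse "major.minor..." from a release string such as `uname -r` output. */
static bool parse_release(const char *rel, int *major, int *minor)
{
    return sscanf(rel, "%d.%d", major, minor) == 2;
}

/* Assumed thresholds: 1M-instruction limit since 5.2, bounded loops
 * since 5.3, so both are available from 5.3 on. */
enum datapath_mode pick_mode(const char *release)
{
    int major, minor;
    if (!parse_release(release, &major, &minor))
        return MODE_TAIL_CALLS;          /* unknown kernel: be conservative */
    if (major > 5 || (major == 5 && minor >= 3))
        return MODE_LARGE_PROG;          /* one large program + bounded loops */
    return MODE_TAIL_CALLS;              /* split into tail-called pieces */
}
```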
<h2 id="4-Cilium-上手:用-kubeadm-搭建体验环境"><a href="#4-Cilium-上手:用-kubeadm-搭建体验环境" class="headerlink" title="4 Cilium 上手:用 kubeadm 搭建体验环境"></a>4 Getting started with Cilium: a test setup with kubeadm</h2><p>If you want to try Cilium, the following commands give a quick installation:<br><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ kubeadm init --pod-network-cidr=10.217.0.0/16 --skip-phases=addon/kube-proxy</span><br><span class="line">$ kubeadm join [...]</span><br><span class="line">$ helm template cilium \</span><br><span class="line">    --namespace kube-system --set global.nodePort.enabled=<span class="literal">true</span> \</span><br><span class="line">    --set global.k8sServiceHost=<span class="variable">$API_SERVER_IP</span> \</span><br><span class="line">    --set global.k8sServicePort=<span class="variable">$API_SERVER_PORT</span> \</span><br><span class="line">    --set global.tag=v1.6.1 &gt; cilium.yaml</span><br><span class="line">$ kubectl apply <span class="_">-f</span> cilium.yaml</span><br></pre></td></tr></table></figure></p>
<p>Appendix: <a href="/images/k8s/Making_the_Kubernetes_Service_Abstraction_Scale_using_BPF.pdf">Making_the_Kubernetes_Service_Abstraction_Scale_using_BPF.pdf</a></p>
<blockquote>
<p>Source: translation by ArthurChiao. Original: <a href="https://linuxplumbersconf.org/event/4/contributions/458/" target="_blank" rel="external">https://linuxplumbersconf.org/event/4/contributions/458/</a></p>
</blockquote>
</content>
<summary type="html">