<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Server Stress Testing</title>
<link href="2023/05/12/p65.html"/>
<url>2023/05/12/p65.html</url>
<content type="html"><![CDATA[<h1 id="cpu压力测试">CPU Stress Testing</h1><p>Use <a href="https://github.com/ColinIanKing/stress-ng" target="_blank" rel="noopener">Stress-ng</a>.</p><h1 id="gpu压力测试">GPU Stress Testing</h1><p>Use <a href="https://github.com/wilicc/gpu-burn" target="_blank" rel="noopener">gpu-burn</a>.</p>]]></content>
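The two tools named in this entry could be invoked roughly as follows. This is a minimal sketch under assumptions, not the author's procedure: the 60-second duration is an arbitrary value, the `run` helper and `DRY_RUN` switch are hypothetical conveniences, and `./gpu_burn` assumes the binary has already been built in the current directory as described in the gpu-burn README.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
DURATION="${DURATION:-60}"   # seconds per test (assumed value)

run() {
    # Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# CPU: "--cpu 0" starts one stressor per online CPU; stop after DURATION seconds.
run stress-ng --cpu 0 --timeout "${DURATION}s" --metrics-brief

# GPU: gpu_burn takes the test duration in seconds as its argument.
run ./gpu_burn "$DURATION"
```

Running the script unchanged only prints the two commands, which makes it easy to review them before setting `DRY_RUN=0` on the server under test.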
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> Ubuntu </tag>
<tag> Stress Testing </tag>
</tags>
</entry>
<entry>
<title>Repairing an Unbootable Ubuntu System with ddrescue and GParted</title>
<link href="2023/02/28/p64.html"/>
<url>2023/02/28/p64.html</url>
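<content type="html"><![CDATA[<h1 id="前言">Preface</h1><p>An Ubuntu machine failed to boot, and the boot screen reported <code>error: failure reading sector 0x1910900 from hd2</code>. A disk diagnostic tool showed that the SSD used as the system disk was responding extremely slowly, which is most likely why Ubuntu could not boot. The HDD used as the data disk also had many bad sectors and needed replacing. The machine hosts many services, so reinstalling the system would take considerable effort, and the whole disk had to be cloned instead. Linux's “everything is a file” philosophy makes backing up and restoring a Linux system feasible, but in practice there were quite a few pitfalls, which I record here.</p><h1 id="方案对比">Comparison of Options</h1><ol type="1"><li><p>Clonezilla: a third-party tool that requires writing a bootable USB drive. The drive I made would not boot for reasons I could not determine, so I gave up on it.</p></li><li><p>Systemback: articles online make it look simple, but every attempt to install it failed. It turned out that the project is no longer maintained and does not support Ubuntu 18.04.</p></li><li><p>ddrescue: a native Linux disk-rescue tool that worked well in practice. I used the command-line interface; GUI front ends reportedly exist and are probably easier to use. If the new disk is larger than the old one, ddrescue leaves the extra space unallocated, and GParted can then be used to grow the partition.</p></li></ol><h1 id="克隆硬盘">Cloning the Disk</h1><ol type="1"><li><p>Connect both the old and the new disk to the same computer;</p></li><li><p>Boot that computer from an Ubuntu Desktop installation drive and choose “Try Ubuntu” to enter the LiveCD session;</p></li><li><p>Switch to the Tsinghua mirror by following <a href="https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/" target="_blank" rel="noopener">Ubuntu Software Repository Mirror Help</a>. Do not skip this step; otherwise ddrescue cannot be installed;</p></li><li><p>Install ddrescue: <code>sudo apt install gddrescue</code></p></li><li><p>Run <code>sudo fdisk -l</code> to list the disks. Tell the old disk from the new one by their partition layouts, or by checking in the file manager which one contains files. The following assumes the old disk is <code>/dev/sda</code> and the new disk is <code>/dev/sdb</code>;</p></li><li><p>Clone the disk: <code>sudo ddrescue -v --force /dev/sda /dev/sdb</code>.</p></li></ol><h1 id="扩容分区">Growing the Partition</h1><p>Open the application menu and search for GParted. It is a simple graphical tool, so the steps are not covered here.</p>]]></content>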
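For a failing source disk, ddrescue can also record its progress in a mapfile given as a third argument, so an interrupted run can resume and bad areas can be retried later. The sketch below is an assumption-laden variant of the clone command in this entry, not the procedure the author ran: the mapfile name `rescue.map`, the retry count of 3, and the `run` helper with its `DRY_RUN` switch are all hypothetical choices. The mapfile should live on storage other than the two disks involved.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
SRC="${SRC:-/dev/sda}"      # old disk, as assumed in the steps of this entry
DST="${DST:-/dev/sdb}"      # new disk; everything on it is overwritten
MAP="${MAP:-rescue.map}"    # hypothetical mapfile name
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pass 1: copy everything that reads cleanly and skip the slow scraping phase (-n).
run sudo ddrescue -v --force -n "$SRC" "$DST" "$MAP"

# Pass 2: revisit the bad areas recorded in the mapfile, with direct disc
# access (-d) and up to 3 retry passes (-r3).
run sudo ddrescue -v --force -d -r3 "$SRC" "$DST" "$MAP"
```

Double-check `SRC` and `DST` against the `fdisk -l` output before setting `DRY_RUN=0`, because swapping them overwrites the disk being rescued.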
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> Ubuntu </tag>
<tag> ddrescue </tag>
<tag> GParted </tag>
<tag> Disk Cloning </tag>
</tags>
</entry>
<entry>
<title>OpenPAI v1.8.1 Deployment Summary</title>
<link href="2022/12/11/p63.html"/>
<url>2022/12/11/p63.html</url>
<content type="html"><![CDATA[<p><strong><em>Note: Microsoft has officially stopped maintaining this project. I currently maintain the features I need in <a href="https://github.com/siaimes/pai" target="_blank" rel="noopener">my own fork</a>, and PRs are welcome.</em></strong></p><h1 id="前提">Background</h1><p>Although I kept the v1.8.1 release tag, I have since added many features and bug fixes, so the fork differs somewhat from the v1.8.1 released by Microsoft. For details, see <a href="https://github.com/siaimes/pai/pulls?q=is%3Apr" target="_blank" rel="noopener">https://github.com/siaimes/pai/pulls?q=is%3Apr</a>.</p><h1 id="术语">Terminology</h1><table><thead><tr class="header"><th style="text-align: center;">Term</th><th style="text-align: center;">Explanation</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">dev-box</td><td style="text-align: center;">Node used to install, manage, and maintain the cluster. It can be powered off when not in use, but do not wipe it, or the cluster can no longer be maintained</td></tr><tr class="even"><td style="text-align: center;">master</td><td style="text-align: center;">Cluster master node</td></tr><tr class="odd"><td style="text-align: center;">worker</td><td style="text-align: center;">Cluster compute node</td></tr><tr class="even"><td style="text-align: center;">All machines</td><td style="text-align: center;">Every node, including dev-box, master, and worker</td></tr></tbody></table><h1 id="硬件要求">Hardware Requirements</h1><table><thead><tr class="header"><th style="text-align: center;">Node</th><th style="text-align: center;">CPU</th><th style="text-align: center;">Memory</th><th style="text-align: center;">System disk</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">dev-box</td><td style="text-align: center;">4 threads</td><td style="text-align: center;">8GB</td><td style="text-align: center;">40GB</td></tr><tr class="even"><td style="text-align: center;">master</td><td style="text-align: center;">8 threads</td><td style="text-align: center;">64GB</td><td style="text-align: center;">256GB</td></tr><tr class="odd"><td style="text-align: center;">worker</td><td style="text-align: center;">6 threads/GPU</td><td style="text-align: center;">64GB/GPU</td><td style="text-align: center;">512GB</td></tr></tbody></table><p><strong><em>Note 1:</em></strong> 
The master and worker nodes should be physical machines (virtual machines also work, but they are not recommended in production, and GPU passthrough to a VM is a hassle of its own). dev-box can be a virtual machine (debugging the code or doing custom development may need more than 100 GB of disk space); it is only used when installing and maintaining the cluster, so dedicating a physical machine to it is wasteful.</p><h1 id="前提条件">Prerequisites</h1><ol type="1"><li>All machines run a freshly installed Ubuntu 18.04 LTS server;</li><li>All machines have an administrator account with the same username and password;</li><li>All machines use the Shanghai time zone;</li><li>All machine IPs are in the same subnet, and a machine with multiple IPs is not allowed either;</li><li>This document only supports x86_64 machines.</li></ol><p><strong><em>Note 1:</em></strong> The OpenPAI project documentation recommends Ubuntu 16.04 LTS, but I ran into a serious <a href="https://github.com/microsoft/pai/issues/5316" target="_blank" rel="noopener">issue</a> on Ubuntu 16.04 LTS, so I moved to Ubuntu 18.04.</p><p><strong><em>Note 2:</em></strong> k8s does not support swap. Do not create a swap partition when installing the OS; otherwise the node will fail after a reboot.</p><h1 id="初始化">Initialization</h1><h2 id="所有机器">All Machines</h2><h3 id="关闭自动更新">Disable Automatic Updates</h3><p>Ubuntu 18.04 LTS server enables automatic updates by default. Turn them off to avoid unnecessary system failures.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/apt/apt.conf.d/20auto-upgrades</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">APT::Periodic::Update-Package-Lists <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Download-Upgradeable-Packages <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::AutocleanInterval <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Unattended-Upgrade <span class="string">"0"</span>;</span><br></pre></td></tr></table></figure><p>Reference: <a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">Prevent Ubuntu 18.06 And Nvidia Drivers From Updating</a></p><h3 id="切换清华源">Switch to the Tsinghua Mirror</h3><p><a href="https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/" target="_blank" 
rel="noopener">https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/</a></p><h3 id="更新依赖">Update Packages</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h3 id="启动bbr可选">Enable BBR (Optional)</h3><p>BBR is a TCP congestion control algorithm that generally performs better than the default one.</p><p>Check the current setting:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><p>Set it:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo bash -c <span class="string">'echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo bash -c <span class="string">'echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo sysctl -p</span><br></pre></td></tr></table></figure><p>Verify:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><h3 id="安装openssh-server">Install openssh-server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install openssh-server</span><br></pre></td></tr></table></figure><h3 id="安装ntp">Install ntp</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install ntp</span><br></pre></td></tr></table></figure><h3 
id="安装docker">Install Docker</h3><p><a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">https://docs.docker.com/engine/install/ubuntu/</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt remove docker docker-engine docker.io containerd runc</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt-get install \</span><br><span class="line"> apt-transport-https \</span><br><span class="line"> ca-certificates \</span><br><span class="line"> curl \</span><br><span class="line"> gnupg \</span><br><span class="line"> lsb-release</span><br><span class="line">curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> \</span><br><span class="line"> <span class="string">"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \</span></span><br><span class="line"><span class="string"> <span class="variable">$(lsb_release -cs)</span> stable"</span> | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt install docker-ce docker-ce-cli containerd.io</span><br></pre></td></tr></table></figure><p>Remove the Docker apt source that was just added: the OpenPAI installation script adds the Docker source again, and the duplicate entry causes a source conflict.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo rm /etc/apt/sources.list.d/docker.list</span><br></pre></td></tr></table></figure><h3 id="安装python">Install Python</h3><p>Ubuntu is supposed to ship with Python by default, but in practice I found that some machines did not have it, so install it explicitly to avoid errors.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install python</span><br></pre></td></tr></table></figure><h3 id="安装nfs客户端">Install the NFS Client</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install nfs-common</span><br></pre></td></tr></table></figure><h3 id="安装压缩软件">Install Compression Tools</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install zip</span><br></pre></td></tr></table></figure><h3 id="修改daemon">Modify the Docker Daemon Configuration</h3><p>On all machines, write <code>{}</code> to <code>/etc/docker/daemon.json</code>; this avoids unnecessary errors.</p><p>If your LAN addresses are in <code>172.*.*.*</code>, follow <a href="/2021/04/29/p55.html">SSH Connections Dropped by Docker-Compose</a> and modify <code>/etc/docker/daemon.json</code>; otherwise you may run into strange network problems. That is, use the following daemon file:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span 
class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>After modifying the daemon file, restart Docker:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h2 id="worker">worker</h2><p><strong><em>Note 1:</em></strong> The following describes how to configure worker nodes with NVIDIA GPUs. If you use AMD GPUs, refer to the project documentation yourself; I have no AMD GPU, so I cannot test that setup.</p><p><strong><em>Note 2:</em></strong> CPU-only compute nodes can skip this section.</p><h3 id="安装gpu驱动">Install the GPU Driver</h3><p><a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa" target="_blank" rel="noopener">https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed</a></p><p><a href="https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove" target="_blank" rel="noopener">https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove</a></p><p><a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/</a> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt 
install nvidia-driver-470</span><br><span class="line">sudo apt autoremove xserver-xorg</span><br><span class="line">sudo apt autoremove --purge xserver-xorg</span><br><span class="line">sudo apt-mark hold nvidia-driver-470 <span class="comment"># Freeze NVIDIA Drivers</span></span><br></pre></td></tr></table></figure></p><p>If the first command above fails, use the following instead: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo bash -c <span class="string">'echo "deb https://launchpad.proxy.ustclug.org/graphics-drivers/ppa/ubuntu bionic main" > /etc/apt/sources.list.d/graphics-drivers-ubuntu-ppa-bionic.list'</span></span><br></pre></td></tr></table></figure></p><h3 id="安装nvidia-container-runtime">Install nvidia-container-runtime</h3><p><a href="https://github.com/NVIDIA/nvidia-container-runtime#installation" target="_blank" rel="noopener">https://github.com/NVIDIA/nvidia-container-runtime#installation</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime</a></p><p>First <code>ping</code> <code>nvidia.github.io</code> to check whether the returned IP address looks normal. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ping nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>On my network this domain is DNS-poisoned (DNS returns 127.0.0.1). My workaround is to add the following line to <code>/etc/hosts</code>: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">185.199.110.153 nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>Then continue with the installation:</p><figure class="highlight bash"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \</span><br><span class="line"> sudo apt-key add -</span><br><span class="line">distribution=$(. /etc/os-release;<span class="built_in">echo</span> <span class="variable">$ID</span><span class="variable">$VERSION_ID</span>)</span><br><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/<span class="variable">$distribution</span>/nvidia-container-runtime.list | \</span><br><span class="line"> sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-container-runtime</span><br><span class="line">sudo nano /etc/docker/daemon.json</span><br></pre></td></tr></table></figure><p>Enter the following configuration:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span 
class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>If you modified the daemon file as described in the previous section, the configuration should look like this:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Finally, restart Docker.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h3 id="设置gpu常驻内存">Enable GPU Persistence Mode</h3><p><a href="https://linuxeye.com/463.html" target="_blank" rel="noopener">https://linuxeye.com/463.html</a></p><p>To enable GPU persistence mode, add <code>nvidia-smi -pm 1</code> to <code>/etc/rc.local</code>; it takes effect after a reboot. This can resolve problems such as slow GPU initialization, high utilization readings when no job is running, and GPUs occasionally dropping off.</p><p>Ubuntu 18.04 LTS handles rc.local differently from earlier releases, so some manual adjustment is needed.</p><ol 
type="1"><li>Append the following to <code>/lib/systemd/system/rc-local.service</code>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br><span class="line">Alias=rc-local.service</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Enable rc-local at boot:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl <span class="built_in">enable</span> rc-local</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Put the following in <code>/etc/rc.local</code>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/bin/sh -e</span></span><br><span class="line"></span><br><span class="line">nvidia-smi -pm 1</span><br><span class="line"></span><br><span class="line"><span class="built_in">exit</span> 0</span><br></pre></td></tr></table></figure><ol start="4" type="1"><li>Make it executable:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo chmod +x /etc/rc.local</span><br></pre></td></tr></table></figure><h3 id="测试驱动是否正常">Verify the Driver</h3><ol type="1"><li>Reboot the server.</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo reboot</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Check that the GPU driver and persistence mode are in effect:</li></ol><figure class="highlight bash"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvidia-smi</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>If the command returns GPU information, the driver is installed correctly; if <code>Persistence-M</code> shows <code>On</code>, persistence mode is configured correctly.</li></ol><p>Check that <code>nvidia-container-runtime</code> is installed correctly.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run nvidia/cuda:10.0-base nvidia-smi</span><br></pre></td></tr></table></figure><h3 id="处理容器内nvidia-smi无法显示占用gpu进程问题">Fixing nvidia-smi Not Showing GPU Processes Inside Containers</h3><p><a href="https://github.com/microsoft/pai/issues/2001#issuecomment-862346186" target="_blank" rel="noopener">nvidia-smi can not detect the PID of processes using GPU</a></p><h2 id="dev-box">dev-box</h2><h3 id="配置免密登录">Configure Passwordless Login</h3><ol type="1"><li>Generate a key pair</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Register the key with the remote host</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-copy-id username@remote_host</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Test passwordless login</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh username@remote_host</span><br></pre></td></tr></table></figure><h1 id="开始安装">Start the Installation</h1><p>All of the following steps are performed on dev-box.</p><p><a href="https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html" target="_blank" rel="noopener">https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html</a></p><h2 id="准备项目">Prepare the Project</h2><p>If you cannot clone 
GitHub repositories, set up a proxy for git, or clone them on another computer and upload the result. This should be the only step where you may currently run into network problems.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> -b release-2.11 https://github.com/kubernetes-sigs/kubespray.git <span class="variable">${HOME}</span>/pai-deploy/kubespray</span><br><span class="line">git <span class="built_in">clone</span> https://github.com/siaimes/pai.git</span><br><span class="line"><span class="built_in">cd</span> pai</span><br><span class="line">git checkout v1.8.1</span><br></pre></td></tr></table></figure><p>Change <a href="https://github.com/siaimes/pai/blob/master/contrib/kubespray/script/environment.sh#L18-L23" target="_blank" rel="noopener"><code>contrib/kubespray/script/environment.sh#L18-L23</code></a>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="string">"Clone kubespray source code from github to <span class="variable">${HOME}</span>/pai-deploy"</span></span><br><span class="line">sudo rm -rf <span class="variable">${HOME}</span>/pai-deploy/kubespray</span><br><span class="line">git <span class="built_in">clone</span> -b release-2.11 https://github.com/kubernetes-sigs/kubespray.git <span class="variable">${HOME}</span>/pai-deploy/kubespray</span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Copy inventory folder, and save it "</span></span><br><span class="line">cp -rfp <span class="variable">${HOME}</span>/pai-deploy/kubespray/inventory/sample <span 
class="variable">${HOME}</span>/pai-deploy/kubespray/inventory/pai</span><br></pre></td></tr></table></figure><p>to <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#echo "Clone kubespray source code from github to ${HOME}/pai-deploy"</span></span><br><span class="line"><span class="comment">#sudo rm -rf ${HOME}/pai-deploy/kubespray</span></span><br><span class="line"><span class="comment">#git clone -b release-2.11 https://github.com/kubernetes-sigs/kubespray.git ${HOME}/pai-deploy/kubespray</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Copy inventory folder, and save it "</span></span><br><span class="line">rm -rf <span class="variable">${HOME}</span>/pai-deploy/kubespray/inventory/pai</span><br><span class="line">cp -rfp <span class="variable">${HOME}</span>/pai-deploy/kubespray/inventory/sample <span class="variable">${HOME}</span>/pai-deploy/kubespray/inventory/pai</span><br></pre></td></tr></table></figure></p><h2 id="离线安装相关文件准备">Prepare Files for Offline Installation</h2><p><strong><em>References:</em></strong></p><p><a href="https://github.com/siaimes/k8s-share" target="_blank" rel="noopener">https://github.com/siaimes/k8s-share</a></p><p><a href="https://github.com/microsoft/pai/issues/5150" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5150</a></p><h3 id="启动服务容器">Start the File-Serving Container</h3><p>Run the following command on dev-box:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run -itd -p 0.0.0.0:10000:80 --restart always --name k8s_share 
siaimes/k8s-share:v1.8.1</span><br></pre></td></tr></table></figure><p>This command starts a container on dev-box that serves the resource files needed to install PAI. The examples below assume the dev-box IP is 10.10.10.10; replace it with your own address wherever it appears.</p><h3 id="修改安装脚本">Modify the Installation Script</h3><p>Change <a href="https://github.com/microsoft/pai/blob/master/src/device-plugin/deploy/start.sh.template#L32" target="_blank" rel="noopener">&lt;pai-code-dir&gt;/src/device-plugin/deploy/start.sh.template#L32</a> from <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">svn cat https://github.com/NVIDIA/k8s-device-plugin.git/tags/1.0.0-beta4/nvidia-device-plugin.yml \</span><br></pre></td></tr></table></figure></p><p>to <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl <span class="string">"http://10.10.10.10:10000/k8s-share/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml"</span> \</span><br></pre></td></tr></table></figure></p><p>That is, point the download URL of the <code>nvidia-device-plugin.yml</code> resource file at the service we just started.</p><h2 id="编写参数文件">Write the Configuration Files</h2><p>Write <code>layout.yaml</code> following the project documentation.</p><p>Write <code>config.yaml</code> in the following format.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">user: username</span><br><span class="line">password: password</span><br><span class="line">docker_registry_namespace: 
siaimes</span><br><span class="line">docker_image_tag: v1.8.1</span><br><span class="line"></span><br><span class="line">openpai_kubespray_extra_var:</span><br><span class="line"> kube_image_repo: <span class="string">"siaimes"</span></span><br><span class="line"> gcr_image_repo: <span class="string">"siaimes"</span></span><br><span class="line"> pod_infra_image_repo: <span class="string">"siaimes/pause-{{ image_arch }}"</span></span><br><span class="line"> dnsautoscaler_image_repo: <span class="string">"siaimes/cluster-proportional-autoscaler-{{ image_arch }}"</span></span><br><span class="line"> kubeadm_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/kubernetes-release/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"</span></span><br><span class="line"> hyperkube_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/hyperkube"</span></span><br><span class="line"> cni_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"</span></span><br><span class="line"> calicoctl_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"</span></span><br><span class="line"></span><br><span class="line">kubelet_custom_flags:</span><br><span class="line"> serialize-image-pulls: <span class="string">"false"</span></span><br></pre></td></tr></table></figure><h2 id="安装">安装</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> contrib/kubespray</span><br><span class="line">/bin/bash 
quick-start-kubespray.sh</span><br><span class="line">/bin/bash quick-start-service.sh</span><br></pre></td></tr></table></figure><h2 id="重启服务注意事项">Notes on restarting services</h2><p><a href="https://github.com/microsoft/pai/issues/5465" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5465</a></p><p>The <a href="https://github.com/microsoft/pai/blob/master/docs/manual/cluster-admin/basic-management-operations.md" target="_blank" rel="noopener">project documentation</a> says to manage the cluster with /pai/paictl.py, but by default that copy is checked out at the latest version rather than the version you installed, which causes a version mismatch.</p><p>If you want to use that script, you have to check out the matching version by hand every time the container starts; otherwise the services may fail to start properly.</p><p>The current workaround is to mount a local, already checked-out copy of the project into the container and use the paictl.py from that copy directly.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run -itd \</span><br><span class="line"> -e COLUMNS=<span class="variable">$COLUMNS</span> -e LINES=<span class="variable">$LINES</span> -e TERM=<span class="variable">$TERM</span> \</span><br><span class="line"> -v /var/run/docker.sock:/var/run/docker.sock \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg:/cluster-configuration \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/kube:/root/.kube \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai:/mnt/pai \</span><br><span class="line"> --pid=host \</span><br><span class="line"> --privileged=<span class="literal">true</span> \</span><br><span class="line"> --net=host \</span><br><span class="line"> --name=dev-box \</span><br><span class="line"> --restart=always \</span><br><span class="line"> 
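siaimes/dev-box:v1.8.1</span><br></pre></td></tr></table></figure><p>Before running this command, make sure the version checked out in <code>${HOME}/pai</code> matches the installed version; then, inside the container, use the <code>paictl.py</code> under <code>/mnt/pai</code>.</p><h1 id="后续">Next steps</h1><p>Once the services have started successfully, further customization usually goes smoothly unless a configuration file contains a mistake, so check your configuration files carefully.</p>]]></content>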
<categories>
<category> 环境 </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> PyCharm </tag>
<tag> microsoft </tag>
<tag> openpai </tag>
<tag> vs code </tag>
<tag> Deep Learning </tag>
<tag> cluster </tag>
<tag> Ubuntu 18.04 </tag>
</tags>
</entry>
<entry>
<title>OpenPAI v1.8.0 Deployment Summary (New Edition)</title>
<link href="2022/05/30/p62.html"/>
<url>2022/05/30/p62.html</url>
<content type="html"><![CDATA[<p><strong><em>This document is outdated; please use the <a href="/2022/12/11/p63.html">updated document</a></em></strong></p><h1 id="术语">Terminology</h1><table><thead><tr class="header"><th style="text-align: center;">Term</th><th style="text-align: center;">Description</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">dev-box</td><td style="text-align: center;">Node used to install, manage, and maintain the cluster. It can be powered off when not in use, but do not reformat it, or you will no longer be able to maintain the cluster</td></tr><tr class="even"><td style="text-align: center;">master</td><td style="text-align: center;">Cluster master node</td></tr><tr class="odd"><td style="text-align: center;">worker</td><td style="text-align: center;">Cluster compute node</td></tr><tr class="even"><td style="text-align: center;">All machines</td><td style="text-align: center;">All nodes, including dev-box, master, and worker</td></tr></tbody></table><h1 id="硬件要求">Hardware requirements</h1><table><thead><tr class="header"><th style="text-align: center;">Node</th><th style="text-align: center;">CPU</th><th style="text-align: center;">Memory</th><th style="text-align: center;">System disk</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">dev-box</td><td style="text-align: center;">4 threads</td><td style="text-align: center;">8GB</td><td style="text-align: center;">40GB</td></tr><tr class="even"><td style="text-align: center;">master</td><td style="text-align: center;">8 threads</td><td style="text-align: center;">64GB</td><td style="text-align: center;">256GB</td></tr><tr class="odd"><td style="text-align: center;">worker</td><td style="text-align: center;">6 threads per GPU</td><td style="text-align: center;">64GB per GPU</td><td style="text-align: center;">512GB</td></tr></tbody></table><p><strong><em>Note 1:</em></strong> The master and worker nodes should be physical machines (virtual machines also work, but they are not recommended in production, and GPU passthrough to a virtual machine is troublesome). The dev-box can be a virtual machine (allow more than 100 GB of disk space if you plan to debug code or do custom development); it is only used when installing and maintaining the system, so a physical machine would be wasted on it.</p><h1 id="前提条件">Prerequisites</h1><ol type="1"><li>All machines run a fresh installation of Ubuntu 18.04 LTS Server;</li><li>All machines have an administrator account with the same username and password;</li><li>All machines use the Shanghai time zone;</li><li>All machine IPs are in the same subnet, and a machine with multiple IPs is not allowed either;</li><li>This document supports only machines with the x86_64 architecture.</li></ol><p><strong><em>Note 1:</em></strong> 
The OpenPAI project documentation recommends Ubuntu 16.04 LTS, but I ran into a serious <a href="https://github.com/microsoft/pai/issues/5316" target="_blank" rel="noopener">problem</a> on Ubuntu 16.04 LTS, so I moved to Ubuntu 18.04.</p><p><strong><em>Note 2:</em></strong> k8s does not support swap, so do not create a swap partition when installing the operating system; otherwise the node will stop working after a reboot.</p><h1 id="初始化">Initialization</h1><h2 id="所有机器">All machines</h2><h3 id="关闭自动更新">Disable automatic updates</h3><p>Ubuntu 18.04 LTS Server enables automatic updates by default. Turn them off to avoid unnecessary system failures.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/apt/apt.conf.d/20auto-upgrades</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">APT::Periodic::Update-Package-Lists <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Download-Upgradeable-Packages <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::AutocleanInterval <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Unattended-Upgrade <span class="string">"0"</span>;</span><br></pre></td></tr></table></figure><p>Reference: <a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">Prevent Ubuntu 18.06 And Nvidia Drivers From Updating</a></p><h3 id="切换清华源">Switch to the Tsinghua mirror</h3><p><a href="https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/" target="_blank" rel="noopener">https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/</a></p><h3 id="更新依赖">Update packages</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h3 
id="启动bbr可选">Enable BBR (optional)</h3><p>BBR is a TCP congestion control algorithm that performs better than the default.</p><p>Check the current setting:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><p>Set:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo bash -c <span class="string">'echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo bash -c <span class="string">'echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo sysctl -p</span><br></pre></td></tr></table></figure><p>Verify:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><h3 id="安装openssh-server">Install openssh-server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install openssh-server</span><br></pre></td></tr></table></figure><h3 id="安装ntp">Install ntp</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install ntp</span><br></pre></td></tr></table></figure><h3 id="安装docker">Install Docker</h3><p><a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">https://docs.docker.com/engine/install/ubuntu/</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt remove docker docker-engine docker.io containerd 
runc</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt-get install \</span><br><span class="line"> apt-transport-https \</span><br><span class="line"> ca-certificates \</span><br><span class="line"> curl \</span><br><span class="line"> gnupg \</span><br><span class="line"> lsb-release</span><br><span class="line">curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> \</span><br><span class="line"> <span class="string">"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \</span></span><br><span class="line"><span class="string"> <span class="variable">$(lsb_release -cs)</span> stable"</span> | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt install docker-ce docker-ce-cli containerd.io</span><br></pre></td></tr></table></figure><p>删除添加的docker源,因为安装Openpai的时候,Openpai的脚本会再添加一次docker源,会导致源冲突。</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo rm /etc/apt/sources.list.d/docker.list</span><br></pre></td></tr></table></figure><h3 id="安装python">Install Python</h3><p>Ubuntu should ship with Python by default, but in practice I found that some machines did not have it, so install it explicitly to avoid errors.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install python</span><br></pre></td></tr></table></figure><h3 id="安装nfs客户端">Install the NFS client</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install nfs-common</span><br></pre></td></tr></table></figure><h3 id="安装压缩软件">Install compression tools</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install zip</span><br></pre></td></tr></table></figure><h3 id="修改daemon">Modify the Docker daemon configuration</h3><p>On all machines, write <code>{}</code> into <code>/etc/docker/daemon.json</code>; this avoids unnecessary errors.</p><p>If your LAN addresses are in <code>172.*.*.*</code>, see <a href="/2021/04/29/p55.html">SSH connections dropped by Docker-Compose</a> and modify <code>/etc/docker/daemon.json</code> accordingly; otherwise you may run into strange network problems. That is, use the following daemon file:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>After modifying the daemon file, restart Docker:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart 
docker</span><br></pre></td></tr></table></figure><h2 id="worker">worker</h2><p><strong><em>Note 1:</em></strong> The following describes how to configure worker nodes with NVIDIA GPUs. If you use AMD GPUs, refer to the project documentation; I do not have an AMD GPU, so I could not test that setup.</p><p><strong><em>Note 2:</em></strong> CPU-only compute nodes can skip this section.</p><h3 id="安装gpu驱动">Install the GPU driver</h3><p><a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa" target="_blank" rel="noopener">https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed</a></p><p><a href="https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove" target="_blank" rel="noopener">https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove</a></p><p><a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/</a> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-driver-470</span><br><span class="line">sudo apt autoremove xserver-xorg</span><br><span class="line">sudo apt autoremove --purge xserver-xorg</span><br><span class="line">sudo apt-mark hold nvidia-driver-470 <span class="comment"># Freeze NVIDIA Drivers</span></span><br></pre></td></tr></table></figure></p><h3 
id="安装nvidia-container-runtime">Install nvidia-container-runtime</h3><p><a href="https://github.com/NVIDIA/nvidia-container-runtime#installation" target="_blank" rel="noopener">https://github.com/NVIDIA/nvidia-container-runtime#installation</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime</a></p><p>First, <code>ping</code> <code>nvidia.github.io</code> with this command and check whether the returned IP address looks normal. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ping nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>In my network environment this domain was DNS-poisoned (DNS returned 127.0.0.1). My workaround was to add the following line to the <code>/etc/hosts</code> file: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">185.199.110.153 nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>Then continue with the installation.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \</span><br><span class="line"> sudo apt-key add -</span><br><span class="line">distribution=$(. 
/etc/os-release;<span class="built_in">echo</span> <span class="variable">$ID</span><span class="variable">$VERSION_ID</span>)</span><br><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/<span class="variable">$distribution</span>/nvidia-container-runtime.list | \</span><br><span class="line"> sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-container-runtime</span><br><span class="line">sudo nano /etc/docker/daemon.json</span><br></pre></td></tr></table></figure><p>填入下面的配置文件:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>如果有按照上一部分修改daemon,配置文件应该如下:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Finally, restart Docker.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h3 id="设置gpu常驻内存">Enable GPU persistence mode</h3><p><a href="https://linuxeye.com/463.html" target="_blank" rel="noopener">https://linuxeye.com/463.html</a></p><p>To enable GPU persistence mode, add <code>nvidia-smi -pm 1</code> to <code>/etc/rc.local</code>; it takes effect after a reboot. This can resolve problems such as slow GPU initialization, persistently high utilization with no jobs running, and GPUs occasionally dropping out.</p><p>Ubuntu 18.04 LTS handles rc.local differently, so some manual setup is required.</p><ol type="1"><li>Add the following to the <code>/lib/systemd/system/rc-local.service</code> file:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br><span class="line">Alias=rc-local.service</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Enable rc-local at boot:</li></ol><figure 
class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl <span class="built_in">enable</span> rc-local</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Put the following content in <code>/etc/rc.local</code>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/bin/sh -e</span></span><br><span class="line"></span><br><span class="line">nvidia-smi -pm 1</span><br><span class="line"></span><br><span class="line"><span class="built_in">exit</span> 0</span><br></pre></td></tr></table></figure><ol start="4" type="1"><li>Make it executable:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo chmod +x /etc/rc.local</span><br></pre></td></tr></table></figure><h3 id="测试驱动是否正常">Verify the driver</h3><ol type="1"><li>Reboot the server.</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo reboot</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Check that the GPU driver and persistence mode are working.</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvidia-smi</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>If the command returns GPU information, the driver is installed correctly; if <code>Persistence-M</code> shows <code>On</code>, persistence mode is configured correctly.</li></ol><p>Check that <code>nvidia-container-runtime</code> is installed correctly.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run nvidia/cuda:10.0-base 
nvidia-smi</span><br></pre></td></tr></table></figure><h3 id="处理容器内nvidia-smi无法显示占用gpu进程问题">处理容器内nvidia-smi无法显示占用GPU进程问题</h3><p><a href="https://github.com/microsoft/pai/issues/2001#issuecomment-862346186" target="_blank" rel="noopener">nvidia-smi can not detect the PID of processes using GPU</a></p><h2 id="dev-box">dev-box</h2><h3 id="配置免密登录">配置免密登录</h3><ol type="1"><li>生成密钥</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>向远程主机注册密钥</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-copy-id username@remote_host</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>测试是否可免密登录</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh username@remote_host</span><br></pre></td></tr></table></figure><h1 id="开始安装">开始安装</h1><p>下面都在dev-box中操作</p><p><a href="https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html" target="_blank" rel="noopener">https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html</a></p><h2 id="准备项目">准备项目</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/microsoft/pai.git</span><br><span class="line"><span class="built_in">cd</span> pai</span><br><span class="line">git checkout v1.8.0</span><br></pre></td></tr></table></figure><h2 id="修改部分文件">修改部分文件</h2><p><strong><em>参考:</em></strong></p><p><a href="https://github.com/microsoft/pai/issues/5445" 
target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5445</a></p><p><a href="https://github.com/microsoft/pai/issues/5568" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5568</a></p><p><a href="https://github.com/microsoft/pai/pull/5639" target="_blank" rel="noopener">https://github.com/microsoft/pai/pull/5639</a></p><p><a href="https://github.com/microsoft/pai/pull/5777" target="_blank" rel="noopener">https://github.com/microsoft/pai/pull/5777</a></p><p><a href="https://github.com/microsoft/pai/issues/5503" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5503</a></p><p><strong><em>注</em></strong>:下面所有命令中<code><pai-code-dir></code>需要替换为您的pai项目所在路径。</p><ol type="1"><li>注释掉<a href="https://github.com/microsoft/pai/blob/master/contrib/kubespray/quick-start-kubespray.sh#L60" target="_blank" rel="noopener"><code><pai-code-dir>/contrib/kubespray/quick-start-kubespray.sh#L60</code></a>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ansible-playbook -i <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || <span class="built_in">exit</span> $?</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>将<a href="https://github.com/microsoft/pai/blob/master/contrib/kubespray/script/environment.sh#L50" target="_blank" rel="noopener"><code><pai-code-dir>/contrib/kubespray/script/environment.sh#L50</code></a>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo python3 -m pip install ansible==2.9.7</span><br></pre></td></tr></table></figure><p>改为 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo python3 -m pip install 
ansible==2.9.24</span><br></pre></td></tr></table></figure></p><ol start="3" type="1"><li><strong><em>可选,用户配额功能</em></strong></li></ol><p>将<a href="https://github.com/microsoft/pai/blob/master/src/hivedscheduler/deploy/hivedscheduler.yaml.template#L39" target="_blank" rel="noopener"><code><pai-code-dir>/src/hivedscheduler/deploy/hivedscheduler.yaml.template#L39</code></a></p><p>改为 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image: siaimes/hivedscheduler:v0.3.4-hp20221011</span><br></pre></td></tr></table></figure></p><p>将<a href="https://github.com/microsoft/pai/blob/master/src/webportal/deploy/webportal.yaml.template#L36" target="_blank" rel="noopener"><code><pai-code-dir>/src/webportal/deploy/webportal.yaml.template#L36</code></a></p><p>改为 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image: siaimes/webportal:v1.8.0-hp20221011</span><br></pre></td></tr></table></figure></p><p>将<a href="https://github.com/microsoft/pai/blob/master/src/rest-server/deploy/rest-server.yaml.template#L36" target="_blank" rel="noopener"><code><pai-code-dir>/src/rest-server/deploy/rest-server.yaml.template#L36</code></a></p><p>改为 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image: siaimes/rest-server:v1.8.0-hp20221011</span><br></pre></td></tr></table></figure></p><h2 id="离线安装相关文件准备">离线安装相关文件准备</h2><p><strong><em>参考:</em></strong></p><p><a href="https://github.com/siaimes/k8s-share" target="_blank" rel="noopener">https://github.com/siaimes/k8s-share</a></p><p><a href="https://github.com/microsoft/pai/issues/5150" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5150</a></p><h3 id="启动服务容器">启动服务容器</h3><p>在dev-box运行如下命令:</p><figure class="highlight 
bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run -itd -p 0.0.0.0:10000:80 --restart always --name k8s_share siaimes/k8s-share:v1.8.0</span><br></pre></td></tr></table></figure><p>该命令会在dev-box启动一个容器以提供安装PAI需要的资源文件,现在假设你dev-box的IP为10.10.10.10,后面有提到这个地址的地方自己对照修改。</p><h3 id="修改安装脚本">修改安装脚本</h3><p>将 <a href="https://github.com/microsoft/pai/blob/master/src/device-plugin/deploy/start.sh.template#L32" target="_blank" rel="noopener"><pai-code-dir>/src/device-plugin/deploy/start.sh.template#L32</pai-code-dir></a> 由 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">svn cat https://github.com/NVIDIA/k8s-device-plugin.git/tags/1.0.0-beta4/nvidia-device-plugin.yml \</span><br></pre></td></tr></table></figure></p><p>改为 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl <span class="string">"http://10.10.10.10:10000/k8s-share/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml"</span> \</span><br></pre></td></tr></table></figure></p><p>即将<code>nvidia-device-plugin.yml</code>资源文件下载地址改为我们自己启动的服务地址。</p><h2 id="编写参数文件">编写参数文件</h2><p>参考项目文档编写<code>layout.yaml</code>文件。</p><p>参考下面的格式编写<code>config.yaml</code>文件。</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td 
class="code"><pre><span class="line">user: username</span><br><span class="line">password: password</span><br><span class="line">docker_image_tag: v1.8.0</span><br><span class="line"></span><br><span class="line">openpai_kubespray_extra_var:</span><br><span class="line"> kube_image_repo: <span class="string">"siaimes"</span></span><br><span class="line"> gcr_image_repo: <span class="string">"siaimes"</span></span><br><span class="line"> pod_infra_image_repo: <span class="string">"siaimes/pause-{{ image_arch }}"</span></span><br><span class="line"> dnsautoscaler_image_repo: <span class="string">"siaimes/cluster-proportional-autoscaler-{{ image_arch }}"</span></span><br><span class="line"> kubeadm_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/kubernetes-release/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"</span></span><br><span class="line"> hyperkube_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/kubernetes-release/release/{{ kube_version }}/bin/linux/{{ image_arch }}/hyperkube"</span></span><br><span class="line"> cni_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"</span></span><br><span class="line"> calicoctl_download_url: <span class="string">"http://10.10.10.10:10000/k8s-share/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"</span></span><br><span class="line"></span><br><span class="line">kubelet_custom_flags:</span><br><span class="line"> serialize-image-pulls: <span class="string">"false"</span></span><br></pre></td></tr></table></figure><h2 id="安装">Install</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> 
contrib/kubespray</span><br><span class="line">/bin/bash quick-start-kubespray.sh</span><br><span class="line">/bin/bash quick-start-service.sh</span><br></pre></td></tr></table></figure><h2 id="重启服务注意事项">Notes on restarting services</h2><p><a href="https://github.com/microsoft/pai/issues/5465" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5465</a></p><p>The <a href="https://github.com/microsoft/pai/blob/master/docs/manual/cluster-admin/basic-management-operations.md" target="_blank" rel="noopener">project documentation</a> says to manage the cluster with /pai/paictl.py, but by default that file is checked out at the latest version rather than the version you installed, which causes a version mismatch.</p><p>If you use that script, you have to check out the matching version by hand every time the container starts; otherwise the services may fail to start properly.</p><p>The current workaround is to mount a locally checked-out copy of the project into the container and use the paictl.py from that copy directly.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run -itd \</span><br><span class="line"> -e COLUMNS=<span class="variable">$COLUMNS</span> -e LINES=<span class="variable">$LINES</span> -e TERM=<span class="variable">$TERM</span> \</span><br><span class="line"> -v /var/run/docker.sock:/var/run/docker.sock \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg:/cluster-configuration \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/kube:/root/.kube \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai:/mnt/pai \</span><br><span class="line"> --pid=host \</span><br><span class="line"> --privileged=<span class="literal">true</span> \</span><br><span class="line"> --net=host \</span><br><span class="line"> --name=dev-box \</span><br><span 
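class="line"> --restart=always \</span><br><span class="line"> openpai/dev-box:v1.8.0</span><br></pre></td></tr></table></figure><p>Before running this command, make sure the version checked out in <code>${HOME}/pai</code> matches the installed version, then use the <code>paictl.py</code> under <code>/mnt/pai</code> inside the container.</p><h1 id="后续">Next steps</h1><p>Once the services have started successfully, the remaining customization usually goes smoothly unless a configuration file is wrong, so check your configuration files carefully.</p>]]></content>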
<categories>
<category> Environment </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> PyCharm </tag>
<tag> microsoft </tag>
<tag> openpai </tag>
<tag> vs code </tag>
<tag> Deep Learning </tag>
<tag> cluster </tag>
<tag> Ubuntu 18.04 </tag>
</tags>
</entry>
<entry>
<title>Pitfalls in PyTorch DistributedDataParallel Training</title>
<link href="2022/03/08/p62.html"/>
<url>2022/03/08/p62.html</url>
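<content type="html"><![CDATA[<p>The problem: when you need to train several models at once with DDP, the normal approach is to wrap each model in the DistributedDataParallel class. That seemed tedious to me at the time, so I came up with a shortcut: I used <a href="https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.add_module" target="_blank" rel="noopener">torch.nn.Module.add_module</a> to register all the models as parts of one big model, then passed that big model to DistributedDataParallel in a single call.</p><p>Perfect, right? I had a feeling it might cause trouble, though, so I did not skip testing. How to test? Print some of the network's parameters: the same parameter should have exactly the same value in every process. Reality slapped me hard. Apart from the first print (right after initialization), the values printed by different processes were completely different. (You can also judge this by checking whether the output logs of the processes stay in step. If every iteration is in step, the data should be synchronized. If some processes run fast and others slow, the data is definitely not synchronized, because DistributedDataParallel synchronizes gradients across processes on every loss.backward().)</p><p>Because the project is fairly large, I never suspected this was the cause, and two days of debugging got me nowhere. So how did I finally find the problem?</p><p>I downloaded the <a href="https://github.com/pytorch/examples/tree/main/imagenet" target="_blank" rel="noopener">pytorch examples</a>, ran them, and confirmed that in that code the data is synchronized across processes. Then I moved my code structure and parameters toward the example. Even after matching every adjustable parameter, my model parameters were still not synchronized across processes. So I started replacing the complex modules in my code, one by one, with the simple modules from the example: data_loader, optimizer, models, the iteration loop, and so on. In the end, the data became synchronized as soon as I replaced the model, and only then could I pin the problem on the shortcut described above.</p><p>At that point I remembered a warning I had skimmed past earlier while browsing the <a href="https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html" target="_blank" rel="noopener">DDP API docs</a>:</p><figure><img src="/2022/03/08/p62/ddp_warning.jpg" alt><figcaption>DDP_warning</figcaption></figure><p>Say no more, it is all tears!</p><p>DDP has far too many pitfalls. For example, the NCCL backend did not work on the K80 for me, so I had to switch to gloo. For another, processes that should only use other GPUs also take up a few hundred MB of memory on GPU 0, which limits the maximum batch size.</p><p>It is just hard!</p>]]></content>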
<categories>
<category> Deep Learning </category>
</categories>
<tags>
<tag> PyTorch </tag>
<tag> Deep Learning </tag>
<tag> DistributedDataParallel </tag>
<tag> DDP </tag>
</tags>
</entry>
<entry>
<title>Creating Vector Graphics That Work in Both Word and LaTeX</title>
<link href="2021/12/10/p61.html"/>
<url>2021/12/10/p61.html</url>
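<content type="html"><![CDATA[<p>The vector format LaTeX supports best is PDF, while the one Word supports best is SVG.</p><p>Draw the figure in PowerPoint, make it fill an entire single-slide PowerPoint page, and export it as PDF; the result can be inserted into LaTeX directly. If the figure does not fill the whole PowerPoint page, crop it with Adobe Acrobat.</p><p>PowerPoint's built-in Save As tool can export a slide to SVG, but support is currently limited and some elements come out distorted. The more reliable approach for now is to open the exported PDF in Illustrator and export it again as SVG, which can then be inserted into Word.</p><p>Known issues: 1. If an SVG with gradient fills is inserted into Word and the document is then exported to PDF, the figure is distorted, so do not use gradients when drawing in PowerPoint; 2. Do not set a transparent background when exporting SVG from Illustrator, or some elements will be lost; I do not yet know why.</p>]]></content>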
<categories>
<category> Typesetting </category>
</categories>
<tags>
<tag> Word </tag>
<tag> LaTeX </tag>
<tag> Vector Graphics </tag>
</tags>
</entry>
<entry>
<title>Installing the Times New Roman Font for Matplotlib on Linux</title>
<link href="2021/12/09/p60.html"/>
<url>2021/12/09/p60.html</url>
<content type="html"><![CDATA[<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">cd</span><br><span class="line">curl https://dl.freefontsfamily.com/download/Times-New-Roman-Font/ -o Times-New-roman.zip</span><br><span class="line"></span><br><span class="line">unzip Times-New-roman.zip</span><br><span class="line"></span><br><span class="line">python -c "import matplotlib; print(f'path to mpl-data: {matplotlib.matplotlib_fname()[:-12]}')"</span><br><span class="line"></span><br><span class="line">cp 'Times New Roman'/* /path/to/mpl-data/fonts/ttf/</span><br><span class="line"></span><br><span class="line">rm -rf ~/.cache/matplotlib</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Environment </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> Python </tag>
<tag> matplotlib </tag>
<tag> Times New Roman </tag>
</tags>
</entry>
<entry>
<title>Ubuntu apt update Fails</title>
<link href="2021/10/29/p59.html"/>
<url>2021/10/29/p59.html</url>
<content type="html"><![CDATA[<p>On Ubuntu 18.04, <code>apt update</code> fails with the following error: <code>Could not handshake: Error in the certificate verification.</code></p><p>Root cause: an expired certificate.</p><p>The fix comes from <a href="https://github.com/tuna/issues/issues/1342" target="_blank" rel="noopener">https://github.com/tuna/issues/issues/1342</a>. In short: switch to an http source, update the <code>ca-certificates</code> package, then switch back to your own source.</p><ol type="1"><li>Update the source list: <code>/etc/apt/sources.list</code></li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">deb http://in.archive.ubuntu.com/ubuntu/ bionic main restricted universe multiverse</span><br><span class="line">deb http://in.archive.ubuntu.com/ubuntu/ bionic-updates main restricted universe multiverse</span><br><span class="line">deb http://in.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse</span><br><span class="line">deb http://security.ubuntu.com/ubuntu bionic-security main restricted universe multiverse</span><br><span class="line"></span><br><span class="line">deb-src http://in.archive.ubuntu.com/ubuntu/ bionic main restricted universe multiverse</span><br><span class="line">deb-src http://security.ubuntu.com/ubuntu bionic-security main restricted universe multiverse</span><br><span class="line">deb-src http://in.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse</span><br><span class="line">deb-src http://in.archive.ubuntu.com/ubuntu/ bionic-updates main restricted universe multiverse</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Update the <code>ca-certificates</code> package</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update && sudo apt install -y ca-certificates</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Switch back to your own source, for example the Tsinghua mirror: <code>/etc/apt/sources.list</code></li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Source mirrors are commented out by default to speed up apt update; uncomment them if needed</span></span><br><span class="line">deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse</span><br><span class="line"><span class="comment"># deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse</span></span><br><span class="line">deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse</span><br><span class="line"><span class="comment"># deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse</span></span><br><span class="line">deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse</span><br><span class="line"><span class="comment"># deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse</span></span><br><span class="line">deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse</span><br><span class="line"><span class="comment"># deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe 
multiverse</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Pre-release software source; enabling it is not recommended</span></span><br><span class="line"><span class="comment"># deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse</span></span><br><span class="line"><span class="comment"># deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse</span></span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Ubuntu </category>
</categories>
<tags>
<tag> ubuntu </tag>
<tag> apt </tag>
<tag> 18.04 </tag>
</tags>
</entry>
<entry>
<title>A Lesson from NFS Sharing</title>
<link href="2021/10/11/p58.html"/>
<url>2021/10/11/p58.html</url>
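<content type="html"><![CDATA[<p>An Ubuntu 18.04 system has two 6 TB hard drives, mounted at /home and /usr respectively. I wanted to share both drives over NFS. My understanding was that an NFS share can only export one directory, and that a second shared directory has to be a subdirectory of the first.</p><p>So I could share either <code>/home/data</code> or <code>/usr/data</code>, but not both.</p><p>That is when the wild ideas started:</p><ol type="1"><li><p>Symlink <code>/usr/nfs-share</code> to <code>/home/data/nfs-share</code>, then share <code>/home/data/nfs-share</code>. In testing, the client could not mount it and reported that the file does not exist, so this option does not work.</p></li><li><p>Mount the drive that holds <code>/usr</code> at <code>/home/data/nfs-share</code>. In testing this worked, and the client could mount <code>/nfs-share</code> normally. But there is a big problem: the client can then access every file under the server's <code>/usr</code>, which is scary to think about, so this option is not acceptable either.</p></li><li><p>Since option 2 works, there is a way forward: mount the drive that holds <code>/usr</code> at <code>/home/data/nfs-share</code>, then unmount <code>/usr</code>, so that <code>/usr</code> belongs to the root filesystem instead of being a separate drive, and copy the files in <code>/home/data/nfs-share</code> back with <code>cp -rp</code>. No sooner said than done: <code>sudo umount /usr</code>, <code>sudo cp -rp /home/data/nfs-share/* /usr</code>, and disaster: <code>command not found</code>. Only then did I realize what I had done. All the commands live in <code>/usr/bin</code> and <code>/usr/sbin</code>; take <code>/usr</code> away and of course the commands are gone. Fortunately I had only unmounted it from the command line and had not edited <code>/etc/fstab</code>, so a reboot should fix it, right? Too naive: with no commands working, I could not reboot either. A hard power cycle was the only option.</p></li></ol><p><strong><em>Postscript:</em></strong> The final solution was to carry out the steps of option 3 from a bootable disk instead of inside the running system.</p><p><strong><em>Another possibility:</em></strong> Map the different drives to a directory in a Docker container and to a subdirectory of that directory, then run the NFS share inside the container. Simply perfect.</p>]]></content>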
<categories>
<category> linux </category>
</categories>
<tags>
<tag> nfs </tag>
<tag> ubuntu </tag>
<tag> linux </tag>
</tags>
</entry>
<entry>
<title>OpenPAI v1.8.0 Deployment Summary</title>
<link href="2021/06/07/p57.html"/>
<url>2021/06/07/p57.html</url>
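<content type="html"><![CDATA[<p><strong><em>Outdated: please use the <a href="/2022/12/11/p63.html">updated document</a></em></strong></p><h1 id="术语">Terminology</h1><table><thead><tr class="header"><th style="text-align: center;">Term</th><th style="text-align: center;">Explanation</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">dev-box</td><td style="text-align: center;">The node used to install, manage, and maintain the cluster. It can be powered off when not in use, but do not reformat it, or you will never be able to maintain the cluster again</td></tr><tr class="even"><td style="text-align: center;">master</td><td style="text-align: center;">The cluster's master node</td></tr><tr class="odd"><td style="text-align: center;">worker</td><td style="text-align: center;">A compute node of the cluster</td></tr><tr class="even"><td style="text-align: center;">All machines</td><td style="text-align: center;">Every node, including the dev-box, master, and workers</td></tr></tbody></table><h1 id="前提条件">Prerequisites</h1><ol type="1"><li>Every machine runs a fresh install of Ubuntu 18.04 LTS server;</li><li>Every machine has an administrator account with the same username and password;</li><li>Every machine uses the Shanghai time zone;</li><li>All machine IPs are in the same subnet; a machine with multiple IPs is not allowed either.</li></ol><p><strong><em>Note 1:</em></strong> The OpenPAI project documentation recommends Ubuntu 16.04 LTS, but I ran into a serious <a href="https://github.com/microsoft/pai/issues/5316" target="_blank" rel="noopener">problem</a> on Ubuntu 16.04 LTS, so I moved to Ubuntu 18.04.</p><p><strong><em>Note 2:</em></strong> The master and workers must be physical machines (virtual machines do work, but they are not recommended in production, and GPU passthrough to a VM is a hassle of its own). The dev-box can be a virtual machine with at least 40 GB of disk space (debugging code or doing secondary development may need more than 100 GB). After all, it is only used when installing and maintaining the system, so a physical machine would be a waste.</p><p><strong><em>Note 3:</em></strong> k8s does not support swap. Do not create a swap partition when installing the OS, or the node will break as soon as it reboots.</p><h1 id="初始化">Initialization</h1><h2 id="所有机器">All machines</h2><h3 id="关闭自动更新">Disable automatic updates</h3><p>Ubuntu 18.04 LTS server enables automatic updates by default. Turn them off to avoid unnecessary system failures.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/apt/apt.conf.d/20auto-upgrades</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span 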
class="line">APT::Periodic::Update-Package-Lists <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Download-Upgradeable-Packages <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::AutocleanInterval <span class="string">"0"</span>;</span><br><span class="line">APT::Periodic::Unattended-Upgrade <span class="string">"0"</span>;</span><br></pre></td></tr></table></figure><p>Reference: <a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">Prevent Ubuntu 18.06 And Nvidia Drivers From Updating</a></p><h3 id="切换清华源">Switch to the Tsinghua mirror</h3><p><a href="https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/" target="_blank" rel="noopener">https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/</a></p><h3 id="更新依赖">Update dependencies</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h3 id="启动bbr可选">Enable BBR (optional)</h3><p>BBR is a congestion control algorithm that performs better than the default.</p><p>Check the current setting:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><p>Set it:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo bash -c <span class="string">'echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo bash -c <span class="string">'echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf'</span></span><br><span class="line">sudo sysctl -p</span><br></pre></td></tr></table></figure><p>Verify:</p><figure class="highlight bash"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo sysctl net.ipv4.tcp_congestion_control</span><br></pre></td></tr></table></figure><h3 id="安装openssh-server">Install openssh-server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install openssh-server</span><br></pre></td></tr></table></figure><h3 id="安装ntp">Install ntp</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install ntp</span><br></pre></td></tr></table></figure><h3 id="安装docker">Install docker</h3><p><a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">https://docs.docker.com/engine/install/ubuntu/</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt remove docker docker-engine docker.io containerd runc</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt-get install \</span><br><span class="line"> apt-transport-https \</span><br><span class="line"> ca-certificates \</span><br><span class="line"> curl \</span><br><span class="line"> gnupg \</span><br><span class="line"> lsb-release</span><br><span class="line">curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg</span><br></pre></td></tr></table></figure><figure 
class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> \</span><br><span class="line"> <span class="string">"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \</span></span><br><span class="line"><span class="string"> <span class="variable">$(lsb_release -cs)</span> stable"</span> | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt install docker-ce docker-ce-cli containerd.io</span><br></pre></td></tr></table></figure><p>Remove the docker source you just added: the OpenPAI installation script adds the docker source again, and the two entries would conflict.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo rm /etc/apt/sources.list.d/docker.list</span><br></pre></td></tr></table></figure><h3 id="安装python">Install python</h3><p>In theory Ubuntu ships with python by default, but in practice I found that some machines simply did not have it, so check by hand to avoid errors.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install python</span><br></pre></td></tr></table></figure><h3 id="安装nfs客户端">Install the NFS client</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install nfs-common</span><br></pre></td></tr></table></figure><h3 id="安装压缩软件">Install compression tools</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">sudo apt install zip</span><br></pre></td></tr></table></figure><h3 id="修改daemon">Modify the daemon file</h3><p>Writing <code>{}</code> into <code>/etc/docker/daemon.json</code> on every machine avoids unnecessary errors.</p><p>If your LAN addresses are <code>172.*.*.*</code>, follow the post <a href="/2021/04/29/p55.html">SSH connections dropped because of Docker-Compose</a> and modify <code>/etc/docker/daemon.json</code>; otherwise you may see strange network problems. In that case, use the daemon file below:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>After modifying the daemon file, restart docker:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h2 id="worker">worker</h2><p><strong><em>Note 1:</em></strong> The steps below configure worker nodes with NVIDIA GPUs. If you use AMD GPUs, please consult the project documentation yourself; I have no AMD GPU, so I could not test that path.</p><p><strong><em>Note 2:</em></strong> You can skip this part for CPU-only compute nodes.</p><h3 id="安装gpu驱动">Install the GPU driver</h3><p><a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa" target="_blank" rel="noopener">https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed</a></p><p><a href="https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove" target="_blank" 
rel="noopener">https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove</a></p><p><a href="https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/" target="_blank" rel="noopener">https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/</a> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-driver-470</span><br><span class="line">sudo apt autoremove xserver-xorg</span><br><span class="line">sudo apt autoremove --purge xserver-xorg</span><br><span class="line">sudo apt-mark hold nvidia-driver-470 <span class="comment"># Freeze NVIDIA Drivers</span></span><br></pre></td></tr></table></figure></p><h3 id="安装nvidia-container-runtime">Install nvidia-container-runtime</h3><p><a href="https://github.com/NVIDIA/nvidia-container-runtime#installation" target="_blank" rel="noopener">https://github.com/NVIDIA/nvidia-container-runtime#installation</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime</a></p><p>First <code>ping</code> <code>nvidia.github.io</code> with the command below and check whether the returned IP address looks right. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ping 
nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>In my network environment this domain is DNS-poisoned (DNS returns 127.0.0.1). My workaround is to add the following line to <code>/etc/hosts</code>: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">185.199.110.153 nvidia.github.io</span><br></pre></td></tr></table></figure></p><p>Then continue with the installation:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \</span><br><span class="line"> sudo apt-key add -</span><br><span class="line">distribution=$(. /etc/os-release;<span class="built_in">echo</span> <span class="variable">$ID</span><span class="variable">$VERSION_ID</span>)</span><br><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/<span class="variable">$distribution</span>/nvidia-container-runtime.list | \</span><br><span class="line"> sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-container-runtime</span><br><span class="line">sudo nano /etc/docker/daemon.json</span><br></pre></td></tr></table></figure><p>Fill in the configuration below:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span 
class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>If you modified the daemon file as described in the previous section, the configuration file should look like this:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> <span class="string">"default-address-pools"</span>: [{<span class="string">"base"</span>:<span class="string">"10.10.0.0/16"</span>,<span class="string">"size"</span>:24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Finally, restart docker.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h3 id="设置gpu常驻内存">Enable GPU persistence mode</h3><p><a href="https://linuxeye.com/463.html" target="_blank" rel="noopener">https://linuxeye.com/463.html</a></p><p>To enable GPU persistence mode, add <code>nvidia-smi -pm 1</code> to <code>/etc/rc.local</code>; it takes effect after a reboot. This can fix problems such as slow GPU initialization, utilization staying high with no jobs running, and GPUs occasionally dropping off.</p><p>The rc.local mechanism changed in Ubuntu 18.04 LTS, so a few manual adjustments are needed.</p><ol type="1"><li>Append the following to <code>/lib/systemd/system/rc-local.service</code>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[Install]</span><br><span class="line">WantedBy=multi-user.target</span><br><span class="line">Alias=rc-local.service</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Enable rc-local at boot:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl <span class="built_in">enable</span> rc-local</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Put the following into <code>/etc/rc.local</code>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/bin/sh -e</span></span><br><span class="line"></span><br><span class="line">nvidia-smi -pm 1</span><br><span class="line"></span><br><span class="line"><span class="built_in">exit</span> 0</span><br></pre></td></tr></table></figure><ol start="4" type="1"><li>Make it executable:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo chmod +x 
/etc/rc.local</span><br></pre></td></tr></table></figure><h3 id="测试驱动是否正常">Verify the driver</h3><ol type="1"><li>Reboot the server.</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo reboot</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Check that the GPU driver and persistence mode are in effect:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvidia-smi</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>If the command returns device information, the driver is installed correctly; if <code>Persistence-M</code> shows <code>On</code>, persistence mode is configured correctly.</li></ol><p>Check that <code>nvidia-container-runtime</code> is installed correctly:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run nvidia/cuda:10.0-base nvidia-smi</span><br></pre></td></tr></table></figure><h3 id="处理容器内nvidia-smi无法显示占用gpu进程问题">Fix nvidia-smi not showing GPU processes inside containers</h3><p><a href="https://github.com/microsoft/pai/issues/2001#issuecomment-862346186" target="_blank" rel="noopener">nvidia-smi can not detect the PID of processes using GPU</a></p><h2 id="dev-box">dev-box</h2><h3 id="配置免密登录">Set up passwordless SSH login</h3><ol type="1"><li>Generate a key pair:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Copy the public key to the remote host:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-copy-id username@remote_host</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Test that passwordless login works:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh 
username@remote_host</span><br></pre></td></tr></table></figure><h1 id="开始安装">Installation</h1><p>All of the following steps are performed on the dev-box.</p><p><a href="https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html" target="_blank" rel="noopener">https://openpai.readthedocs.io/en/pai-1.6.y/manual/cluster-admin/installation-guide.html</a></p><h2 id="准备项目">Prepare the project</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/microsoft/pai.git</span><br><span class="line"><span class="built_in">cd</span> pai</span><br><span class="line">git checkout v1.8.0</span><br></pre></td></tr></table></figure><h2 id="修改部分文件">Modify some files</h2><p><strong><em>Note</em></strong>: in all commands below, replace <code><pai-code-dir></code> with the path to your pai checkout.</p><p>The reasons for these changes are explained in <a href="https://github.com/microsoft/pai/issues/5445" target="_blank" rel="noopener">Problems of install v1.6.0</a>.</p><ol type="1"><li>Comment out <a href="https://github.com/microsoft/pai/blob/master/contrib/kubespray/quick-start-kubespray.sh#L60" target="_blank" rel="noopener"><code><pai-code-dir>/contrib/kubespray/quick-start-kubespray.sh#L60</code></a>:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ansible-playbook -i <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml || <span class="built_in">exit</span> $?</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>In <a href="https://github.com/microsoft/pai/blob/master/contrib/kubespray/script/environment.sh#L50" target="_blank" rel="noopener"><code><pai-code-dir>/contrib/kubespray/script/environment.sh#L50</code></a>, change the following line:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo python3 -m pip install ansible==2.9.7</span><br></pre></td></tr></table></figure><p>Change it to <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo python3 -m pip install ansible==2.9.24</span><br></pre></td></tr></table></figure></p><h2 id="离线安装相关文件准备">Prepare files for offline installation</h2><p><strong><em>Reference: <a href="https://github.com/microsoft/pai/issues/5592" target="_blank" rel="noopener">https://github.com/microsoft/pai/issues/5592</a></em></strong></p><h3 id="修改安装脚本">Modify the installation script</h3><p>In <a href="https://github.com/microsoft/pai/blob/master/contrib/kubespray/quick-start-kubespray.sh" target="_blank" rel="noopener"><code><pai-code-dir>/contrib/kubespray/quick-start-kubespray.sh</code></a>, replace <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">...</span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Performing docker-cache config distribution..."</span></span><br><span class="line"><span class="comment">#ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml -e "@${CLUSTER_CONFIG}" || exit $?</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Starting kubernetes..."</span></span><br><span class="line">/bin/bash script/kubernetes-boot.sh || <span class="built_in">exit</span> $?</span><br></pre></td></tr></table></figure></p><p>with <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">...</span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Performing docker-cache config distribution..."</span></span><br><span class="line"><span class="comment">#ansible-playbook -i ${HOME}/pai-deploy/cluster-cfg/hosts.yml docker-cache-config-distribute.yml -e "@${CLUSTER_CONFIG}" || exit $?</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Performing offline deploy file distribution..."</span></span><br><span class="line">ansible-playbook -i <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg/hosts.yml offline-deploy-files-distribute.yml || <span class="built_in">exit</span> $?</span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"Starting kubernetes..."</span></span><br><span class="line">/bin/bash script/kubernetes-boot.sh || <span class="built_in">exit</span> $?</span><br></pre></td></tr></table></figure></p><p>That is, run <code>offline-deploy-files-distribute.yml</code> before <code>script/kubernetes-boot.sh</code>.</p><h3 id="下载并保存离线文件">Download and save the offline files</h3><p><a href="https://08kytw.bn.files.1drv.com/y4mnHqJp5eDwIMFZgFBpZMdF6RZs9RLPIRZUQKyxfMxCCj5lS-NUxgM7bNstYdH-pMI0J_VgdUgssFfrcGw0mol-bjpGc0ntnKlSXz2hS-Tp3Mh68XMa_H__Trd4wDjpFkp_3VX_De6PZfpDF15z-I5nkkpY47WEzuQO97IAIqSoFYfjzmi_Uqi1Ijwzhx4we3oG1gUpuc_MSBobyot09h76A" target="_blank" rel="noopener">Download link</a></p><p>On Windows, download the file with a browser; on Linux, you can use the following command: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl 
https://08kytw.bn.files.1drv.com/y4mnHqJp5eDwIMFZgFBpZMdF6RZs9RLPIRZUQKyxfMxCCj5lS-NUxgM7bNstYdH-pMI0J_VgdUgssFfrcGw0mol-bjpGc0ntnKlSXz2hS-Tp3Mh68XMa_H__Trd4wDjpFkp_3VX_De6PZfpDF15z-I5nkkpY47WEzuQO97IAIqSoFYfjzmi_Uqi1Ijwzhx4we3oG1gUpuc_MSBobyot09h76A -o pai-offline-deploy-distribute.zip</span><br></pre></td></tr></table></figure></p><ol type="1"><li><p>Unzip the archive: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">unzip pai-offline-deploy-distribute.zip</span><br><span class="line"><span class="built_in">cd</span> pai-offline-deploy-distribute</span><br></pre></td></tr></table></figure></p></li><li><p>Copy <code>offline-deploy-files-distribute.yml</code> to <code><pai-code-dir>/contrib/kubespray</code>.</p></li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cp offline-deploy-files-distribute.yml <pai-code-dir>/contrib/kubespray</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Copy <code>roles/offline-deploy-files-distribute</code> to <code><pai-code-dir>/contrib/kubespray/roles</code>.</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cp -r roles/offline-deploy-files-distribute <pai-code-dir>/contrib/kubespray/roles</span><br></pre></td></tr></table></figure><p>Note that these files are for the x86_64 architecture; download links for other architectures can be found in <a href="https://github.com/kubernetes-sigs/kubespray/blob/master/roles/download/defaults/main.yml" target="_blank" rel="noopener">kubespray/blob/master/roles/download/defaults/main.yml</a>.</p><p>The script does the following:</p><ol type="1"><li><p>Loads the required docker images.</p></li><li><p>According to <a href="https://github.com/kubernetes-sigs/kubespray/blob/b0fcc1ad1d78a373a12c109491914b877fc2d56d/roles/download/defaults/main.yml#L2" target="_blank" 
rel="noopener">this line</a>, files downloaded during installation are stored in the <code>/tmp/releases/</code> directory, so the required files can be placed there in advance to avoid network problems.</p></li><li><p>According to <a href="https://github.com/kubernetes-sigs/kubespray/blob/daed3e5b6a085ac99e076b51d314fcf76e4127b4/roles/kubernetes/node/tasks/install.yml#L11" target="_blank" rel="noopener">this line</a>, when <code>skip_downloads: true</code> is set, kubeadm is not installed on the master node by default, so the script installs kubeadm manually.</p></li></ol><h2 id="编写参数文件">Write the configuration files</h2><p>Write <code>layout.yaml</code> by following the project documentation.</p><p>Write <code>config.yaml</code> using the format below.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">user: <your-ssh-username></span><br><span class="line">password: <your-ssh-password></span><br><span class="line">docker_image_tag: v1.8.0</span><br><span class="line"></span><br><span class="line">openpai_kubespray_extra_var:</span><br><span class="line"> download_container: <span class="literal">false</span></span><br><span class="line"> skip_downloads: <span class="literal">true</span></span><br></pre></td></tr></table></figure><p>Because the required files have already been downloaded, <code>skip_downloads: true</code> disables the file download steps and <code>download_container: false</code> disables the image pull steps.</p><h2 id="安装">Install</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> contrib/kubespray</span><br><span class="line">/bin/bash quick-start-kubespray.sh</span><br><span class="line">/bin/bash quick-start-service.sh</span><br></pre></td></tr></table></figure><h2 id="重启服务注意事项">Notes on restarting services</h2><p><a href="https://github.com/microsoft/pai/issues/5465" target="_blank" 
rel="noopener">https://github.com/microsoft/pai/issues/5465</a></p><p>The <a href="https://github.com/microsoft/pai/blob/master/docs/manual/cluster-admin/basic-management-operations.md" target="_blank" rel="noopener">project documentation</a> says to manage the cluster with /pai/paictl.py, but by default that file is checked out at the latest version rather than the installed version, which leads to version mismatches.</p><p>If you use that script, you must manually check out the matching version every time the container starts; otherwise services may fail to start properly.</p><p>The current workaround is to mount the locally checked-out project into the container and use the paictl.py from there.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run -itd \</span><br><span class="line"> -e COLUMNS=<span class="variable">$COLUMNS</span> -e LINES=<span class="variable">$LINES</span> -e TERM=<span class="variable">$TERM</span> \</span><br><span class="line"> -v /var/run/docker.sock:/var/run/docker.sock \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/cluster-cfg:/cluster-configuration \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai-deploy/kube:/root/.kube \</span><br><span class="line"> -v <span class="variable">${HOME}</span>/pai:/mnt/pai \</span><br><span class="line"> --pid=host \</span><br><span class="line"> --privileged=<span class="literal">true</span> \</span><br><span class="line"> --net=host \</span><br><span class="line"> --name=dev-box \</span><br><span class="line"> --restart=always \</span><br><span class="line"> openpai/dev-box:v1.8.0</span><br></pre></td></tr></table></figure><p>Before running this command, make sure the version checked out in <code>${HOME}/pai</code> matches the installed version; then, inside the container, use the <code>paictl.py</code> in <code>/mnt/pai</code>.</p><h1 
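id="后续">Next steps</h1><p>Once the services have started successfully, further customization usually goes smoothly unless a configuration file contains mistakes, so check your configuration files carefully.</p>]]></content>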
<categories>
<category> Environment </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> PyCharm </tag>
<tag> microsoft </tag>
<tag> openpai </tag>
<tag> vs code </tag>
<tag> Deep Learning </tag>
<tag> cluster </tag>
<tag> Ubuntu 18.04 </tag>
</tags>
</entry>
<entry>
<title>Setting Up a Docker Registry Mirror</title>
<link href="2021/05/12/p56.html"/>
<url>2021/05/12/p56.html</url>
<content type="html"><![CDATA[<h1 id="简介">Introduction</h1><p>Scripts for setting up a docker registry mirror. SSL is enabled for security, and the setup works with or without a domain name, on a LAN or on the Internet.</p><h1 id="服务端配置">Server setup</h1><h2 id="安装docker和docker-compose">Install docker and docker-compose</h2><p><a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">https://docs.docker.com/engine/install/ubuntu/</a></p><p><a href="https://docs.docker.com/compose/install/" target="_blank" rel="noopener">https://docs.docker.com/compose/install/</a></p><h2 id="克隆项目">Clone the project</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/siaimes/docker-cache.git</span><br><span class="line"><span class="built_in">cd</span> docker-cache</span><br></pre></td></tr></table></figure><h2 id="生成证书">Generate a certificate</h2><p>Generate a certificate. A second argument of 1 produces a domain-name certificate; any other value produces an IP certificate. Choose according to your setup:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> ssl</span><br><span class="line">chmod +x get_ssl.sh</span><br><span class="line">./get_ssl.sh your_server_ip_or_domain 0</span><br><span class="line"><span class="built_in">cd</span> ..</span><br></pre></td></tr></table></figure><p>If the port is reachable from the Internet, you can request a Let's Encrypt certificate instead, or put Nginx in front to host more services; that is not covered here.</p><h2 id="启动服务">Start the service</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nano docker-compose</span><br></pre></td></tr></table></figure><p>In that file, replace <code>your_server_ip_or_domain</code> with your server's IP or domain name, and change the first <code>5000</code> in <code>0.0.0.0:5000:5000</code> to a free port on your host.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br></pre></td><td class="code"><pre><span class="line">chmod +x *.sh</span><br><span class="line">./start.sh</span><br></pre></td></tr></table></figure><p>If you are mirroring a registry other than Docker Hub, for example gcr.io, change <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">- PROXY_REMOTE_URL=https://registry-1.docker.io</span><br></pre></td></tr></table></figure></p><p>to <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">- PROXY_REMOTE_URL=https://gcr.io</span><br></pre></td></tr></table></figure></p><h1 id="客户端配置">Client setup</h1><h2 id="克隆项目-1">Clone the project</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/siaimes/docker-cache.git</span><br><span class="line"><span class="built_in">cd</span> docker-cache</span><br></pre></td></tr></table></figure><h2 id="获取证书">Get the certificate</h2><p>This step is unnecessary if the certificate is trusted, for example one issued by Let's Encrypt.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo ./get_docker_cache_ssl.sh your_server_ip_or_domain port username /path/to/ssl</span><br></pre></td></tr></table></figure><p>If the server restricts password login, copying the certificate to the client with the script may fail.</p><p>In that case, configure the client manually, following <code>get_docker_cache_ssl.sh</code>:</p><p>If the port is 443 it can be omitted, here and in the steps below.</p><ol type="1"><li>Create the directory:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo mkdir -p /etc/docker/certs.d/your_server_ip_or_domain:port/</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>On the server, print the certificate and copy it to the clipboard:</li></ol><figure class="highlight bash"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cat ./ssl/your_server_ip_or_domain.crt</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>On the client, create the certificate file and paste in the server certificate:</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/docker/certs.d/your_server_ip_or_domain:port/your_server_ip_or_domain.crt</span><br></pre></td></tr></table></figure><h2 id="测试服务">Test the service</h2><p>If you are mirroring Docker Hub, test with the following command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker pull your_server_ip_or_domain:port/library/ubuntu</span><br></pre></td></tr></table></figure><p>Note that official images must be prefixed with <code>library</code>; otherwise the pull fails with <code>Error response from daemon: manifest for ubuntu:latest not found</code>.</p><p>If you are mirroring gcr.io, test with the following command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker pull your_server_ip_or_domain:port/google-containers/kube-apiserver:v1.15.11</span><br></pre></td></tr></table></figure><h2 id="固化配置">Make the configuration persistent</h2><p>This step applies only when mirroring Docker Hub; skip it for any other registry.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nano /etc/docker/daemon.json</span><br></pre></td></tr></table></figure><p>Add the following:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">{<span class="string">"registry-mirrors"</span>: [<span class="string">"https://your_server_ip_or_domain:port"</span>]}</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td 
class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><p>Test the result:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo docker rmi your_server_ip_or_domain:port/library/ubuntu</span><br><span class="line">sudo docker pull ubuntu</span><br></pre></td></tr></table></figure><h1 id="参考连接">References</h1><p><a href="https://openpai.readthedocs.io/en/latest/manual/cluster-admin/basic-management-operations.html" target="_blank" rel="noopener">How To Set Up HTTPS</a></p><p><a href="https://blog.csdn.net/min19900718/article/details/87920254" target="_blank" rel="noopener">x509: cannot validate certificate for 10.30.0.163 because it doesn't contain any IP SANs</a></p><p><a href="https://www.huaweicloud.com/articles/5fa5f84d8308590fcaa949d5dd5d9a04.html" target="_blank" rel="noopener">私有安全docker registry授权访问实验 (an experiment with authorized access to a private, secure docker registry)</a></p>]]></content>
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> docker </tag>
<tag> ubuntu </tag>
<tag> gcr.io </tag>
</tags>
</entry>
<entry>
<title>Docker-Compose Dropping SSH Connections</title>
<link href="2021/04/29/p55.html"/>
<url>2021/04/29/p55.html</url>
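<content type="html"><![CDATA[<p>Normally docker-compose does not cause network problems, but the situation I ran into is unusual, so I am recording it here.</p><p>The situation: a client in the 172.17.0.0 network accesses an ssh server that has a public Internet address. In this case, using either docker or docker-compose on the server drops the ssh connection. This typically happens when an organization builds its LAN on private addresses in the 172.17.0.0 network and an administrator manages the organization's Internet-facing servers over ssh.</p><h1 id="安装docker导致ssh断开连接">Installing docker drops the ssh connection</h1><p>The cause is that docker creates the docker0 bridge as soon as installation finishes. Because the ssh server has an Internet address, docker concludes that the default subnet <code>172.17.0.0/16</code> does not conflict with the host and assigns <code>172.17.0.1</code> to the docker0 bridge. However, all LAN hosts are in <code>172.17.0.0/16</code>, so all traffic from the server back to the LAN is sent to docker0, and the ssh connection drops.</p><p>This problem is easy to diagnose and the fix is straightforward. Open <code>/lib/systemd/system/docker.service</code>, append <code>--bip "172.18.0.1/16"</code> to the end of the <code>ExecStart=...</code> line, then restart docker with the following commands: <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl daemon-reload </span><br><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure></p><h1 id="使用docker-compose启动容器时导致ssh断开连接">Starting containers with docker-compose drops the ssh connection</h1><p>The cause is that a group of containers started with docker-compose shares a brand-new bridge. docker-compose likewise concludes that the default subnet <code>172.17.0.0/16</code> does not conflict with the host, and since docker0 is no longer using that subnet, the new bridge takes it, and the ssh connection drops again.</p><p>Solution: add a route to the LAN. This effectively tells docker and docker-compose that the <code>172.17.0.0/16</code> subnet is already in use, so they avoid it.</p><p>First use the <code>route</code> command to find the default gateway, typically the xxx.xxx.xxx.254 address in the same subnet as the host's IP address.</p><p>Then add a route with the following command.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo route add -net 172.17.0.0 netmask 255.255.0.0 gw <default-gateway></span><br></pre></td></tr></table></figure><p>After this, using docker-compose no longer drops the ssh connection.</p><h1 id="合并两个问题">Combining the two fixes</h1><p>In fact, once this route is added, there is no longer any need to add <code>--bip "172.18.0.1/16"</code> to <code>/lib/systemd/system/docker.service</code>.</p><h1 id="固化配置">Make the configuration persistent</h1><p>If the tests above pass, persist this command: <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td 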
class="code"><pre><span class="line">route add -net 172.17.0.0 netmask 255.255.0.0 gw <default-gateway></span><br></pre></td></tr></table></figure></p><p>Write it to <code>/etc/rc.local</code> so that it is applied on every boot.</p><h1 id="未解决的地方">Unresolved issue</h1><p>For reasons I have not determined, using docker-compose still occasionally drops the network; checking the routing table at that point shows that the route added by the command above is gone.</p><p>So before running docker-compose, it is best to confirm that the route is still in the routing table and, if it is not, add it again manually.</p><h1 id="参考连接">References</h1><p><a href="https://stackoverflow.com/questions/41736187/docker-compose-network-creation-kicks-me-out-of-ssh" target="_blank" rel="noopener">docker-compose network creation kicks me out of ssh</a></p><h1 id="最新解决方案">Latest solution</h1><p>The earlier solution still has a problem. In a large organization with a complex network, the LAN uses more than just the <code>172.17.*.*</code> range. So once the services are up, if they occupy <code>172.18.*.*</code> and a client happens to be in that range, that client cannot reach the services.</p><p>The final solution is therefore to move the address pool to the class A private range.</p><ol type="1"><li>Edit <code>/etc/docker/daemon.json</code>:</li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> "default-address-pools": [{"base":"10.10.0.0/16","size":24}]</span><br><span class="line">}</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li><p>Stop all docker services.</p></li><li><p>Remove the existing bridges: <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker network prune</span><br></pre></td></tr></table></figure></p></li><li><p>Restart docker: <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure></p></li><li><p>Start all docker services.</p></li></ol>]]></content>
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> docker </tag>
<tag> ubuntu </tag>
<tag> docker-compose </tag>
</tags>
</entry>
<entry>
<title>Fixing Client IP Restrictions on a Containerized NFS Server</title>
<link href="2021/01/01/p54.html"/>
<url>2021/01/01/p54.html</url>
<content type="html"><![CDATA[<p>After <a href="/2020/11/12/p53.html">deploying Openpai</a>, I used the <a href="https://hub.docker.com/r/itsthenetwork/nfs-server-alpine" target="_blank" rel="noopener">nfs-server-alpine</a> image to set up a data storage service for the deep learning environment. I did not use <a href="https://openpai.readthedocs.io/en/latest/manual/cluster-admin/how-to-set-up-storage.html" target="_blank" rel="noopener">Openpai's storage-manager</a> because when Openpai is upgraded, storage-manager is shut down and upgraded along with it, which forces every running service to abort with I/O errors. Besides, the data storage service is not tightly coupled to Openpai, so it can perfectly well be deployed independently.</p><p>The problem with this deployment is that <a href="https://hub.docker.com/r/itsthenetwork/nfs-server-alpine" target="_blank" rel="noopener">nfs-server-alpine</a> exposes the shared directory directly at <code>IP:/</code>. Anyone who finds your IP by scanning with an nfs mount command can then access all of your data.</p><p>At first I restricted client IPs using the method described in the <a href="https://hub.docker.com/r/itsthenetwork/nfs-server-alpine" target="_blank" rel="noopener">nfs-server-alpine</a> README, but found that with any setting other than '*', clients could not mount the share.</p><p><strong><em>Reasoning:</em></strong> The nfs server itself works and the client IP is in the permitted list, yet the mount fails. A possible cause is NAT somewhere on the path from client to server, so that the source IP that actually reaches the server is no longer the client's IP.</p><p><strong><em>Insight:</em></strong> All my nodes are in the same subnet, so there is no NAT between them. The only other place where NAT occurs is docker.</p><p><strong><em>Problem:</em></strong> The path from client to server is fine, but when the server receives a client request and hands it to the container, it applies NAT.</p><p><strong><em>Solution:</em></strong> Run the nfs container with host networking (host network), which removes the NAT.</p><p>The final docker-compose.yml is as follows:</p><figure class="highlight yml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">"2.1"</span></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line"> 
<span class="comment"># https://hub.docker.com/r/itsthenetwork/nfs-server-alpine</span></span><br><span class="line"> <span class="attr">nfs:</span></span><br><span class="line"> <span class="attr">image:</span> <span class="string">itsthenetwork/nfs-server-alpine:12</span></span><br><span class="line"> <span class="attr">container_name:</span> <span class="string">nfs</span></span><br><span class="line"> <span class="attr">restart:</span> <span class="string">unless-stopped</span></span><br><span class="line"> <span class="attr">privileged:</span> <span class="literal">true</span></span><br><span class="line"> <span class="attr">environment:</span></span><br><span class="line"> <span class="bullet">-</span> <span class="string">SHARED_DIRECTORY=/data</span></span><br><span class="line"> <span class="bullet">-</span> <span class="string">PERMITTED=192.168.1.0\/255.255.255.0</span></span><br><span class="line"> <span class="attr">volumes:</span></span><br><span class="line"> <span class="bullet">-</span> <span class="string">/local/data:/data</span></span><br><span class="line"> <span class="attr">network_mode:</span> <span class="string">host</span></span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Environment </category>
</categories>
<tags>
<tag> nfs </tag>
</tags>
</entry>
<entry>
<title>Openpai v1.3.0 Deployment Summary</title>
<link href="2020/11/12/p53.html"/>
<url>2020/11/12/p53.html</url>
<content type="html"><![CDATA[<p><strong><em>This post is outdated; please use the <a href="/2021/06/07/p57.html">updated document</a>.</em></strong></p><h1 id="前提条件">Prerequisites</h1><ol type="1"><li><p>All hosts in the cluster run Ubuntu 16.04 LTS and have an administrator account with the same username and password.</p></li><li><p>Every machine runs proxy software listening on <a href="http://127.0.0.1:8118" target="_blank" rel="noopener">http://127.0.0.1:8118</a>. Setting up the proxy is up to you and is outside the scope of this article.</p></li></ol><h1 id="初始化">Initialization</h1><h2 id="所有机器">All machines</h2><h3 id="切换清华源">Switch to the Tsinghua mirror</h3><p><a href="https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/" target="_blank" rel="noopener">https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/</a></p><h3 id="更新依赖">Update packages</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h3 id="安装openssh-server">Install openssh-server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install openssh-server</span><br></pre></td></tr></table></figure><h3 id="安装docker">Install docker</h3><p><a href="https://docs.docker.com/engine/install/ubuntu/" target="_blank" rel="noopener">https://docs.docker.com/engine/install/ubuntu/</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">sudo apt remove docker docker-engine docker.io containerd runc</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install \</span><br><span class="line"> apt-transport-https 
\</span><br><span class="line"> ca-certificates \</span><br><span class="line"> curl \</span><br><span class="line"> gnupg-agent \</span><br><span class="line"> software-properties-common</span><br><span class="line">curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -</span><br><span class="line">sudo apt-key fingerprint 0EBFCD88</span><br><span class="line">sudo nano /etc/apt/sources.list.d/download_docker_com_linux_ubuntu.list</span><br></pre></td></tr></table></figure><p>The source is added this way, rather than by the method in the official instructions linked above, because this is how pai sets it up; doing it differently results in a duplicate source, and apt update reports an error.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">deb https://download.docker.com/linux/ubuntu xenial stable</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo dpkg --remove-architecture i386</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install docker-ce docker-ce-cli containerd.io</span><br></pre></td></tr></table></figure><h3 id="安装python">Install python</h3><p>Ubuntu should ship with python by default, but in practice I found that some machines did not have it, so check manually to avoid errors.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install python</span><br></pre></td></tr></table></figure><h2 id="master">master</h2><h3 id="安装ntp">Install ntp</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install ntp</span><br></pre></td></tr></table></figure><h2 id="worker">worker</h2><h3 id="安装gpu驱动">Install the GPU driver</h3><p><a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa" target="_blank" 
rel="noopener">https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-check-whether-the-gpu-driver-is-installed</a></p><p><a href="https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove" target="_blank" rel="noopener">https://howtoinstall.co/en/ubuntu/xenial/xserver-xorg?action=remove</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-430</span><br><span class="line">sudo apt autoremove xserver-xorg</span><br><span class="line">sudo apt autoremove --purge xserver-xorg</span><br><span class="line">sudo reboot</span><br></pre></td></tr></table></figure><h3 id="安装nvidia-container-runtime">安装nvidia-container-runtime</h3><p><a href="https://github.com/NVIDIA/nvidia-container-runtime#installation" target="_blank" rel="noopener">https://github.com/NVIDIA/nvidia-container-runtime#installation</a></p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-faqs-and-troubleshooting.html#how-to-install-nvidia-container-runtime</a></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \</span><br><span class="line"> sudo apt-key add -</span><br><span class="line">distribution=$(. /etc/os-release;<span class="built_in">echo</span> <span class="variable">$ID</span><span class="variable">$VERSION_ID</span>)</span><br><span class="line">curl -s -L https://nvidia.github.io/nvidia-container-runtime/<span class="variable">$distribution</span>/nvidia-container-runtime.list | \</span><br><span class="line"> sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list</span><br><span class="line">sudo apt update</span><br><span class="line">sudo apt install nvidia-container-runtime</span><br><span class="line">sudo nano /etc/docker/daemon.json</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">{</span><br><span class="line"> <span class="string">"default-runtime"</span>: <span class="string">"nvidia"</span>,</span><br><span class="line"> <span class="string">"runtimes"</span>: {</span><br><span class="line"> <span class="string">"nvidia"</span>: {</span><br><span class="line"> <span class="string">"path"</span>: <span class="string">"/usr/bin/nvidia-container-runtime"</span>,</span><br><span class="line"> <span class="string">"runtimeArgs"</span>: []</span><br><span class="line"> }</span><br><span class="line"> 
}</span><br><span class="line">}</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl restart docker</span><br></pre></td></tr></table></figure><h3 id="测试驱动是否正常">Test that the driver works</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker run nvidia/cuda:10.0-base nvidia-smi</span><br></pre></td></tr></table></figure><h3 id="设置gpu常驻内存">Enable GPU persistence mode</h3><p>Run this command to enable GPU persistence mode. It can resolve problems such as slow GPU startup, utilization staying high with no job running, and GPUs occasionally dropping out.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo nvidia-smi -pm 1</span><br></pre></td></tr></table></figure><p>The setting is lost when the system reboots. To apply it at every boot, add <code>nvidia-smi -pm 1</code> to <code>/etc/rc.local</code>, making sure it comes before <code>exit 0</code>.</p><h2 id="devbox">devbox</h2><h3 id="配置免密登录">Configure passwordless login</h3><ol type="1"><li>Generate a key</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen</span><br></pre></td></tr></table></figure><ol start="2" type="1"><li>Register the key with the remote host</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-copy-id username@remote_host</span><br></pre></td></tr></table></figure><ol start="3" type="1"><li>Test that passwordless login works</li></ol><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh username@remote_host</span><br></pre></td></tr></table></figure><h1 id="启动代理">Enable the proxy</h1><p>The machines in the cluster need to pull images from gcr.io, which is not reachable from mainland China, so a Docker proxy must be configured (the mirror given on the openpai website is no longer available). The devbox machine needs to download the pai source code; if access to github is slow, a proxy can be configured for git as well.</p><h2 id="启动docker代理">Enable the Docker proxy</h2><p><a 
href="https://docs.docker.com/config/daemon/systemd/#httphttps-proxy" target="_blank" rel="noopener">https://docs.docker.com/config/daemon/systemd/#httphttps-proxy</a></p><p>This is the proxy the Docker daemon uses for commands such as docker pull.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo mkdir -p /etc/systemd/system/docker.service.d</span><br><span class="line">sudo nano /etc/systemd/system/docker.service.d/http-proxy.conf</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[Service]</span><br><span class="line">Environment=<span class="string">"HTTP_PROXY=http://127.0.0.1:8118"</span></span><br><span class="line">Environment=<span class="string">"HTTPS_PROXY=http://127.0.0.1:8118"</span></span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo systemctl daemon-reload</span><br><span class="line">sudo systemctl restart docker</span><br><span class="line">sudo systemctl show --property=Environment docker</span><br></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo docker pull gcr.io/google-containers/kube-apiserver:v1.15.11</span><br></pre></td></tr></table></figure><h2 id="启动git代理">Enable the git proxy</h2><p>Commands to enable it:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git config --global http.proxy http://127.0.0.1:8118</span><br><span 
class="line">git config --global https.proxy http://127.0.0.1:8118</span><br></pre></td></tr></table></figure><p>Commands to disable it:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git config --global --<span class="built_in">unset</span> https.proxy</span><br><span class="line">git config --global --<span class="built_in">unset</span> http.proxy</span><br></pre></td></tr></table></figure><h1 id="开始安装">Start the installation</h1><p>Everything below is done on the devbox.</p><p><a href="https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-guide.html#_4" target="_blank" rel="noopener">https://openpai.readthedocs.io/zh_CN/latest/manual/cluster-admin/installation-guide.html#_4</a></p><h2 id="编写参数文件">Write the parameter files</h2><p>Following the format in the guide linked above, create three files in the home directory: <code>master.csv</code>, <code>worker.csv</code>, and <code>config</code>.</p><p>Because a proxy is already in use, the gcr.io mirror from the official documentation is no longer needed, so the simple configuration file below is enough.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">user: <your-ssh-username></span><br><span class="line">password: <your-ssh-password></span><br><span class="line">branch_name: pai-1.3.y</span><br><span class="line">docker_image_tag: v1.3.0</span><br><span class="line"></span><br><span class="line">kubeadm_download_url: <span class="string">"https://shaiictestblob01.blob.core.chinacloudapi.cn/share-all/kubeadm"</span></span><br><span class="line">hyperkube_download_url: <span class="string">"https://shaiictestblob01.blob.core.chinacloudapi.cn/share-all/hyperkube"</span></span><br></pre></td></tr></table></figure><h2 id="按步骤运行">Run the steps in order</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/microsoft/pai.git</span><br><span class="line"><span class="built_in">cd</span> pai</span><br><span class="line">git checkout pai-1.3.y</span><br><span class="line"><span class="built_in">cd</span> contrib/kubespray</span><br><span class="line">/bin/bash quick-start-kubespray.sh -m ~/master.csv -w ~/worker.csv -c ~/config</span><br><span class="line">/bin/bash quick-start-service.sh -m ~/master.csv -w ~/worker.csv -c ~/config</span><br></pre></td></tr></table></figure><h1 id="后续">Next steps</h1><p>Once the services have started successfully, further customization generally goes smoothly, apart from installing the MarketPlace plugin, unless a configuration file contains a mistake.</p><h2 id="marketplace配置">MarketPlace configuration</h2><p><a href="https://github.com/siaimes/openpaimarketplace" target="_blank" rel="noopener">https://github.com/siaimes/openpaimarketplace</a></p>]]></content>
<categories>
<category> 环境 </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> Ubuntu 16.04 </tag>
<tag> PyCharm </tag>
<tag> microsoft </tag>
<tag> openpai </tag>
<tag> vs code </tag>
<tag> Deep Learning </tag>
<tag> cluster </tag>
</tags>
</entry>
<entry>
<title>Speeding Up Snap</title>
<link href="2020/10/26/p52.html"/>
<url>2020/10/26/p52.html</url>
<content type="html"><![CDATA[<p>Installing packages with snap from inside mainland China is extremely slow, and no domestic mirror is available, so installing software published on snap, such as PyCharm, takes a very long time. The fix is to configure a proxy for snap: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo snap <span class="built_in">set</span> system proxy.http=<span class="string">"http://<proxy_addr>:<proxy_port>"</span></span><br><span class="line">sudo snap <span class="built_in">set</span> system proxy.https=<span class="string">"http://<proxy_addr>:<proxy_port>"</span></span><br></pre></td></tr></table></figure></p><p>These commands only work if an <code>http</code> proxy service is listening on <code><proxy_addr>:<proxy_port></code>; how to set up the <code>http</code> proxy is outside the scope of this post.</p><p>PyCharm can then be installed with snap: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo snap install pycharm-community --classic</span><br></pre></td></tr></table></figure></p><p>Reference: <a href="https://askubuntu.com/questions/764610/how-to-install-snap-packages-behind-web-proxy-on-ubuntu-16-04" target="_blank" rel="noopener">How to install snap packages behind web proxy on Ubuntu 16.04</a>.</p>]]></content>
<categories>
<category> 环境 </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> Ubuntu 16.04 </tag>
<tag> snap </tag>
<tag> PyCharm </tag>
</tags>
</entry>
<entry>
<title>Solving Deep Learning Environment Problems in One Post</title>
<link href="2020/10/07/p51.html"/>
<url>2020/10/07/p51.html</url>
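<content type="html"><![CDATA[<p>An earlier post, <a href="/2020/08/31/p49.html">Installing a TensorFlow Runtime Environment on Ubuntu 16.04</a>, described a way to manage deep learning environments with conda. In practice it is easier to get started with, and less error-prone, than the procedures on the NVIDIA and TensorFlow websites. It has one problem, though: it does not carry over directly to projects that need the NVCC compiler. The reason is that the cudatoolkit package maintained in the conda channels is a subset of NVIDIA's official CUDA release that does not include the compiler. After some searching on Google and hands-on experimentation I have solved the problem, and I record the solution here for future reference. The solution is to use the cudatoolkit-dev package maintained by conda-forge, which is the complete CUDA toolkit. PyTorch and TensorFlow both depend on cudatoolkit by default, so cudatoolkit and cudatoolkit-dev end up installed side by side. So far I have not seen this cause any conflict, so nothing needs to be done about it.</p><p>All experiments in this post were run on Ubuntu 16.04 Desktop. To apply them to another system, install the GPU driver and Anaconda as appropriate for that system; the virtual environment setup that follows should in theory be the same.</p><h1 id="关闭自动更新并主动更新系统依赖">Disable automatic updates and update system dependencies manually</h1><p>While Ubuntu's automatic updater runs in the background, many commands in the terminal cannot be used (apt, for example, has to wait for the package manager lock), so I recommend disabling automatic updates on a fresh system. In <code>Software & Updates</code>, go to <code>Updates</code> and set <code>Automatically check for updates</code> and <code>Notify me of a new Ubuntu version</code> to <code>Never</code>, as shown below. <img src="/2020/08/30/p48/NeverUpdates.png" alt="Disable automatic updates"></p><p>Commands for updating dependencies manually:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure><h1 id="添加ppa源">Add the PPA</h1><p>The driver in the official Ubuntu 16.04 repositories is outdated and does not support CUDA 10.0. Fortunately, the graphics-drivers PPA for Ubuntu 16.04 offers drivers that do. The commands are: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br></pre></td></tr></table></figure></p><h1 id="安装显卡驱动">Install the GPU driver</h1><p>A driver downloaded from the NVIDIA website may have no prebuilt release for your system, in which case it has to be compiled during installation. Compilation often fails, so this route is not recommended. This post installs the driver from the PPA instead, which avoids the odd problems that driver installation can cause.</p><p>In <code>Software & Updates</code>, go to <code>Additional Drivers</code>, select the latest NVIDIA binary driver, and click <code>Apply Changes</code>. When the installation finishes, <code>reboot</code>, as shown below. <img src="/2020/08/30/p48/AdditionalDrivers.png" 
alt="Additional Drivers"></p><p>One point deserves attention: the GPU driver version and the CUDA version do not have to match one to one, but because CUDA depends on the driver, the driver version must not be lower than the minimum that the CUDA version requires. After the driver is installed, the CUDA version printed by <code>nvidia-smi</code> is the highest CUDA version the driver supports; lower CUDA versions are supported as well. Given this, simply install the latest driver.</p><h1 id="安装anaconda">Install Anaconda</h1><p>Follow the <a href="https://docs.anaconda.com/anaconda/install/linux/" target="_blank" rel="noopener">installation guide</a> on the Anaconda website.</p><h1 id="切换清华源">Switch to the Tsinghua mirror</h1><p>Be sure to switch the conda channels to the Tsinghua mirror; otherwise the errors caused by network problems will be endlessly frustrating. For the configuration, see the <a href="https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/" target="_blank" rel="noopener">Anaconda mirror help</a> page provided by the Tsinghua University open source mirror site. You can switch the pip index at the same time by following the <a href="https://mirrors.tuna.tsinghua.edu.cn/help/pypi/" target="_blank" rel="noopener">PyPI mirror help</a> page.</p><h1 id="配置虚环境">Configure virtual environments</h1><p>This post gives three examples for reference. The mapping between TensorFlow versions and CUDA versions can be found <a href="https://www.tensorflow.org/install/source#common_installation_problems" target="_blank" rel="noopener">here</a>, and the install commands for different PyTorch versions can be found <a href="https://pytorch.org/get-started/locally/" target="_blank" rel="noopener">here</a> or <a href="https://pytorch.org/get-started/previous-versions/" target="_blank" rel="noopener">here</a>; those commands already reflect the mapping between PyTorch and CUDA versions. Note that installing cudatoolkit-dev downloads files from the NVIDIA website, so it stays at Executing transaction for a long time; be patient.</p><p>Before installing any other version, check for yourself that a matching release exists, using the following commands.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">conda search tensorflow</span><br><span class="line">conda search -c pytorch pytorch</span><br><span class="line">conda search -c conda-forge cudatoolkit-dev</span><br></pre></td></tr></table></figure><h2 id="tensorflow-1.13.1">tensorflow 1.13.1</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 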
class="line">8</span><br></pre></td><td class="code"><pre><span class="line">conda create --name tensorflow_1_13_1 python=3.6</span><br><span class="line">conda activate tensorflow_1_13_1</span><br><span class="line">conda install numpy=1.16.4</span><br><span class="line">conda install tensorflow-gpu=1.13.1</span><br><span class="line">conda install conda</span><br><span class="line">conda install -c conda-forge cudatoolkit-dev=10.0</span><br><span class="line">cat /home/usrname/anaconda3/envs/tensorflow_1_13_1/include/cudnn.h | grep CUDNN_MAJOR</span><br><span class="line">nvcc -V</span><br></pre></td></tr></table></figure><h2 id="tensorflow-2.1.0">tensorflow 2.1.0</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">conda create --name tensorflow_2_1_0 python=3.6</span><br><span class="line">conda activate tensorflow_2_1_0</span><br><span class="line">conda install tensorflow-gpu=2.1.0</span><br><span class="line">conda install conda</span><br><span class="line">conda install -c conda-forge cudatoolkit-dev=10.1</span><br><span class="line">cat /home/usrname/anaconda3/envs/tensorflow_2_1_0/include/cudnn.h | grep CUDNN_MAJOR</span><br><span class="line">nvcc -V</span><br></pre></td></tr></table></figure><h2 id="pytorch-1.6.0">pytorch 1.6.0</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">conda create --name pytorch_1_6_0 python=3.6</span><br><span class="line">conda activate pytorch_1_6_0</span><br><span class="line">conda install 
pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 -c pytorch</span><br><span class="line">conda install conda</span><br><span class="line">conda install -c conda-forge cudatoolkit-dev=10.1</span><br><span class="line">nvcc -V</span><br></pre></td></tr></table></figure><h1 id="使用虚环境">Use the virtual environment</h1><p>To run code from the command line, activate the corresponding environment with <code>conda activate envname</code> before running it.</p><p>To run code in PyCharm, select the corresponding environment under <code>File>>Settings>>Project: project name>>Python Interpreter</code>.</p><p>To install other packages, activate the environment and use <code>conda install packagename</code>, or use PyCharm's package manager. pip packages can likewise be installed with <code>pip install packagename</code> after activating the environment; they are installed inside that environment too.</p>]]></content>
<categories>
<category> 环境 </category>
</categories>
<tags>
<tag> Linux </tag>
<tag> Ubuntu 16.04 </tag>
<tag> CUDA </tag>
<tag> cuDNN </tag>
<tag> Anaconda </tag>
<tag> tensorflow </tag>
<tag> NVIDIA </tag>
<tag> driver </tag>
<tag> PyTorch </tag>
</tags>
</entry>
<entry>
<title>Sharing a Keyboard and Mouse Across Machines</title>
<link href="2020/09/16/p50.html"/>
<url>2020/09/16/p50.html</url>
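<content type="html"><![CDATA[<h1 id="软件安装">Installation</h1><h2 id="windows">Windows</h2><ol type="1"><li><p>Download BarrierSetup-x.x.x-release.exe from the <a href="https://github.com/debauchee/barrier/releases" target="_blank" rel="noopener">barrier releases page</a> and install it.</p></li><li><p>Configure the firewall by following the official <a href="https://github.com/debauchee/barrier/wiki/Adding-Barrier-to-the-Windows-Firewall" target="_blank" rel="noopener">wiki</a> guide.</p></li></ol><h2 id="ubuntu-16.04">Ubuntu 16.04</h2><p>Search for barrier in <code>Ubuntu Software</code> and install it, or run the following command.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo snap install barrier</span><br></pre></td></tr></table></figure><h1 id="软件配置">Configuration</h1><p>On startup, choose the run mode: the machine whose keyboard and mouse are shared acts as the server, and a machine that receives the keyboard and mouse shared by another machine acts as a client.</p><h2 id="服务端">Server</h2><p>Check Server, click the <code>设置服务端···</code> (configure server) button, and follow the prompts to add a screen. The screen name must match the screen name shown on the client; an alias can also be set to make it easier to remember. Close the dialog once the configuration is done. After the client has been configured, click the Start button.</p><h2 id="客户端">Client</h2><p>Check Client, uncheck the Auto config option, enter the IP shown by the Barrier server as the server IP, and click Start.</p>]]></content>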
<categories>
<category> 系统 </category>
</categories>
<tags>
<tag> 键鼠共享 </tag>
<tag> Barrier </tag>
</tags>
</entry>
<entry>
<title>Installing a TensorFlow Runtime Environment on Ubuntu 16.04</title>
<link href="2020/08/31/p49.html"/>
<url>2020/08/31/p49.html</url>
<content type="html"><![CDATA[<p><a href="/2020/08/30/p48.html">The previous post</a> was meant to lay the groundwork for this one, but when I actually installed TensorFlow I found that, with Anaconda and PyCharm available, there is no need to install CUDA and cuDNN by hand, and Python package management is simpler too, so I am recording the procedure here.</p><h1 id="关闭自动更新并主动更新系统依赖">Disable automatic updates and update system dependencies manually</h1><p>While Ubuntu's automatic updater runs in the background, many commands in the terminal cannot be used (apt, for example, has to wait for the package manager lock), so I recommend disabling automatic updates on a fresh system. In <code>Software & Updates</code>, go to Updates and set <code>Automatically check for updates</code> and <code>Notify me of a new Ubuntu version</code> to <code>Never</code>, as shown below. <img src="/2020/08/30/p48/NeverUpdates.png" alt="Disable automatic updates"></p><p>Commands for updating dependencies manually: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure></p><h1 id="添加ppa源">Add the PPA</h1><p>The driver in the official Ubuntu 16.04 repositories is outdated and does not support CUDA 10.0. Fortunately, the graphics-drivers PPA for Ubuntu 16.04 offers drivers that do. The commands are: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br></pre></td></tr></table></figure></p><h1 id="安装显卡驱动">Install the GPU driver</h1><p>In <code>Software & Updates</code>, go to <code>Additional Drivers</code>, select the latest NVIDIA binary driver (it must be no lower than the minimum driver version required by the CUDA version you intend to use), and click <code>Apply Changes</code>. When the installation finishes, <code>reboot</code>, as shown below. <img src="/2020/08/30/p48/AdditionalDrivers.png" alt="Additional Drivers"></p><h1 id="安装anaconda">Install Anaconda</h1><p>Follow the <a href="https://docs.anaconda.com/anaconda/install/linux/" target="_blank" rel="noopener">installation guide</a> on the Anaconda website.</p><h1 id="安装pycharm">Install PyCharm</h1><p>Search for <code>PyCharm</code> in <code>Ubuntu Software</code> and install <code>PyCharm CE</code>.</p><h1 id="使用pycharm管理python环境">Manage Python environments with PyCharm</h1><p>Start PyCharm. When you create a project you have to choose an interpreter environment, and PyCharm provides an interface to Conda at this point. Under <code>New environment using 
Conda</code>, choose the target Python version and click Create to create both the project and its Python environment. For how the Python version relates to your target TensorFlow version, see <a href="https://www.tensorflow.org/install/source#common_installation_problems" target="_blank" rel="noopener">here</a>. Once creation finishes and PyCharm opens, go to File>>Settings>>Project: project name>>Python Interpreter to manage Python packages. Click <code>+</code>, search for <code>tensorflow-gpu</code>, and install it; note that a specific version can be selected in the lower right corner. Installing TensorFlow this way does not require installing CUDA and cuDNN by hand; Conda installs the required dependencies automatically.</p><h1 id="测试">Test</h1><p>Create a Python file in the project you just created, add the <a href="https://www.tensorflow.org/tutorials/quickstart/beginner" target="_blank" rel="noopener">test code</a> from the TensorFlow website, and run it. Check the output for errors and for signs that CUDA is being used.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># https://www.tensorflow.org/tutorials/quickstart/beginner</span></span><br><span class="line"><span class="keyword">import</span> tensorflow <span class="keyword">as</span> tf</span><br><span class="line">mnist = tf.keras.datasets.mnist</span><br><span class="line"></span><br><span class="line">(x_train, y_train), (x_test, y_test) = mnist.load_data()</span><br><span class="line"></span><br><span class="line">x_train, x_test = x_train / <span class="number">255.0</span>, x_test / <span class="number">255.0</span></span><br><span 
class="line"></span><br><span class="line">model = tf.keras.models.Sequential([</span><br><span class="line"> tf.keras.layers.Flatten(input_shape=(<span class="number">28</span>,<span class="number">28</span>)),</span><br><span class="line"> tf.keras.layers.Dense(<span class="number">128</span>, activation=<span class="string">'relu'</span>),</span><br><span class="line"> tf.keras.layers.Dropout(<span class="number">0.2</span>),</span><br><span class="line"> tf.keras.layers.Dense(<span class="number">10</span>, activation=<span class="string">'softmax'</span>)</span><br><span class="line">])</span><br><span class="line"></span><br><span class="line">model.compile(optimizer=<span class="string">'adam'</span>,</span><br><span class="line"> loss=<span class="string">'sparse_categorical_crossentropy'</span>,</span><br><span class="line"> metrics=[<span class="string">'accuracy'</span>])</span><br><span class="line"></span><br><span class="line">model.fit(x_train, y_train, epochs=<span class="number">5</span>)</span><br><span class="line"></span><br><span class="line">model.evaluate(x_test, y_test, verbose=<span class="number">2</span>)</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> Ubuntu 16.04 </tag>
<tag> Anaconda </tag>
<tag> tensorflow </tag>
</tags>
</entry>
<entry>
<title>Installing CUDA 10.0 and cuDNN 7.6.5 on Ubuntu 16.04</title>
<link href="2020/08/30/p48.html"/>
<url>2020/08/30/p48.html</url>
<content type="html"><![CDATA[<h1 id="关闭自动更新并主动更新系统依赖">Disable automatic updates and update system dependencies manually</h1><p>While Ubuntu's automatic updater runs in the background, many commands in the terminal cannot be used (apt, for example, has to wait for the package manager lock), so I recommend disabling automatic updates on a fresh system. In <code>Software & Updates</code>, go to Updates and set <code>Automatically check for updates</code> and <code>Notify me of a new Ubuntu version</code> to <code>Never</code>, as shown below. <img src="p48/NeverUpdates.png" alt="Disable automatic updates"></p><p>Commands for updating dependencies manually: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt update</span><br><span class="line">sudo apt upgrade</span><br></pre></td></tr></table></figure></p><h1 id="添加ppa源">Add the PPA</h1><p>The driver in the official Ubuntu 16.04 repositories is outdated and does not support CUDA 10.0. Fortunately, the graphics-drivers PPA for Ubuntu 16.04 offers drivers that do. The commands are: <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo add-apt-repository ppa:graphics-drivers/ppa</span><br><span class="line">sudo apt update</span><br></pre></td></tr></table></figure></p><h1 id="安装显卡驱动">Install the GPU driver</h1><p>In <code>Software & Updates</code>, go to <code>Additional Drivers</code>, select the latest NVIDIA binary driver (it must be no lower than the driver version in the CUDA installer's file name, discussed below), and click <code>Apply Changes</code>. When the installation finishes, <code>reboot</code>, as shown below. <img src="p48/AdditionalDrivers.png" alt="Additional Drivers"></p><h1 id="安装cuda">Install CUDA</h1><h2 id="下载安装文件">Download the installers</h2><p>Open the CUDA 10.0 download page: <a href="https://developer.nvidia.com/cuda-10.0-download-archive" target="_blank" rel="noopener">https://developer.nvidia.com/cuda-10.0-download-archive</a>. Select your system as shown below and download the two <code>run</code> files offered. <img src="p48/CUDA_Toolkit_10.0.jpg" alt="CUDA Toolkit 10.0"></p><h2 id="安装">Install</h2><p>Run the commands below to install CUDA. <strong><em>Do not install the GPU driver: when asked <code>Install NVIDIA Accelerated Graphics Driver for Linux?</code>, enter <code>no</code>.</em></strong> Only CUDA is installed here. As long as the driver already on the system is no lower than the minimum version CUDA requires (the CUDA file name includes that minimum version), CUDA can use the system's driver. This avoids unnecessary runtime errors. The bundled driver can be installed after switching to init 
3, but after rebooting I could not log in to the system. This may be because the driver bundled with CUDA is not tuned for the particular system, whereas the drivers in the PPA are. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span></span><br><span class="line"><span class="built_in">cd</span> Downloads</span><br><span class="line">chmod +x *.run</span><br><span class="line">sudo ./cuda_10.0.130_410.48_linux.run</span><br><span class="line">sudo ./cuda_10.0.130.1_linux.run</span><br></pre></td></tr></table></figure></p><p>When the commands above prompt for input, answer as shown below. <img src="p48/NotSelectedDriver.png" alt="Not Selected Driver"></p><p>After installation, <code>PATH</code> still has to be modified so that CUDA can be found. Append the following two lines to <code>~/.bashrc</code> and run <code>source ~/.bashrc</code>. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">export</span> PATH=/usr/<span class="built_in">local</span>/cuda-10.0/bin:<span class="variable">$PATH</span></span><br><span class="line"><span class="built_in">export</span> LD_LIBRARY_PATH=/usr/<span class="built_in">local</span>/cuda-10.0/lib64</span><br></pre></td></tr></table></figure></p><h2 id="验证">Verify</h2><p>Run the following commands; if the last line of output is <code>Result = PASS</code>, CUDA was installed successfully. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> /usr/<span class="built_in">local</span>/cuda-10.0/samples/1_Utilities/deviceQuery</span><br><span class="line">sudo make</span><br><span class="line">./deviceQuery</span><br></pre></td></tr></table></figure></p><h1 id="安装cudnn">Install cuDNN</h1><h2 id="下载安装文件-1">Download the installers</h2><p>Download the deb packages for your system from the official site and install them: <a 
href="https://developer.nvidia.com/rdp/cudnn-archive" target="_blank" rel="noopener">cudnn-archive</a>. <img src="p48/cudnn-archive.jpg" alt="cudnn-archive"></p><h2 id="安装-1">Install</h2><p>Open the Downloads folder and double-click each of the three downloaded <code>deb</code> files to install them.</p><h2 id="验证-1">Verify</h2><p>Run the following commands; if the last line of output is <code>Test passed!</code>, cuDNN was installed successfully. <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">cp -r /usr/src/cudnn_samples_v7/ <span class="variable">$HOME</span></span><br><span class="line"><span class="built_in">cd</span> <span class="variable">$HOME</span>/cudnn_samples_v7/mnistCUDNN</span><br><span class="line">make clean && make</span><br><span class="line">./mnistCUDNN</span><br></pre></td></tr></table></figure></p><h1 id="参考">References</h1><ul><li><p><a href="https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa" target="_blank" rel="noopener">Proprietary GPU Drivers</a></p></li><li><p><a href="https://superuser.com/questions/1183200/cant-login-to-ubuntu-after-installing-cuda" target="_blank" rel="noopener">Can't login to Ubuntu after installing CUDA</a></p></li><li><p><a href="https://askubuntu.com/questions/799184/how-can-i-install-cuda-on-ubuntu-16-04" target="_blank" rel="noopener">How can I install CUDA on Ubuntu 16.04?</a></p></li><li><p><a href="https://medium.com/@kapilvarshney/how-to-setup-ubuntu-16-04-with-cuda-gpu-and-other-requirements-for-deep-learning-f547db75f227" target="_blank" rel="noopener">How to Setup Ubuntu 16.04 with CUDA, GPU, and other requirements for Deep Learning</a></p></li><li><p><a href="https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#verify" target="_blank" rel="noopener">Verifying The cuDNN Install On Linux</a></p></li></ul>]]></content>
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> Ubuntu 16.04 </tag>
<tag> CUDA </tag>
<tag> cuDNN </tag>
<tag> Desktop </tag>
</tags>
</entry>
<entry>
<title>Uploading a Public Key to a Server</title>
<link href="2020/08/30/p47.html"/>
<url>2020/08/30/p47.html</url>
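<content type="html"><![CDATA[<p>For both security and convenience, a remote Linux system can be accessed with an SSH key. Typically you generate one key pair locally and keep using it unless there is a special reason to replace it; when you switch computers, simply copy the <code>C:\Users\Username\.ssh</code> folder to the new machine. So how do you get the locally generated SSH key onto the server? It is simple: log in to the server and paste the contents of the <code>id_rsa.pub</code> file from your local <code>.ssh</code> folder into the target user's <code>~/.ssh/authorized_keys</code> file, on a line of its own, then save and exit. The SSH daemon reads <code>authorized_keys</code> at every login, so the new key normally takes effect right away; if it does not, restart the SSH service with <code>sudo systemctl restart sshd</code>. You can now log in to the server with the <code>username+id_rsa</code> combination.</p>]]></content>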
<categories>
<category> Linux </category>
</categories>
<tags>
<tag> SSH Key </tag>
<tag> Debian </tag>
<tag> Linux </tag>
</tags>
</entry>
<entry>
<title>Configuring a Desktop PC to Use the Integrated GPU for Display and the Discrete GPU for Compute</title>
<link href="2020/08/29/p46.html"/>
<url>2020/08/29/p46.html</url>
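<content type="html"><![CDATA[<p>A desktop PC with a discrete graphics card usually drives the display from the discrete card by default and disables the integrated GPU. However, most discrete cards intended for compute have no VGA port, so an HDMI-to-VGA adapter is needed. On the one hand, adapters become a real expense once there are many machines. On the other hand, a compute GPU that also drives the display wastes some of its resources. My own understanding was that the integrated and discrete GPUs can work at the same time; otherwise multi-GPU servers could not exist. After some research and experimentation I found a solution, recorded here using a Lenovo desktop as the example.</p><p>First connect the monitor to the discrete card so that the screen lights up. Enter the BIOS setup: press F12 repeatedly at boot>>Enter Setup>>Devices>>Video Setup:</p><blockquote><p>Select Active Video [IGD]</p></blockquote><blockquote><p>Pre-Allocated Memory Size [1024MB]</p></blockquote><blockquote><p>Total Graphics Memory [Maximum]</p></blockquote><blockquote><p>Multi-Monitor Support [Enabled]</p></blockquote><p>Save and exit: Esc>>Exit>>Save Changes and Exit. Now connect the monitor to the integrated GPU's output; the screen should light up. After booting into the system, open a terminal and run <code>nvidia-smi</code> to confirm that the discrete card is still working. At this point the discrete and integrated GPUs are configured to work at the same time.</p>]]></content>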
<categories>
<category> Computers </category>
</categories>
<tags>
<tag> GPU </tag>
<tag> Discrete GPU </tag>
<tag> Dual GPU </tag>
</tags>
</entry>
<entry>
<title>Bluetooth Adapter Problems on Windows</title>
<link href="2019/07/15/p45.html"/>
<url>2019/07/15/p45.html</url>
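<content type="html"><![CDATA[<p>A building has been going up outside my window lately, and the foundation work is far too loud (a constant drone and rattle), so I splurged on a pair of noise-cancelling headphones. The noise cancelling does work: the drone of engines is mostly blocked, but high-frequency noise such as horns and squealing brakes still gets through. I am fairly satisfied anyway, since most of the sound is gone; the rest will have to wait for the technology to improve.</p><p>The headphones work well, but I ran into a problem that was hard to solve, so here is a record of the troubleshooting.</p><p>The problem is this: the Bluetooth headphones work perfectly with my phone, but show all sorts of odd behavior when connected to a PC. I tried several adapters without success and finally settled on a compromise.</p><p>First I tried an adapter based on a CSR chip. It was unusable because the headphones had to be re-paired every time they were powered on. The symptom: the headphones announce that they are connected to the PC, but the PC shows them as disconnected, and music playing on the PC can be heard only very faintly in the headphones.</p><p>Next I switched to a Broadcom-based adapter. It fixed the re-pairing problem but was still unusable, because the headphones had to be connected manually in the playback devices list every time. There were two symptoms. First, the headphones' voice prompt says they are connected to the PC, and they appear as connected in the PC's Bluetooth device list, but the playback devices list shows them as disconnected, and there is no sound until they are connected manually there. Second, even when the headphones are connected, the microphone is not, so the headset microphone cannot be used.</p><p>After thinking it over, I concluded that the cause was poor Bluetooth support in Windows. If Windows is the weak point, why not try a driverless adapter? So I bought one, and this time most of my problems were solved. The driverless adapter is not truly driverless; it combines a sound card and Bluetooth in one device, that is, a Bluetooth adapter with a built-in sound card. It talks to Windows as a sound card and to the headphones as a Bluetooth adapter, so Windows never has to deal with Bluetooth at all. Pairing with the headphones is done entirely on the adapter without involving Windows; Windows sees a sound card that works as soon as it is plugged in. All the problems above are gone, including re-pairing after a restart, reconnecting after a restart, and the unusable microphone.</p><p>One problem remains unsolved: if the microphone is used while the headphones are playing music, the audio quality becomes very poor. I read many posts and found no fix, but in the end I could confirm that the cause is the Bluetooth protocol rather than Windows, so it cannot be fixed. My headphones support only A2DP, not aptX, and A2DP cannot carry stereo playback and the microphone at the same time. Only then did I go back and test on the phone, and found that the same problem exists there. I had not noticed it because the phone simply stops the music whenever the microphone is in use, the clever little thing. The final compromise is an external microphone: set the Bluetooth headphones as the default playback device, set the external microphone as the default recording device, and disable the headset's own microphone.</p>]]></content>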
<categories>
<category> Systems </category>
</categories>
<tags>
<tag> Bluetooth </tag>
<tag> Windows </tag>
<tag> win </tag>
<tag> Drivers </tag>
<tag> Driverless </tag>
<tag> Bluetooth Issues </tag>
</tags>
</entry>
<entry>
<title>Quickly Migrating a MySQL Database</title>
<link href="2019/06/10/p44.html"/>
<url>2019/06/10/p44.html</url>
<content type="html"><![CDATA[<h1 id="为啥要迁移数据库">Why Migrate the Database</h1><p>The database holds a large amount of user information. The runtime environment can be rebuilt from scratch, but the database cannot be restored by hand, so migrating it safely matters a great deal when a server is moved.</p><h1 id="目标">Goal</h1><p>Migrate the MySQL database from the source server to the target server. The source server may be a machine that is being replaced, a test environment, and so on.</p><h1 id="前提">Prerequisites</h1><p>The source server must have a user with administrator privileges and a password, and that user must be allowed to log in remotely.</p><h1 id="源服务器设置">Source Server Setup</h1><p>VPS hosts such as GCP do not allow remote password login by default, so the source server needs a password for the root user and remote login enabled.</p><h2 id="切换到root用户使用ssh登录到">Switching to the root User</h2><p>Log in to the server over SSH, then run:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo -i</span><br></pre></td></tr></table></figure><h2 id="修改ssh配置文件">Editing the SSH Configuration File</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/ssh/sshd_config</span><br></pre></td></tr></table></figure><ol type="1"><li>Search for PermitRootLogin with /PermitRootLogin; the line is commented out, so uncomment it and set its value to yes;</li><li>Search for PasswordAuthentication with /PasswordAuthentication and set its value to yes.</li></ol><h2 id="为root用户设置密码">Setting a Password for root</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">passwd root</span><br></pre></td></tr></table></figure><h2 id="重启ssh">Restarting SSH</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/etc/init.d/ssh restart</span><br></pre></td></tr></table></figure><h1 id="迁移">Migration</h1><p>The migration is run on the target server, using rsync directly. To get a consistent copy, stop the MySQL service on both servers while the files are being copied. The command is:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">rsync -avz root@old.com:/var/lib/mysql/* /var/lib/mysql/</span><br></pre></td></tr></table></figure><p>Here old.com can be the domain name or the IP address of the source server. Once the files have been copied, check that they are the same on both servers.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ls -l /var/lib/mysql/</span><br></pre></td></tr></table></figure><h1 id="赋予mysql用户权限">Granting Ownership to the mysql User</h1><p>If MySQL is not yet installed on the target server, install it first (the MariaDB package is used here):</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">apt install mariadb-server</span><br></pre></td></tr></table></figure><p>If the command above fails, refresh the package index first:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">apt update</span><br></pre></td></tr></table></figure><p>After MySQL is installed, give the mysql user ownership of /var/lib/mysql: <figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">chown mysql:mysql -R /var/lib/mysql/</span><br></pre></td></tr></table></figure></p><h1 id="再次确认数据库正确迁移">Confirming the Migration Again</h1><p>In the session below, <code>enter password</code> stands for typing the password at the prompt, and <code>databases</code> and <code>table</code> are placeholders for a real database name and table name.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">mysql -u root -p</span><br><span class="line">enter password</span><br><span class="line">show databases;</span><br><span class="line">SELECT User FROM mysql.user;</span><br><span class="line">use databases;</span><br><span class="line">SELECT * FROM table;</span><br><span class="line">exit</span><br></pre></td></tr></table></figure>]]></content>
<categories>
<category> Databases </category>
</categories>
<tags>
<tag> MySQL </tag>
<tag> Database </tag>
<tag> Migration </tag>
<tag> rsync </tag>
</tags>
</entry>
<entry>
<title>A Fully Automated Word Typesetting Template for Fuzhou University Theses</title>
<link href="2019/05/08/p42.html"/>
<url>2019/05/08/p42.html</url>
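<content type="html"><![CDATA[<h1 id="项目地址">Project Links</h1><p>Project site: <a href="https://siaimes.github.io/ThesisFZU/">ThesisFZU web page</a></p><p>Project repository: <a href="https://github.com/siaimes/ThesisFZU" target="_blank" rel="noopener">ThesisFZU github</a></p><h1 id="下载地址">Download Links</h1><p>Download from the project site: <a href="https://github.com/siaimes/ThesisFZU/zipball/master" target="_blank" rel="noopener">download ThesisFZU from web page</a></p><p>Download from the project repository: <a href="https://github.com/siaimes/ThesisFZU/archive/master.zip" target="_blank" rel="noopener">download ThesisFZU from github</a></p><h1 id="模板特色">Template Features</h1><blockquote><ol type="1"><li>Chapter numbering uses hidden Arabic numerals, and the number actually displayed is typed in by hand, so the template can handle any chapter-numbering format without breaking cross-references. Different departments only need to make minor adjustments to the chapter numbers according to their own standards;</li><li>The sections of this template cover the union of the undergraduate and graduate standards; check your own thesis standard and delete the sections you do not need;</li><li>The whole document is typeset with styles for the greatest possible automation, and the styles strictly follow the Fuzhou University Undergraduate Graduation Project (Thesis) Writing Standard (2017 revision) and the Fuzhou University Graduate Thesis Standard (2016 revision);</li><li>For equations, the <a href="http://www.amyxun.com/" target="_blank" rel="noopener">AxMath</a><sup>*</sup> plug-in is recommended. It supports WYSIWYG editing, a LaTeX editor and cross-references, is very pleasant to use, and above all produces very attractive output;</li><li>Word's built-in bibliography engine has been tuned, and VBA code has been written to set citations as superscripts, so references are managed fully automatically.</li></ol></blockquote><h1 id="使用技巧">Usage Tips</h1><blockquote><ol type="1"><li>This template was built with Office 365. To use it you need Office 2016 or later; otherwise correct behavior cannot be guaranteed;</li><li>Double-click the template to create a new Word document. If Word reports that macros have been disabled, click "Enable Content"; otherwise references cannot be managed automatically;</li><li>If the Word document is moved between computers, or you move the template itself, the document can no longer locate the template and automatic reference management stops working. The symptom is that the macro "调整参考文献格式" cannot be found. In that case open Word>>File>>Options>>Add-ins>>Manage [Word Add-ins]>>Go and attach the document template again;</li><li>Importing this template into an existing Word file is not recommended, since correct behavior cannot be guaranteed. If you only learn about this template when your thesis is nearly finished, consider re-typesetting the thesis with the template instead.</li></ol></blockquote><p><sup>*</sup>Note: because the plug-in writes to the registry, many security programs flag it as a virus; if that happens, follow the instructions on the official site.</p>]]></content>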
<categories>
<category> Templates </category>
</categories>
<tags>
<tag> Thesis </tag>
<tag> Academic </tag>
<tag> ThesisFZU </tag>
</tags>
</entry>
<entry>
<title>Youku: The Video Cannot Be Played Because You Have Disabled Cookies</title>
<link href="2018/06/12/p40.html"/>
<url>2018/06/12/p40.html</url>
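<content type="html"><![CDATA[<p>At the end of 2017, while browsing <a href="https://www.youku.com/" target="_blank" rel="noopener">Youku</a>, I found that no video would play. The message was: the video cannot be played because you have disabled cookies. I was using Chrome; I reinstalled the browser and enabled cookies (which had never been disabled in the first place), and neither helped. Switching to IE did not help either, which ruled out a Chrome extension as the cause. I contacted Youku support at the time, and they could not solve it.</p><p>Half a year later I happened to remember the problem, tinkered with it again, and finally solved it. Earlier I had used the hosts file provided by Lao D's blog to bypass the firewall, which replaced the Windows hosts file. That hosts file stopped working so often that I found it too much trouble, so I followed online tutorials to set up my own proxy and abandoned the hosts approach. I forgot to restore the original hosts file, however, and once its entries went stale they caused this strange behavior. After tracking this down, restoring the hosts file with the tool provided by Lao D's blog fixed it.</p>]]></content>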
<categories>
<category> Computers </category>
</categories>
<tags>
<tag> VPN </tag>
</tags>
</entry>
<entry>
<title>6.5 Demolishing Railways</title>
<link href="2018/01/13/p39.html"/>
<url>2018/01/13/p39.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>X has recently been playing a bridge-building game. The game consists of <span class="math inline">\(N\)</span> islands, with island 1 as the capital. The islands are already connected by <span class="math inline">\(M\)</span> roads and <span class="math inline">\(K\)</span> bridges (roads and bridges are both bidirectional), so that all <span class="math inline">\(N\)</span> islands are mutually reachable; every bridge directly connects the capital to some island <span class="math inline">\(s\)</span>. To save money (and line his own pockets), X decides to demolish some bridges while keeping the shortest-path distance from every island to the capital unchanged. X wants to know the maximum number of bridges that can be demolished.</p><h1 id="数据输入">Input</h1><p>The first line contains three integers <span class="math inline">\(n\ (2\leq n\leq 10^4),\ m\ (1\leq m\leq 2\times 10^4),\ k\ (1\leq k\leq 10^4)\)</span>: the numbers of islands, roads and bridges. Each of the next <span class="math inline">\(m\)</span> lines contains three integers <span class="math inline">\(u_i,\ v_i,\ x_i\ (1\leq u_i,\ v_i\leq n;\ u_i\neq v_i;\ 1\leq x_i\leq 10^9)\)</span>, meaning that the <span class="math inline">\(i\)</span>-th road connects island <span class="math inline">\(u_i\)</span> and island <span class="math inline">\(v_i\)</span> and has length <span class="math inline">\(x_i\)</span>. Each of the next <span class="math inline">\(k\)</span> lines contains two integers <span class="math inline">\(s_i, y_i\ (2\leq s_i\leq n;\ 1\leq y_i\leq 10^9)\)</span>, meaning that the <span class="math inline">\(i\)</span>-th bridge connects the capital and island <span class="math inline">\(s_i\)</span> and has length <span class="math inline">\(y_i\)</span>.</p><h1 id="数据输出">Output</h1><p>Print one integer: the maximum number of bridges that can be demolished while the shortest path from the capital to every other island stays unchanged.</p><table><thead><tr class="header"><th>Sample Input</th><th>Sample Output</th></tr></thead><tbody><tr class="odd"><td>5 5 3<br>1 2 1<br>2 3 2<br>1 3 3<br>3 4 4<br>1 5 5<br>3 5<br>4 5<br>5 5</td><td>2</td></tr><tr class="even"><td>2 2 3<br>1 2 2<br>2 1 3<br>2 1<br>2 2<br>2 3</td><td>2</td></tr></tbody></table><h1 id="源代码">Source Code</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span 
class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><limits.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><queue></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="keyword">typedef</span> <span class="keyword">long</span> <span class="keyword">long</span> ll;</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> fi first</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> se second</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> mp make_pair</span></span><br><span class="line"><span class="keyword">const</span> <span class="keyword">int</span> N = <span class="number">10007</span>;</span><br><span class="line"><span class="keyword">const</span> <span class="keyword">int</span> M = <span class="number">20007</span>;</span><br><span class="line"><span class="keyword">int</span> et, head[N], brid[N];</span><br><span class="line">ll dis[N];</span><br><span class="line"><span class="keyword">bool</span> vis[N], used[N];</span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Edge</span> {</span></span><br><span class="line"> <span class="keyword">int</span> v, w, nxt;</span><br><span class="line"> Edge() {}</span><br><span class="line"> 
Edge(<span class="keyword">int</span> _v, <span class="keyword">int</span> _w, <span class="keyword">int</span> _nxt) {</span><br><span class="line"> v = _v, w = _w, nxt = _nxt;</span><br><span class="line"> }</span><br><span class="line">} e[M << <span class="number">1</span>];</span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">addEdge</span><span class="params">(<span class="keyword">int</span> u, <span class="keyword">int</span> v, <span class="keyword">int</span> w)</span> </span>{</span><br><span class="line"> e[et] = Edge(v, w, head[u]), head[u] = et++;</span><br><span class="line"> e[et] = Edge(u, w, head[v]), head[v] = et++;</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">bfs</span><span class="params">(<span class="keyword">int</span> n)</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> i;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n;++i) dis[i] = <span class="number">1e17</span>,</span><br><span class="line"> vis[i] = used[i] = <span class="literal">false</span>;</span><br><span class="line"> priority_queue<pair<pair<ll, <span class="keyword">int</span>>, <span class="keyword">int</span>> > que;</span><br><span class="line"> que.push(mp(mp(<span class="number">0</span>, <span class="number">0</span>), <span class="number">1</span>));</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n;++i)</span><br><span class="line"> <span class="keyword">if</span> (~brid[i])</span><br><span class="line"> que.push(mp(mp(-brid[i], -i), i));</span><br><span class="line"> <span class="keyword">while</span> (!que.empty()) {</span><br><span class="line"> ll D = -que.top().fi.fi;</span><br><span class="line"> <span class="keyword">int</span> src = -que.top().fi.se;</span><br><span 
class="line"> <span class="keyword">int</span> u = que.top().se;</span><br><span class="line"> que.pop();</span><br><span class="line"> <span class="keyword">if</span> (vis[u]) <span class="keyword">continue</span>;</span><br><span class="line"> vis[u] = <span class="literal">true</span>, used[src] = <span class="literal">true</span>, dis[u] = D;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = head[u]; ~i; i = e[i].nxt) {</span><br><span class="line"> <span class="keyword">int</span> v = e[i].v;</span><br><span class="line"> <span class="keyword">if</span> (dis[v] > dis[u] + e[i].w) {</span><br><span class="line"> dis[v] = dis[u] + e[i].w;</span><br><span class="line"> que.push(mp(mp(-dis[v], <span class="number">0</span>), v)); <span class="comment">// label 0: a path continuing over a road must win ties against a direct bridge</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> n, m, k, i;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d%d"</span>, &n, &m, &k);</span><br><span class="line"> et = <span class="number">0</span>, <span class="built_in">memset</span>(head + <span class="number">1</span>, <span class="number">-1</span>, <span class="keyword">sizeof</span>(*head) * n);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < m;++i){</span><br><span class="line"> <span class="keyword">int</span> _ui, _vi, _xi;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d%d"</span>, &_ui, &_vi, &_xi);</span><br><span class="line"> addEdge(_ui, _vi, _xi);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">memset</span>(brid + <span class="number">1</span>, <span class="number">-1</span>, 
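<span class="keyword">sizeof</span>(*brid) * n);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < k;++i){</span><br><span class="line"> <span class="keyword">int</span> _si, _yi;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &_si, &_yi);</span><br><span class="line"> <span class="keyword">if</span> (brid[_si] == <span class="number">-1</span> || brid[_si] > _yi)</span><br><span class="line"> brid[_si] = _yi;</span><br><span class="line"> }</span><br><span class="line"> bfs(n);</span><br><span class="line"> <span class="keyword">int</span> ans = k;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n;++i)</span><br><span class="line"> <span class="keyword">if</span> (~brid[i] && used[i])</span><br><span class="line"> --ans;</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d"</span>, ans);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design and Complexity Analysis</h1><p>If the capital already reaches island <span class="math inline">\(i\)</span> at its shortest distance without using any bridge, then the bridge <span class="math inline">\(1\leftrightarrow i\)</span> (if it exists) is not needed. However, a single bridge may shorten the distances of several islands, so we cannot simply process the roads first and the bridges afterwards.</p><p>Brute force: remove one bridge at a time and recompute the shortest distances from capital 1 to all other nodes; if some node no longer has its shortest distance, that bridge cannot be removed. This takes one shortest-path computation per bridge, for a total of <span class="math inline">\(O(k(m+k)\log n)\)</span>, roughly <span class="math inline">\(O(kn\log n)\)</span> under the given limits.</p><p>Whether a bridge is needed depends mainly on whether the distance obtained through the other bridges and the roads is shorter than the bridge itself. This can be viewed as a race: whoever reaches <span class="math inline">\(i\)</span> first occupies that node. Dijkstra's algorithm therefore applies: repeatedly take out the unvisited node with the smallest distance and update the distances of its neighbors. Only the shortest bridge to each island is kept as a candidate, and on a tie a road path is preferred over the bridge, so that the bridge counts as removable. During the run we record who occupies each node, which tells us at the end which bridges were used. The overall complexity is that of Dijkstra's algorithm, <span class="math inline">\(O((m+k)\log n)\)</span>, roughly <span class="math inline">\(O(n\log n)\)</span> under the given limits.</p>]]></content>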
<categories>
<category> Algorithms </category>
</categories>
<tags>
<tag> Algorithm Design and Analysis Exercises </tag>
<tag> bfs </tag>
<tag> Dijkstra's Algorithm </tag>
<tag> Labeled Single-Source Shortest Paths </tag>
</tags>
</entry>
<entry>
<title>6.4 Pokémon</title>
<link href="2018/01/13/p38.html"/>
<url>2018/01/13/p38.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>X has recently started playing a retro Pokémon game. The game map is an <span class="math inline">\(n\times m\)</span> grid whose cells are marked as follows:</p><p><strong>T:</strong> the cell is occupied by trees and cannot be passed through;</p><p><strong>S:</strong> X's starting position; the map is guaranteed to contain exactly one S;</p><p><strong>E:</strong> the exit of the map; the map is guaranteed to contain exactly one E;</p><p><strong>0~9:</strong> a digit <span class="math inline">\(i\)</span> means that <span class="math inline">\(i\)</span> other players are on that cell.</p><p>The rules are as follows. In each step every player may either stay put or move to an adjacent passable cell. If X meets other players on the same cell, he must fight one duel with each of the other players on that cell. Because X is famous on this server, the other players will do everything they can to meet him and duel. The game ends when X reaches the exit (if he meets players at the exit, the duels must still be fought before he leaves the map). X wants to know the minimum number of duels he has to fight.</p><h1 id="数据输入">Input</h1><p>The first line contains two integers <span class="math inline">\(n\)</span>, <span class="math inline">\(m\ (1\leq n,m\leq 1000)\)</span>, the size of the map. Each of the next <span class="math inline">\(n\)</span> lines contains <span class="math inline">\(m\)</span> characters describing the map, using the characters listed above.</p><h1 id="数据输出">Output</h1><p>Print one integer: the minimum number of duels X has to fight in the game.</p><table><thead><tr class="header"><th>Sample Input</th><th>Sample Output</th></tr></thead><tbody><tr class="odd"><td>5 7<br>000E0T3<br>T0TT0T0<br>010T0T0<br>2T0T0T0<br>0T0S000</td><td>3</td></tr><tr class="even"><td>1 4<br>SE23</td><td>2</td></tr></tbody></table><h1 id="源代码">Source Code</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span 
class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><queue></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 1005</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Position</span>{</span></span><br><span class="line"> <span class="keyword">int</span> x, y;</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span 
class="function"><span class="keyword">const</span> <span class="keyword">int</span> <span class="title">getx</span><span class="params">()</span></span>{ <span class="keyword">return</span> x; }</span><br><span class="line"> <span class="function"><span class="keyword">const</span> <span class="keyword">int</span> <span class="title">gety</span><span class="params">()</span></span>{ <span class="keyword">return</span> y; }</span><br><span class="line"> Position(<span class="keyword">int</span> _x, <span class="keyword">int</span> _y) :x(_x), y(_y){}</span><br><span class="line"> Position() :x(<span class="number">0</span>), y(<span class="number">0</span>){}</span><br><span class="line"> <span class="keyword">bool</span> <span class="keyword">operator</span>==(Position t){</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">this</span>->x == t.x&&<span class="keyword">this</span>->y == t.y;</span><br><span class="line"> }</span><br><span class="line"> Position <span class="keyword">operator</span>+(Position t){</span><br><span class="line"> <span class="keyword">return</span> Position(<span class="keyword">this</span>->x + t.x, <span class="keyword">this</span>->y + t.y);</span><br><span class="line"> }</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line">Position offset[<span class="number">4</span>] = { Position(<span class="number">0</span>, <span class="number">1</span>), Position(<span class="number">1</span>, <span class="number">0</span>), Position(<span class="number">0</span>, <span class="number">-1</span>), Position(<span class="number">-1</span>, <span class="number">0</span>) };</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Map</span>{</span></span><br><span class="line"> <span class="keyword">int</span> <span class="built_in">map</span>[maxn][maxn];</span><br><span class="line"> <span 
class="keyword">int</span> num[maxn][maxn];</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> Map(){</span><br><span class="line"> <span class="built_in">memset</span>(<span class="built_in">map</span>, <span class="number">0</span>, <span class="keyword">sizeof</span>(<span class="built_in">map</span>));</span><br><span class="line"> <span class="built_in">memset</span>(num, <span class="number">0</span>, <span class="keyword">sizeof</span>(num));</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">bool</span> <span class="title">equal</span><span class="params">(Position pos, <span class="keyword">int</span> val)</span></span>{</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">map</span>[pos.getx()][pos.gety()] == val;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">void</span> <span class="title">setNum</span><span class="params">(Position pos, <span class="keyword">int</span> val)</span></span>{</span><br><span class="line"> num[pos.getx()][pos.gety()] = val;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">void</span> <span class="title">setMap</span><span class="params">(Position pos, <span class="keyword">int</span> val)</span></span>{</span><br><span class="line"> <span class="built_in">map</span>[pos.getx()][pos.gety()] = val;</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">getMap</span><span class="params">(Position pos)</span></span>{</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">map</span>[pos.getx()][pos.gety()];</span><br><span 
class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">getNum</span><span class="params">(Position pos)</span></span>{</span><br><span class="line"> <span class="keyword">return</span> num[pos.getx()][pos.gety()];</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">findPath</span><span class="params">(Position start, Position finish)</span></span>{</span><br><span class="line"> <span class="comment">//See the textbook. //if (start == finish){ return 0;} //Cannot happen under the problem's constraints</span></span><br><span class="line"> Position here = start, nbr;</span><br><span class="line"> setMap(start, <span class="number">2</span>);</span><br><span class="line"> <span class="built_in">queue</span><Position> myqueue;</span><br><span class="line"> myqueue.push(start);</span><br><span class="line"> <span class="keyword">int</span> maxLen = <span class="number">0x3f3f3f3f</span>;</span><br><span class="line"> <span class="keyword">int</span> total = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">int</span> len;</span><br><span class="line"> <span class="keyword">while</span> (!myqueue.empty()){</span><br><span class="line"> here = myqueue.front();</span><br><span class="line"> myqueue.pop();</span><br><span class="line"> len = getMap(here);</span><br><span class="line"> total += getNum(here);</span><br><span class="line"> <span class="keyword">if</span> (len >= maxLen)<span class="keyword">continue</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < <span class="number">4</span>; ++i){</span><br><span class="line"> nbr = here + offset[i];</span><br><span class="line"> <span class="keyword">if</span> (equal(nbr, <span class="number">0</span>) && len <= maxLen){</span><br><span class="line"> 
setMap(nbr, len + <span class="number">1</span>);</span><br><span class="line"> <span class="keyword">if</span> (nbr == finish){</span><br><span class="line"> maxLen = len + <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> myqueue.push(nbr);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> total;</span><br><span class="line"> }</span><br><span class="line">}mymap;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Time complexity: O(n*m)</span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">int</span> n, m, i, j;</span><br><span class="line"> <span class="keyword">char</span> str;</span><br><span class="line"> Position S, E;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d\n"</span>, &n, &m);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; ++i){</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">1</span>; j <= m; ++j){</span><br><span class="line"> str = getchar();</span><br><span class="line"> <span class="keyword">switch</span> (str)</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">case</span> <span class="string">'E'</span>: E = Position(i, j); <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> <span class="string">'S'</span>: S = Position(i, j); <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> <span class="string">'T'</span>: mymap.setMap(Position(i, j), <span class="number">1</span>); <span class="keyword">break</span>;</span><br><span class="line"> <span 
class="keyword">case</span> <span class="string">'0'</span>: <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">default</span>:mymap.setNum(Position(i, j), str - <span class="string">'0'</span>); <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> getchar();</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Surround the grid with a wall of blocked cells</span></span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i <= n + <span class="number">1</span>; ++i){</span><br><span class="line"> mymap.setMap(Position(i, <span class="number">0</span>), <span class="number">1</span>);</span><br><span class="line"> mymap.setMap(Position(i, m + <span class="number">1</span>), <span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">0</span>; j <= m + <span class="number">1</span>; ++j){</span><br><span class="line"> mymap.setMap(Position(<span class="number">0</span>, j), <span class="number">1</span>);</span><br><span class="line"> mymap.setMap(Position(n + <span class="number">1</span>, j), <span class="number">1</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, mymap.findPath(E, S));</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design Approach and Complexity Analysis</h1><p>First find the length of Little x's shortest path: Little x must take a shortest path in order to meet the fewest players. The reason is that if you leave the shortest path to dodge a player you would otherwise meet on it, that player can simply walk straight to the exit and wait for you there. Next, think in reverse: search outward from the exit, and every player whose cell is reached before Little x's cell, or at the same time, has to duel Little x. The problem therefore reduces to the circuit routing problem; see the textbook for details. The breadth-first search visits each cell at most once, so the running time is <span class="math inline">\(O(n\times m)\)</span>.</p>]]></content>
<categories>
<category> Algorithms </category>
</categories>
<tags>
<tag> Algorithm Design and Analysis Problem Set </tag>
<tag> bfs </tag>
<tag> Circuit Routing Problem </tag>
</tags>
</entry>
<entry>
<title>6.3 Birthday Party</title>
<link href="2018/01/13/p37.html"/>
<url>2018/01/13/p37.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>Little <span class="math inline">\(X\)</span>, whose birthday falls on November 11, has just celebrated it: his friends threw a big party at his home.</p><p>After the party, <span class="math inline">\(n\)</span> unfinished bottles of drink are left over. Bottle <span class="math inline">\(i\)</span> has capacity <span class="math inline">\(b_i\)</span> and still holds <span class="math inline">\(a_i\)</span> units of drink. Little <span class="math inline">\(X\)</span> loves this drink and wants to consolidate the leftovers to enjoy later, but his fridge is too small to hold many bottles, so he wants to store all of the remaining drink in as few bottles as possible (the volume in a bottle may not exceed its capacity). Pouring <span class="math inline">\(v\)</span> units of drink from one bottle into another takes <span class="math inline">\(v\)</span> units of time. Little <span class="math inline">\(X\)</span> is in a hurry to watch a TV series, so he wants to know: using the minimum number of bottles, what is the least time needed to pour all of the drink into those bottles?</p><h1 id="数据输入">Input</h1><p>The first line contains an integer <span class="math inline">\(n\ (1\leq n\leq 100)\)</span>, the number of leftover bottles. The next line contains <span class="math inline">\(n\)</span> integers <span class="math inline">\(a_i\ (1\leq a_i\leq 100)\)</span>, the volume remaining in each bottle. The last line contains <span class="math inline">\(n\)</span> integers <span class="math inline">\(b_i\ (1\leq b_i\leq 100)\)</span>, the capacity of each bottle.</p><h1 id="数据输出">Output</h1><p>Output two integers: the minimum number of bottles needed and the minimum time needed to pour the drink.</p><table><thead><tr class="header"><th>Sample Input</th><th>Sample Output</th></tr></thead><tbody><tr class="odd"><td>4<br>3 3 4 3<br>4 7 6 5</td><td>2 6</td></tr><tr class="even"><td>2<br>1 1<br>100 100</td><td>1 1</td></tr></tbody></table><h1 id="源代码分支限界法">Source Code (Branch and Bound)</h1><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string"><iostream></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string"><cstdio></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string"><cstring></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string"><algorithm></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span><span class="meta-string"><queue></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 105</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">max</span><span class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span>{ <span class="keyword">return</span> a > b ? 
a : b; }</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Node</span> {</span></span><br><span class="line"> <span class="keyword">int</span> a, b;</span><br><span class="line"> <span class="keyword">bool</span> <span class="keyword">operator</span> < (<span class="keyword">const</span> Node& x) <span class="keyword">const</span> {</span><br><span class="line"> <span class="keyword">return</span> b<x.b;</span><br><span class="line"> }</span><br><span class="line">} ns[maxn];</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">Heap</span> {</span></span><br><span class="line"> <span class="keyword">int</span> cur, val, vol, cnt, up;</span><br><span class="line"> Heap(<span class="keyword">int</span> cur, <span class="keyword">int</span> val, <span class="keyword">int</span> vol, <span class="keyword">int</span> cnt, <span class="keyword">int</span> up)</span><br><span class="line"> :cur(cur), val(val), vol(vol), cnt(cnt), up(up) {}</span><br><span class="line"> <span class="keyword">bool</span> <span class="keyword">operator</span> < (<span class="keyword">const</span> Heap& hp) <span class="keyword">const</span> {</span><br><span class="line"> <span class="keyword">return</span> up<hp.up || up == hp.up&&cnt<hp.cnt;</span><br><span class="line"> }</span><br><span class="line"> Heap() {}</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> n;</span><br><span class="line"><span class="keyword">int</span> sumv[maxn], dp[maxn][maxn];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> i, j, suma = <span class="number">0</span>, sumb = <span 
class="number">0</span>;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &n);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; ++i) {</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &ns[i].a);</span><br><span class="line"> suma += ns[i].a;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++) <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &ns[i].b);</span><br><span class="line"> sort(ns + <span class="number">1</span>, ns + n + <span class="number">1</span>);</span><br><span class="line"> <span class="keyword">for</span> (i = n; i >= <span class="number">1</span> && suma>sumb; --i) {</span><br><span class="line"> sumb += ns[i].b;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// k is the minimum number of bottles</span></span><br><span class="line"> <span class="keyword">int</span> k = n - i;</span><br><span class="line"> sumv[<span class="number">0</span>] = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++) sumv[i] = sumv[i - <span class="number">1</span>] + ns[i].b;</span><br><span class="line"> <span class="built_in">memset</span>(dp, <span class="number">-1</span>, <span class="keyword">sizeof</span>(dp));</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i <= n; i++) {</span><br><span class="line"> dp[i][<span class="number">0</span>] = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">1</span>; j <= i; j++) {</span><br><span class="line"> dp[i][j] = max(dp[i - <span class="number">1</span>][j - <span class="number">1</span>] + ns[i].a, dp[i - <span 
class="number">1</span>][j]);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">int</span> res = <span class="number">0</span>;</span><br><span class="line"> priority_queue<Heap> pq;</span><br><span class="line"> pq.push(Heap(n, <span class="number">0</span>, <span class="number">0</span>, <span class="number">0</span>, dp[n][k]));</span><br><span class="line"> <span class="keyword">while</span> (!pq.empty()) {</span><br><span class="line"> Heap u = pq.top();</span><br><span class="line"> pq.pop();</span><br><span class="line"> <span class="keyword">if</span> (u.cnt == k){</span><br><span class="line"> res = u.up;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// Prune: this node has no feasible children</span></span><br><span class="line"> <span class="keyword">int</span> cur_vol = u.vol + sumv[u.cur] - sumv[u.cur - (k - u.cnt)];</span><br><span class="line"> <span class="keyword">int</span> lef_cnt = k - u.cnt;</span><br><span class="line"> <span class="keyword">if</span> (cur_vol<suma || lef_cnt>u.cur || u.cur <= <span class="number">0</span>) <span class="keyword">continue</span>;</span><br><span class="line"> <span class="comment">// Left child: take bottle u.cur</span></span><br><span class="line"> Heap lson = Heap(u.cur - <span class="number">1</span>, u.val + ns[u.cur].a,</span><br><span class="line"> u.vol + ns[u.cur].b, u.cnt + <span class="number">1</span>,</span><br><span class="line"> u.val + ns[u.cur].a + dp[u.cur - <span class="number">1</span>][k - <span class="number">1</span> - u.cnt]);</span><br><span class="line"> pq.push(lson);</span><br><span class="line"> <span class="comment">// Right child: skip bottle u.cur, after a quick check that enough bottles remain</span></span><br><span class="line"> <span class="keyword">if</span> (u.cur - <span class="number">1</span> >= k - u.cnt){</span><br><span class="line"> Heap rson = Heap(u.cur - <span class="number">1</span>, u.val, u.vol,</span><br><span 
class="line"> u.cnt, u.val + dp[u.cur - <span class="number">1</span>][k - u.cnt]);</span><br><span class="line"> pq.push(rson);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d %d\n"</span>, k, suma - res);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design Approach and Complexity Analysis</h1><p>The problem asks for the least pouring time subject to using the fewest bottles. First determine the required number of bottles <span class="math inline">\(k\)</span>; the problem then becomes holding all of the drink in <span class="math inline">\(k\)</span> bottles while minimizing the pouring time. Drink that already sits in a chosen bottle never has to be poured, so this is equivalent to the following optimization problem: choose <span class="math inline">\(k\)</span> bottles whose total capacity is at least the total remaining volume, such that the drink they already hold is as large as possible. We solve this maximization problem by branch and bound, with the upper bound defined as <span class="math inline">\(upper\_bound=cura+h(k-curk)\)</span>, where <span class="math inline">\(cura\)</span> is the volume of drink in the bottles chosen so far, <span class="math inline">\(curk\)</span> is the number of bottles chosen so far, <span class="math inline">\((k-curk)\)</span> is the number of bottles still to be chosen, and <span class="math inline">\(h(k-curk)\)</span> is the maximum volume of drink obtainable by picking <span class="math inline">\(k-curk\)</span> of the remaining bottles. The pruning strategy is as follows:</p><p>Preprocessing: sort the bottles by capacity in ascending order, which yields <span class="math inline">\(k\)</span> and makes pruning easier. Then run a dynamic program once to compute <span class="math inline">\(dp(i,j)\)</span>, the maximum volume of drink obtainable by choosing <span class="math inline">\(j\)</span> of the first <span class="math inline">\(i\)</span> bottles, in preparation for computing <span class="math inline">\(upper\_bound\)</span>.</p><p>Branch and bound with a priority queue: search from bottle n down to bottle 1, each time expanding the node with the largest <span class="math inline">\(upper\_bound\)</span>. A node is pruned when even the largest-capacity bottles still available cannot raise the total capacity to the total volume of drink.</p><p>The sample is analyzed below:</p><p><img src="p37/6.3.png" title="Figure for Problem 6.3"></p><h1 id="源代码动态规划">Source Code (Dynamic Programming)</h1><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><algorithm></span></span></span><br><span 
class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">max</span><span class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span>{ <span class="keyword">return</span> a < b ? b : a; }</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">struct</span> <span class="title">bottle</span>{</span></span><br><span class="line"> <span class="keyword">int</span> left, volume;</span><br><span class="line">}bottles[<span class="number">105</span>];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">bool</span> <span class="title">cmp</span><span class="params">(bottle a, bottle b)</span></span>{</span><br><span class="line"> <span class="keyword">if</span> (a.volume != b.volume)</span><br><span class="line"> <span class="keyword">return</span> a.volume>b.volume;</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> <span class="keyword">return</span> a.left>b.left;</span><br><span class="line">}</span><br><span class="line"><span class="keyword">int</span> dp[<span class="number">105</span>][<span class="number">10005</span>];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">int</span> n, i, j, k;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &n);</span><br><span class="line"> <span class="keyword">int</span> sum = <span class="number">0</span>;</span><br><span class="line"> <span class="built_in">memset</span>(dp, <span 
class="number">-1</span>, <span class="keyword">sizeof</span>(dp));</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++) {</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &bottles[i].left);</span><br><span class="line"> <span class="comment">// accumulate the total remaining volume</span></span><br><span class="line"> sum += bottles[i].left;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++)</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &bottles[i].volume);</span><br><span class="line"> <span class="keyword">int</span> cnt = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">int</span> total = <span class="number">0</span>;</span><br><span class="line"> sort(bottles + <span class="number">1</span>, bottles + <span class="number">1</span> + n, cmp);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++){<span class="comment">// determine the minimum number of bottles</span></span><br><span class="line"> total += bottles[i].volume;</span><br><span class="line"> <span class="keyword">if</span> (total >= sum){</span><br><span class="line"> cnt = i;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> dp[<span class="number">0</span>][<span class="number">0</span>] = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++){</span><br><span class="line"> <span class="keyword">for</span> (j = sum; (j - bottles[i].left) >= <span class="number">0</span>; j--){</span><br><span class="line"> <span class="keyword">for</span> (k = i; k >= <span class="number">1</span>; k--){</span><br><span class="line"> <span 
class="comment">// dp[k][j]: max total capacity of k chosen bottles that together already hold j units of drink</span></span><br><span class="line"> <span class="keyword">if</span> (dp[k - <span class="number">1</span>][j - bottles[i].left] != <span class="number">-1</span>)</span><br><span class="line"> dp[k][j] = max(dp[k][j], dp[k - <span class="number">1</span>][j - bottles[i].left] + bottles[i].volume);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">int</span> ans = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = sum; i >= <span class="number">1</span>; i--){</span><br><span class="line"> <span class="keyword">if</span> (dp[cnt][i] >= sum){</span><br><span class="line"> ans = sum - i;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d %d\n"</span>, cnt, ans);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析-1">Design Approach and Complexity Analysis</h1><p>As in the branch-and-bound solution, first determine the number of bottles <span class="math inline">\(cnt\)</span>. However, the <span class="math inline">\(cnt\)</span> bottles with the largest capacities cannot simply be taken as the final choice: a bottle outside that set may be nearly as large yet much fuller, while a bottle inside it may have a large capacity but hold little drink. The <span class="math inline">\(cnt\)</span> chosen bottles should, as far as possible, let the emptier bottles be poured into the fuller ones, so the pouring time is minimized by finding <span class="math inline">\(cnt\)</span> bottles with enough total capacity whose own remaining drink is as large as possible. We therefore need to know, for each choice of <span class="math inline">\(cnt\)</span> bottles, how much drink from the other bottles they can absorb (the total capacity of the <span class="math inline">\(cnt\)</span> bottles minus the drink they already hold). This yields the state transition equation: <span class="math display">\[dp(k,j)=\max(dp(k,j),\ dp(k-1,j-bottle_i.left)+bottle_i.volume)\]</span> where <span class="math inline">\(dp(0,0)=0\)</span>, the bottles <span class="math inline">\(i=1\cdots n\)</span> are considered one at a time, and <span class="math inline">\(dp(k,j)\)</span> is the maximum total capacity of <span class="math inline">\(k\)</span> chosen bottles that together already hold <span class="math inline">\(j\)</span> units of drink.</p><p>Once the table has been filled in, the answer follows from <span class="math display">\[\min(sum-init) \ s.t.\ dp(cnt,init) \geq sum,\ init = 1\cdots sum\]</span> and the required time is <span class="math inline">\(ans=sum-init\)</span>. In other words, we look for <span class="math inline">\(cnt\)</span> bottles that can hold everything and that already contain as much drink as possible; the required time is the total remaining drink minus the drink already in those <span class="math inline">\(cnt\)</span> bottles. The three nested loops give a running time of <span class="math inline">\(O(n^2\times sum)\)</span>.</p>]]></content>
<categories>
<category> Algorithms </category>
</categories>
<tags>
<tag> Algorithm Design and Analysis Problem Set </tag>
<tag> Dynamic Programming </tag>
<tag> Branch and Bound </tag>
</tags>
</entry>
<entry>
<title>6.2 Fruit Salad</title>
<link href="2018/01/13/p36.html"/>
<url>2018/01/13/p36.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>Little <span class="math inline">\(X\)</span> has put on a lot of weight lately and plans to slim down. He has decided to diet by replacing his three daily meals with fruit salad. He bought <span class="math inline">\(n\)</span> kinds of fruit at the supermarket; each kind has a tastiness <span class="math inline">\(a_i\)</span> and a calorie count <span class="math inline">\(b_i\)</span>. He will choose some of the fruits to make the salad, but by his dieting rule the total tastiness must be exactly <span class="math inline">\(k\)</span> times the total calories. To keep the diet sustainable, Little <span class="math inline">\(X\)</span> wants the salad to be as tasty as possible, that is, with the total tastiness as large as possible. Can you help him?</p><h1 id="数据输入">Input</h1><p>The first line contains two integers <span class="math inline">\(n,k\ (1\leq n\leq 100,\ 1\leq k\leq 10)\)</span>, the number of kinds of fruit and the value <span class="math inline">\(k\)</span> in the dieting rule. The next line contains <span class="math inline">\(n\)</span> integers <span class="math inline">\(a_i\ (1\leq a_i\leq 100)\)</span>, the tastiness of the <span class="math inline">\(n\)</span> kinds of fruit. The last line contains <span class="math inline">\(n\)</span> integers <span class="math inline">\(b_i\ (1\leq b_i\leq 100)\)</span>, the calories of the <span class="math inline">\(n\)</span> kinds of fruit.</p><h1 id="数据输出">Output</h1><p>Output the maximum total tastiness Little <span class="math inline">\(X\)</span> can obtain while satisfying the dieting rule; if the rule cannot be satisfied, output <span class="math inline">\(-1\)</span>.</p><table><thead><tr class="header"><th>Sample Input</th><th>Sample Output</th></tr></thead><tbody><tr class="odd"><td>3 2<br>10 8 1<br>2 7 1</td><td>18</td></tr><tr class="even"><td>5 3<br>4 4 4 4 4<br>2 2 2 2 2</td><td>-1</td></tr></tbody></table><h1 id="源代码动态规划">Source Code (Dynamic Programming)</h1><figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> NInf 0xfefefefe</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAX_N 105</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAX_A 105</span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> MAX_V 2 * MAX_N * MAX_A</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> dp[MAX_V];</span><br><span class="line"><span class="keyword">int</span> w_a[MAX_N];</span><br><span class="line"><span class="keyword">int</span> v_b[MAX_N];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">max</span><span 
class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span>{ <span class="keyword">return</span> a < b ? b : a; }</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">int</span> i, j, n, k, v;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d %d"</span>, &n, &k);</span><br><span class="line"> <span class="keyword">int</span> offset = n * MAX_A;</span><br><span class="line"> v = <span class="number">2</span> * offset;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < n; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &w_a[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < n; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &v_b[i]);</span><br><span class="line"> v_b[i] = w_a[i] - k*v_b[i];</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">memset</span>(dp, NInf, <span class="keyword">sizeof</span>(dp));</span><br><span class="line"> dp[offset] = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < n; ++i){</span><br><span class="line"> <span class="keyword">if</span> (v_b[i] > <span class="number">0</span>){<span class="comment">// depends on smaller indices, so iterate j from high to low</span></span><br><span class="line"> <span class="keyword">for</span> (j = v - <span class="number">1</span>; j >= v_b[i]; --j){</span><br><span class="line"> dp[j] = max(dp[j], dp[j - v_b[i]] + w_a[i]);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span 
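class="line"> <span class="keyword">else</span>{<span class="comment">// depends on larger indices, so iterate j from low to high</span></span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">0</span>; j < v + v_b[i]; ++j){<span class="comment">// v_i <= 0</span></span><br><span class="line"> dp[j] = max(dp[j], dp[j - v_b[i]] + w_a[i]);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, dp[offset] == <span class="number">0</span> ? <span class="number">-1</span> : dp[offset]);</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design approach and complexity analysis</h1><p>This problem resembles the 0-1 knapsack problem, with an added constraint: <span class="math inline">\(\sum a = k\times\sum b\)</span>. To reduce it to a knapsack problem, rewrite the constraint as <span class="math inline">\(\sum a - k \times \sum b = 0\)</span>, which gives the following knapsack instance: <span class="math display">\[\left\{ \begin{array}{l} v_i = a_i - k\times b_i \\ w_i = a_i \\\end{array} \right.\]</span> The state transition equation is <span class="math inline">\(dp(i,j)=\max(dp(i-1,j),dp(i-1,j-v_i)+w_i)\)</span>. This is a 0-1 knapsack in which item <span class="math inline">\(i\)</span> has volume <span class="math inline">\(v_i\)</span> and value <span class="math inline">\(w_i\)</span>. Unlike the standard 0-1 knapsack, a volume may be negative, so every index is shifted by an offset. The chosen items must have total volume <span class="math inline">\(v=0\)</span>, so the answer is <span class="math inline">\(dp(n,offset)\)</span>.</p><p>The offset can be set to <span class="math inline">\(maxa\times n\)</span>, for the following reason. When every <span class="math inline">\(a\)</span> is as large as possible and every <span class="math inline">\(b\)</span> as small as possible, the index is bounded above by <span class="math inline">\(maxa \times n\)</span>. When every <span class="math inline">\(a\)</span> is as small as possible and every <span class="math inline">\(b\)</span> as large as possible, the index is bounded below by <span class="math inline">\(-k\times maxb\times n\)</span>. Simply setting the offset to <span class="math inline">\(k\times maxb\times n\)</span> would therefore also work, but it costs far more memory and computation. In fact, once the total volume of the items chosen so far drops below <span class="math inline">\(-maxa\times n\)</span>, even adding every remaining <span class="math inline">\(a\)</span> cannot bring the total back up to zero, so the segment <span class="math inline">\(-k\times maxb\times n \sim -maxa\times n\)</span> of the <span class="math inline">\(dp\)</span> array carries no useful information and can be omitted. Hence the offset is set to <span class="math inline">\(maxa\times n\)</span>.</p><p>Because the sign of <span class="math inline">\(v_i\)</span> is not fixed, collapsing the two-dimensional array of the recurrence into a one-dimensional array requires choosing the iteration direction for each item; see the comments in the source code.</p>]]></content>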
<categories>
<category> 算法 </category>
</categories>
<tags>
<tag> 算法设计与分析习题集 </tag>
<tag> 背包问题 </tag>
</tags>
</entry>
<entry>
<title>6.1 Dragon-Slaying Mirror</title>
<link href="2018/01/13/p35.html"/>
<url>2018/01/13/p35.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>Xiao <span class="math inline">\(X\)</span> has recently become hooked on a browser game called Dragon-Slaying Mirror. The game is played on an <span class="math inline">\(N\times M\)</span> map. Dragon-slaying mirrors are placed on some cells of the map; the player stands in the top-left corner and the dragon in the bottom-right corner. From the top-left corner the player casts a beam of light to the right that can kill the dragon. The beam travels across the map in a straight line. If the player chooses to use a mirror on a cell the beam passes through, that mirror reflects the beam in all four directions: up, down, left, and right. Each mirror used costs a certain amount of mana, and mana is very hard to obtain, so Xiao <span class="math inline">\(X\)</span> wants to kill the dragon using as few mirrors as possible. As the map grows the calculation becomes harder and harder, and Xiao <span class="math inline">\(X\)</span> finds the game frustrating. Can you help?</p><h1 id="数据输入">Input</h1><p>The first line contains two integers <span class="math inline">\(n,m\ (1\leq n, m\leq 1000)\)</span>, the size of the map. Each of the next <span class="math inline">\(n\)</span> lines contains <span class="math inline">\(m\)</span> characters describing the map: '.' marks an empty cell and '#' marks a cell holding a dragon-slaying mirror. For 30% of the data <span class="math inline">\(n\times m\leq 30\)</span>, for 70% of the data <span class="math inline">\(n\times m\leq 100\)</span>, and for 100% of the data <span class="math inline">\(n\times m\leq 1000\)</span>.</p><h1 id="数据输出">Output</h1><p>If the dragon can be killed, output the minimum number of mirrors needed; otherwise output -1.</p><table><thead><tr class="header"><th>Sample input</th><th>Sample output</th></tr></thead><tbody><tr class="odd"><td>3 3<br>.#.<br>...<br>.#.</td><td>2</td></tr><tr class="even"><td>4 4<br>##..<br>..#.<br>...#<br>...#</td><td>-1</td></tr></tbody></table><h1 id="提示">Hint</h1><p><img src="p35/6.1.jpg" title="Figure for problem 6.1"></p><h1 id="源代码">Source code</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><vector></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><queue></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> Inf 0x3f3f3f3f</span></span><br><span class="line"><span class="meta">#<span 
class="meta-keyword">define</span> maxn 1005</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> cnt[maxn << <span class="number">1</span>];</span><br><span class="line"><span class="keyword">bool</span> flag[maxn << <span class="number">1</span>];</span><br><span class="line"><span class="built_in">vector</span><<span class="keyword">int</span>> tree[maxn << <span class="number">1</span>];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">min</span><span class="params">(<span class="keyword">int</span> x, <span class="keyword">int</span> y)</span></span>{ <span class="keyword">return</span> x < y ? x : y; }</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">bfs</span><span class="params">(<span class="keyword">int</span> idx, <span class="keyword">int</span> n)</span></span>{</span><br><span class="line"> <span class="built_in">queue</span><<span class="keyword">int</span>> myqueue;</span><br><span class="line"> <span class="built_in">memset</span>(cnt, Inf, <span class="keyword">sizeof</span>(cnt));</span><br><span class="line"> <span class="built_in">memset</span>(flag, <span class="literal">false</span>, <span class="keyword">sizeof</span>(flag));</span><br><span class="line"> cnt[<span class="number">1</span>] = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">int</span> i, father, son;</span><br><span class="line"> myqueue.push(idx);</span><br><span class="line"> <span class="keyword">while</span> (!myqueue.empty()){</span><br><span class="line"> father = myqueue.front();</span><br><span class="line"> myqueue.pop();</span><br><span class="line"> flag[father] = <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = <span 
class="number">0</span>; i<tree[father].size(); i++){</span><br><span class="line"> son = tree[father][i];</span><br><span class="line"> <span class="keyword">if</span> (!flag[son]){</span><br><span class="line"> myqueue.push(son);</span><br><span class="line"> cnt[son] = min(cnt[father] + <span class="number">1</span>, cnt[son]);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> cnt[n] == Inf ? <span class="number">-1</span> : cnt[n];</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">int</span> n, m, i, j;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d\n"</span>, &n, &m);</span><br><span class="line"> <span class="comment">// row nodes are numbered 1..n, column nodes n+1..n+m</span></span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++){</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">1</span>; j <= m; j++){</span><br><span class="line"> <span class="keyword">char</span> c = getchar();</span><br><span class="line"> <span class="keyword">if</span> (c == <span class="string">'#'</span>){</span><br><span class="line"> tree[i].push_back(n + j);</span><br><span class="line"> tree[n + j].push_back(i);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> getchar();</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, bfs(<span class="number">1</span>, n));</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span 
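class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design approach and complexity analysis</h1><p>Treat every row and every column of the map as a node, and treat each mirror as an edge between the node of its row and the node of its column. The task then becomes finding the shortest path from the start node (row 1) to the end node (row <span class="math inline">\(n\)</span>): each edge on the path is one mirror used, so a breadth-first search yields the minimum number of mirrors.</p><p>Building the graph: <span class="math inline">\(T_1(n)=O(n\times m)\)</span>. Searching the path: a BFS over <span class="math inline">\(n+m\)</span> nodes and at most <span class="math inline">\(n\times m\)</span> edges (one per mirror), so <span class="math inline">\(T_2(n)=O(n+m+n\times m)=O(n\times m)\)</span>. Overall, the time complexity is <span class="math inline">\(T(n)=T_1+T_2=O(m\times n)\)</span>.</p>]]></content>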
<categories>
<category> 算法 </category>
</categories>
<tags>
<tag> 算法设计与分析习题集 </tag>
<tag> bfs </tag>
<tag> 分支限界法 </tag>
</tags>
</entry>
<entry>
<title>5.5 Maximum Lexicographic Order</title>
<link href="2018/01/13/p34.html"/>
<url>2018/01/13/p34.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>You are given a permutation of <span class="math inline">\(1\sim N\)</span> and <span class="math inline">\(M\)</span> pairs of positions <span class="math inline">\((A_i,B_i)\)</span> that may be swapped. In each operation you may choose one pair and swap the values at its two positions. The number of operations is unlimited. What is the lexicographically largest permutation that can be obtained by swapping (see the definition of lexicographic order)?</p><h1 id="数据输入">Input</h1><p>The first line contains two integers, <span class="math inline">\(N\)</span> and <span class="math inline">\(M\ (1\leq N,M\leq 20000)\)</span>. The second line contains <span class="math inline">\(N\)</span> integers, a permutation of <span class="math inline">\(1\sim N\)</span>. Each of the next <span class="math inline">\(M\)</span> lines contains two integers <span class="math inline">\(A_i,B_i\ (1\leq A_i,B_i\leq N,A_i\neq B_i)\)</span>, meaning that the values at positions <span class="math inline">\(A_i\)</span> and <span class="math inline">\(B_i\)</span> of the sequence may be swapped.</p><h1 id="数据输出">Output</h1><p>Output <span class="math inline">\(N\)</span> integers: the lexicographically largest sequence.</p><table><thead><tr class="header"><th>Sample input</th><th>Sample output</th></tr></thead><tbody><tr class="odd"><td>5 3<br>1 5 4 2 3<br>1 3<br>3 5<br>2 4</td><td>4 5 3 2 1</td></tr></tbody></table><h1 id="源代码dfs">Source code (DFS)</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span 
class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><vector></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><algorithm></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 20005</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> data[maxn];</span><br><span class="line"><span class="built_in">vector</span><<span class="keyword">int</span>> change[maxn];</span><br><span 
class="line"><span class="keyword">int</span> <span class="built_in">set</span>[maxn];</span><br><span class="line"><span class="keyword">int</span> index[maxn];</span><br><span class="line"><span class="keyword">bool</span> used[maxn];</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">bool</span> <span class="title">cmp</span><span class="params">(<span class="keyword">const</span> <span class="keyword">int</span> &a, <span class="keyword">const</span> <span class="keyword">int</span> &b)</span></span>{</span><br><span class="line"> <span class="keyword">return</span> a > b;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">dfs</span><span class="params">(<span class="keyword">int</span> i, <span class="keyword">int</span> &len)</span></span>{</span><br><span class="line"> used[i] = <span class="literal">true</span>;</span><br><span class="line"> index[len] = i;</span><br><span class="line"> <span class="built_in">set</span>[len++] = data[i];</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> j = <span class="number">0</span>; j < change[i].size(); ++j){</span><br><span class="line"> <span class="keyword">if</span> (!used[change[i][j]]){</span><br><span class="line"> dfs(change[i][j], len);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">int</span> n, m, i, j, u, v;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &n, &m);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; 
i <= n; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, data + i);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < m; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &u, &v);</span><br><span class="line"> change[u].push_back(v);</span><br><span class="line"> change[v].push_back(u);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">memset</span>(used, <span class="literal">false</span>, <span class="keyword">sizeof</span>(used));</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; ++i){</span><br><span class="line"> <span class="keyword">if</span> (!used[i]){</span><br><span class="line"> <span class="keyword">int</span> len = <span class="number">0</span>;</span><br><span class="line"> dfs(i, len);</span><br><span class="line"> sort(<span class="built_in">set</span>, <span class="built_in">set</span> + len, cmp);</span><br><span class="line"> sort(index, index + len);</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">0</span>; j < len; ++j){</span><br><span class="line"> data[index[j]] = <span class="built_in">set</span>[j];</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d"</span>, data[<span class="number">1</span>]);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">2</span>; i <= n; ++i){</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">" %d"</span>, data[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"\n"</span>);</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure><h1 id="源代码并查集">Source code (union-find)</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span 
class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><string.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><vector></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><algorithm></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 20005</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">int</span> data[maxn];</span><br><span class="line"><span class="built_in">vector</span><<span class="keyword">int</span>> pos[maxn], vec[maxn];</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">UFset</span> {</span></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">int</span> root[maxn];</span><br><span class="line"> <span class="keyword">int</span> 
rank[maxn];</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">init</span><span class="params">(<span class="keyword">int</span> n)</span> </span>{</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i <= n; ++i) {</span><br><span class="line"> root[i] = i;</span><br><span class="line"> rank[i] = <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">Find</span><span class="params">(<span class="keyword">int</span> v)</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> root[v] = root[v] == v ? v : Find(root[v]);</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Union</span><span class="params">(<span class="keyword">int</span> x, <span class="keyword">int</span> y)</span></span>{</span><br><span class="line"> <span class="keyword">int</span> rx = Find(x);</span><br><span class="line"> <span class="keyword">int</span> ry = Find(y);</span><br><span class="line"> <span class="keyword">if</span> (rx == ry)<span class="keyword">return</span>;<span class="comment">// already in the same set: nothing to do</span></span><br><span class="line"> <span class="comment">// unoptimized version</span></span><br><span class="line"> <span class="comment">//root[rx] = ry;</span></span><br><span class="line"> <span class="comment">//return;</span></span><br><span class="line"> <span class="comment">// optimized version (union by rank)</span></span><br><span class="line"> <span class="keyword">if</span> (rank[rx] < rank[ry]){</span><br><span class="line"> root[rx] = ry;<span class="comment">// attach rx, the root of x, under ry, the root of y, because the tree rooted at ry is taller</span></span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">else</span>{</span><br><span class="line"> root[ry] = 
rx;</span><br><span class="line"> <span class="keyword">if</span> (rank[rx] == rank[ry]){</span><br><span class="line"> ++rank[rx];<span class="comment">// equal heights: the merged tree is one level taller</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">} ufset;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">bool</span> <span class="title">cmp</span><span class="params">(<span class="keyword">const</span> <span class="keyword">int</span> &a, <span class="keyword">const</span> <span class="keyword">int</span> &b)</span> </span>{</span><br><span class="line"> <span class="keyword">return</span> a > b;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> n, m, i, j, u, v;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &n, &m);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; ++i) {</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &data[i]);</span><br><span class="line"> }</span><br><span class="line"> ufset.init(n);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < m; ++i) {</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &u, &v);</span><br><span class="line"> ufset.Union(u, v);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++){</span><br><span class="line"> <span class="keyword">int</span> fa = ufset.Find(i);</span><br><span class="line"> 
vec[fa].push_back(data[i]);</span><br><span class="line"> pos[fa].push_back(i);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; i++){</span><br><span class="line"> sort(vec[i].begin(), vec[i].end(), cmp);</span><br><span class="line"> <span class="keyword">for</span> (j = <span class="number">0</span>; j<pos[i].size(); j++){</span><br><span class="line"> data[pos[i][j]] = vec[i][j];</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d"</span>, data[<span class="number">1</span>]);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">2</span>; i <= n; ++i) {</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">" %d"</span>, data[i]);</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"\n"</span>);</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design approach and complexity analysis</h1><p>If positions A and B can be swapped and positions B and C can be swapped, then the values at A and C can also be exchanged (by going through B). So first put all positions that can be swapped with one another, directly or indirectly, into the same set. Then sort the values of each set in descending order and write them back into that set's positions, taken in ascending order. This gives the answer.</p><p>There is more than one way to find the sets:</p><ol type="1"><li>Treat the sequence as a forest and each swap set as a tree (a connected component); a <span class="math inline">\(dfs\)</span> or <span class="math inline">\(bfs\)</span> traversal then recovers each tree;</li><li>Since this is a problem about sets, a union-find (disjoint-set) structure is a natural fit.</li></ol><p>Both approaches run within <span class="math inline">\(O(n\times \log n)\)</span> time, which is dominated by the sorting.</p>]]></content>
<categories>
<category> 算法 </category>
</categories>
<tags>
<tag> 算法设计与分析习题集 </tag>
<tag> 回溯法 </tag>
<tag> dfs </tag>
<tag> 并查集 </tag>
</tags>
</entry>
<entry>
<title>5.4 Ruling the World</title>
<link href="2018/01/13/p33.html"/>
<url>2018/01/13/p33.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>A continent has <span class="math inline">\(N\)</span> countries connected by <span class="math inline">\(N-1\)</span> roads, such that every pair of countries is connected. We view the countries as a tree whose root is the country numbered 1. Each country has an overall strength <span class="math inline">\(A_i\)</span>, and each road has a length <span class="math inline">\(W_i\)</span>. Country <span class="math inline">\(V\)</span> can attack country <span class="math inline">\(U\)</span> if and only if, in the tree, <span class="math inline">\(U\)</span> lies in the subtree of <span class="math inline">\(V\)</span> and <span class="math inline">\(dist(v,u)\leq A_u\ (V\neq U)\)</span>, where <span class="math inline">\(dist(v,u)\)</span> is the length of the shortest path between <span class="math inline">\(u\)</span> and <span class="math inline">\(v\)</span>. Please compute, for each country, the number of countries it can attack.</p><h1 id="数据输入">Input</h1><p>The first line contains an integer <span class="math inline">\(N\ (1\leq N\leq 20000)\)</span>, the number of countries. The second line contains <span class="math inline">\(N\)</span> positive integers <span class="math inline">\(A_1,A_2,\cdots , A_n\ (1\leq A_i\leq 10^9)\)</span>, the overall strengths of the countries. Then follow <span class="math inline">\(N-1\)</span> lines; the <span class="math inline">\(i\)</span>-th of them contains two integers <span class="math inline">\(P_i\)</span> and <span class="math inline">\(W_i\ (1\leq P_i\leq N,1\leq W_i\leq 10^9)\)</span>, meaning that country <span class="math inline">\(P_i\)</span> is the parent of country <span class="math inline">\(i+1\)</span> and that the road between the two has length <span class="math inline">\(W_i\)</span>.</p><h1 id="数据输出">Output</h1><p>Output <span class="math inline">\(N\)</span> integers: the number of countries each country can attack.</p><table><thead><tr class="header"><th>Sample input</th><th>Sample output</th></tr></thead><tbody><tr class="odd"><td>5<br>6 4 1 2 6<br>1 2<br>1 2<br>2 4<br>2 5</td><td>1 1 0 0 0</td></tr></tbody></table><h1 id="源代码">Source code</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 20005</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Node</span>{</span></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">int</span> power, weight, parent, scale;</span><br><span class="line"> Node() :power(<span 
class="number">0</span>), weight(<span class="number">0</span>), parent(<span class="number">0</span>), scale(<span class="number">0</span>){}</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Tree</span>{</span></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">int</span> n;</span><br><span class="line"> Node node[maxn];</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">in</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &n);</span><br><span class="line"> <span class="keyword">int</span> i;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i <= n; ++i)</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &node[i].power);</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">2</span>; i <= n; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &node[i].parent, &node[i].weight);</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">solve</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">2</span>; i <= n; ++i){</span><br><span class="line"> <span class="keyword">int</span> son = i;</span><br><span class="line"> <span class="keyword">int</span> power = node[son].power;</span><br><span class="line"> <span class="keyword">while</span> (son > <span class="number">1</span> && power >= node[son].weight){</span><br><span 
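class="line"> ++node[node[son].parent].scale;</span><br><span class="line"> power -= node[son].weight;</span><br><span class="line"> son = node[son].parent;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">out</span><span class="params">()</span></span>{</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d"</span>, node[<span class="number">1</span>].scale);</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">2</span>; i <= n; ++i)</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">" %d"</span>, node[i].scale);</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"\n"</span>);</span><br><span class="line"> }</span><br><span class="line">}tree;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> tree.in();</span><br><span class="line"> tree.solve();</span><br><span class="line"> tree.out();</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design and complexity analysis</h1><p>The problem is modeled on reality: the world has a number of countries, and the way they constrain one another shows up as "strong" countries controlling weaker ones. A controlling country must keep absolute control over every country it controls. That is, if a controlled country's sphere of influence reaches the controller's home base, the controller attacks it to weaken it. Our job is to compute, for each controlling country, how many controlled countries it has to attack. This explains the rule: if a descendant's strength is at least the length of the path from it to an ancestor, that ancestor attacks it. It is also why the United States, although it dominates the globe, is not at war every day: all it needs is a balance from which it can quietly profit!</p><p>The input gives, for every node, its parent, the length of the edge to that parent, and its own strength. The task therefore becomes: for each node, decide which of its ancestors can attack it, and record the result at those ancestors. So for each node we walk up through the ancestors that lie within its reach (the accumulated path length does not exceed its strength) and add one to the attack count of each such ancestor.</p><p>The running time is proportional to the total number of ancestors visited: about <span class="math inline">\(O(n\times \log n)\)</span> on average when the tree depth is <span class="math inline">\(O(\log n)\)</span>, and <span class="math inline">\(O(n^2)\)</span> in the worst case, for example a chain whose nodes all have large strengths.</p>]]></content>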
<categories>
<category> 算法 </category>
</categories>
<tags>
<tag> 算法设计与分析习题集 </tag>
<tag> 回溯法 </tag>
</tags>
</entry>
<entry>
<title>5.3 Longest Tunnels</title>
<link href="2018/01/13/p32.html"/>
<url>2018/01/13/p32.html</url>
<content type="html"><![CDATA[<h1 id="实验任务">Task</h1><p>Country <span class="math inline">\(C\)</span> has <span class="math inline">\(N\)</span> cities connected by <span class="math inline">\(N-1\)</span> roads, such that every city can be reached from every other city. The king picks <span class="math inline">\(2\times K\)</span> important cities and divides them into <span class="math inline">\(K\)</span> pairs; for each pair, a tunnel is built along the roads between its two cities in preparation for war. Assume every road has length 1. Help the king choose the pairing so that the total length of all tunnels is maximized (if two different pairs build tunnels along the same stretch of road, it is counted twice).</p><h1 id="数据输入">Input</h1><p>The first line contains two integers <span class="math inline">\(N\)</span> and <span class="math inline">\(K\ (2\leq N\leq 20000,1\leq K\leq N/2)\)</span>, the number of cities and the number of pairs of important cities. The second line contains <span class="math inline">\(2\times K\)</span> distinct integers, the labels of the important cities. Each of the next <span class="math inline">\(N-1\)</span> lines contains two integers <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\ (1\leq X,Y\leq N)\)</span>, meaning that there is a road between cities <span class="math inline">\(X\)</span> and <span class="math inline">\(Y\)</span>.</p><h1 id="数据输出">Output</h1><p>Output a single integer: the maximum total length of the tunnels.</p><table><thead><tr class="header"><th>Sample input</th><th>Sample output</th></tr></thead><tbody><tr class="odd"><td>7 2<br>4 5 6 7<br>1 2<br>1 3<br>2 4<br>2 5<br>3 6<br>3 7</td><td>8</td></tr></tbody></table><h1 id="源代码">Source code</h1><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><stdio.h></span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string"><vector></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> maxn 20005</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">min</span><span class="params">(<span class="keyword">int</span> a, <span class="keyword">int</span> b)</span></span>{ <span class="keyword">return</span> a < b ? 
a : b; }</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Node</span>{</span></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">int</span> pair;</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> son;</span><br><span class="line"> Node() :pair(<span class="number">0</span>){}</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Tree</span>{</span></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="keyword">int</span> k;</span><br><span class="line"> Node node[maxn];</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">in</span><span class="params">()</span></span></span><br><span class="line"><span class="function"> </span>{</span><br><span class="line"> <span class="keyword">int</span> n, i, u, v;</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &n, &k);</span><br><span class="line"> k <<= <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">0</span>; i < k; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d"</span>, &u);</span><br><span class="line"> node[u].pair = <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (i = <span class="number">1</span>; i < n; ++i){</span><br><span class="line"> <span class="built_in">scanf</span>(<span class="string">"%d%d"</span>, &u, &v);</span><br><span class="line"> node[u].son.push_back(v);</span><br><span class="line"> node[v].son.push_back(u);</span><br><span 
class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">dfs</span><span class="params">(<span class="keyword">int</span> root, <span class="keyword">int</span> parent)</span></span>{</span><br><span class="line"> <span class="keyword">int</span> ret = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">if</span> (parent != <span class="number">0</span> && node[root].son.size() == <span class="number">1</span>)<span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < node[root].son.size(); ++i){</span><br><span class="line"> <span class="keyword">int</span> son = node[root].son[i];</span><br><span class="line"> <span class="keyword">if</span> (son == parent)<span class="keyword">continue</span>;</span><br><span class="line"> ret += dfs(son, root);</span><br><span class="line"> ret += min(node[son].pair, k - node[son].pair);</span><br><span class="line"> node[root].pair += node[son].pair;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> ret;</span><br><span class="line"> }</span><br><span class="line">}tree;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span>{</span><br><span class="line"> tree.in();</span><br><span class="line"> <span class="built_in">printf</span>(<span class="string">"%d\n"</span>, tree.dfs(<span class="number">1</span>, <span class="number">0</span>));</span><br><span class="line">}</span><br></pre></td></tr></table></figure><h1 id="设计思路与复杂度分析">Design and complexity analysis</h1><p>Reformulation: the total tunnel length equals the sum, over all edges, of the number of tunnels passing through that edge, so maximizing the total length is equivalent to having as many tunnels as possible pass through every edge.</p><p>Minimum-side pairing principle: an edge splits the tree into two parts. If one part contains <span class="math inline">\(v\)</span> important cities, the other must contain <span class="math inline">\((2\times K-v)\)</span>. For as many tunnels as possible to cross the edge, every important city in the part with fewer important cities must be paired, across this edge, with a city in the part with more, so <span class="math inline">\(\min(2\times K-v,v)\)</span> tunnels cross the edge.</p><p>In the implementation, a single <span class="math inline">\(dfs\)</span> yields the number of important cities below each edge of the rooted tree, so the complexity is <span class="math inline">\(O(n)\)</span>.</p>]]></content>
<categories>