
Commit 501e986

committed
u
1 parent e52c45d commit 501e986

20 files changed, 47 additions and 47 deletions
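Every hunk shown below makes the same line-for-line change. The `https://morvanzhou.github.io/` prefix becomes `https://mofanpy.com/` and the path after it stays the same, which is consistent with the equal addition and deletion counts. The commit page does not say how the edit was made. The sketch below is one way such a substitution could be scripted with the standard library. It assumes the links only appear in `.md` and `.ipynb` files. `rewrite_text`, `rewrite_tree`, and the constants are hypothetical names and are not part of the repository.

```python
from pathlib import Path

# Hypothetical constants mirroring the substitution visible in this diff.
OLD_PREFIX = "https://morvanzhou.github.io/"
NEW_PREFIX = "https://mofanpy.com/"
# Assumption: only these text-based file types carry the links.
SUFFIXES = {".md", ".ipynb"}


def rewrite_text(text: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> tuple[str, int]:
    """Return (rewritten text, number of occurrences replaced)."""
    return text.replace(old, new), text.count(old)


def rewrite_tree(root: Path) -> dict[Path, int]:
    """Rewrite matching files under `root` in place; return replacements per changed file."""
    changed: dict[Path, int] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        # newline="" keeps the file's original line endings untouched.
        with path.open(encoding="utf-8", newline="") as f:
            text = f.read()
        new_text, count = rewrite_text(text)
        if count:
            with path.open("w", encoding="utf-8", newline="") as f:
                f.write(new_text)
            changed[path] = count
    return changed
```

If you run `rewrite_tree(Path("."))` from the repository root, it rewrites matching files in place and returns a count for each file it changed. These counts are occurrences, not lines. A single notebook output line in this diff holds three links, so the counts need not match the diff's line totals. It is a sensible precaution to review the result with `git diff` before committing.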

README.md

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 <p align="center">
-<a href="https://morvanzhou.github.io/" target="_blank">
+<a href="https://mofanpy.com/" target="_blank">
 <img width="40%" src="/scraping.jpg" style="max-width:100%;">
 </a>
 </p>
@@ -10,7 +10,7 @@
 # Web scraping tutorials (Python)
 
 In these tutorials, we will learn to build some simple but useful scrapers from scratch. Get to know how we can read web page and select sections you need or even download files.
-If you understand Chinese, you are lucky! I made Chinese video + text tutorials for all of these contents. You can find it in [莫烦Python](https://morvanzhou.github.io/).
+If you understand Chinese, you are lucky! I made Chinese video + text tutorials for all of these contents. You can find it in [莫烦Python](https://mofanpy.com/).
 
 
 **Learning from code, I made two options for you.**
@@ -53,7 +53,7 @@ If you understand Chinese, you are lucky! I made Chinese video + text tutorials
 
 <div>
 <a href="https://www.patreon.com/morvan">
-<img src="https://morvanzhou.github.io/static/img/support/patreon.jpg"
+<img src="https://mofanpy.com/static/img/support/patreon.jpg"
 alt="Patreon"
 height=120></a>
 </div>

notebook/1-1-urllib.ipynb

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@
 "metadata": {},
 "source": [
 "# Simplest scraping\n",
-"**We gonna show how to open a very simple [webpage](https://morvanzhou.github.io/static/scraping/basic-structure.html), and read all the content in it.**"
+"**We gonna show how to open a very simple [webpage](https://mofanpy.com/static/scraping/basic-structure.html), and read all the content in it.**"
 ]
 },
 {
@@ -19,15 +19,15 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>Scraping tutorial 1 | 莫烦Python</title>\n\t<link rel=\"icon\" href=\"https://morvanzhou.github.io/static/img/description/tab_icon.png\">\n</head>\n<body>\n\t<h1>爬虫测试1</h1>\n\t<p>\n\t\t这是一个在 <a href=\"https://morvanzhou.github.io/\">莫烦Python</a>\n\t\t<a href=\"https://morvanzhou.github.io/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n\n</body>\n</html>\n"
+"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>Scraping tutorial 1 | 莫烦Python</title>\n\t<link rel=\"icon\" href=\"https://mofanpy.com/static/img/description/tab_icon.png\">\n</head>\n<body>\n\t<h1>爬虫测试1</h1>\n\t<p>\n\t\t这是一个在 <a href=\"https://mofanpy.com/\">莫烦Python</a>\n\t\t<a href=\"https://mofanpy.com/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n\n</body>\n</html>\n"
 ]
 }
 ],
 "source": [
 "from urllib.request import urlopen\n",
 "\n",
 "# if has Chinese, apply decode()\n",
-"html = urlopen(\"https://morvanzhou.github.io/static/scraping/basic-structure.html\").read().decode('utf-8')\n",
+"html = urlopen(\"https://mofanpy.com/static/scraping/basic-structure.html\").read().decode('utf-8')\n",
 "print(html)\n"
 ]
 },
@@ -73,7 +73,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\nPage paragraph is: \n\t\t这是一个在 <a href=\"https://morvanzhou.github.io/\">莫烦Python</a>\n\t\t<a href=\"https://morvanzhou.github.io/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t\n"
+"\nPage paragraph is: \n\t\t这是一个在 <a href=\"https://mofanpy.com/\">莫烦Python</a>\n\t\t<a href=\"https://mofanpy.com/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t\n"
 ]
 }
 ],
@@ -98,7 +98,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\nAll links: ['https://morvanzhou.github.io/static/img/description/tab_icon.png', 'https://morvanzhou.github.io/', 'https://morvanzhou.github.io/tutorials/scraping']\n"
+"\nAll links: ['https://mofanpy.com/static/img/description/tab_icon.png', 'https://mofanpy.com/', 'https://mofanpy.com/tutorials/scraping']\n"
 ]
 }
 ],

notebook/2-1-beautifulsoup-basic.ipynb

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@
 "metadata": {},
 "source": [
 "# Basic usage of beautifulsoup\n",
-"**First we still need to open a [page](https://morvanzhou.github.io/static/scraping/basic-structure.html), then we can apply beautifulsoup on this page's html.**"
+"**First we still need to open a [page](https://mofanpy.com/static/scraping/basic-structure.html), then we can apply beautifulsoup on this page's html.**"
 ]
 },
 {
@@ -19,7 +19,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>Scraping tutorial 1 | 莫烦Python</title>\n\t<link rel=\"icon\" href=\"https://morvanzhou.github.io/static/img/description/tab_icon.png\">\n</head>\n<body>\n\t<h1>爬虫测试1</h1>\n\t<p>\n\t\t这是一个在 <a href=\"https://morvanzhou.github.io/\">莫烦Python</a>\n\t\t<a href=\"https://morvanzhou.github.io/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n\n</body>\n</html>\n"
+"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>Scraping tutorial 1 | 莫烦Python</title>\n\t<link rel=\"icon\" href=\"https://mofanpy.com/static/img/description/tab_icon.png\">\n</head>\n<body>\n\t<h1>爬虫测试1</h1>\n\t<p>\n\t\t这是一个在 <a href=\"https://mofanpy.com/\">莫烦Python</a>\n\t\t<a href=\"https://mofanpy.com/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n\n</body>\n</html>\n"
 ]
 }
 ],
@@ -28,7 +28,7 @@
 "from urllib.request import urlopen\n",
 "\n",
 "# if has Chinese, apply decode()\n",
-"html = urlopen(\"https://morvanzhou.github.io/static/scraping/basic-structure.html\").read().decode('utf-8')\n",
+"html = urlopen(\"https://mofanpy.com/static/scraping/basic-structure.html\").read().decode('utf-8')\n",
 "print(html)"
 ]
 },
@@ -48,7 +48,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"<h1>爬虫测试1</h1>\n\n <p>\n\t\t这是一个在 <a href=\"https://morvanzhou.github.io/\">莫烦Python</a>\n<a href=\"https://morvanzhou.github.io/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n"
+"<h1>爬虫测试1</h1>\n\n <p>\n\t\t这是一个在 <a href=\"https://mofanpy.com/\">莫烦Python</a>\n<a href=\"https://mofanpy.com/tutorials/scraping\">爬虫教程</a> 中的简单测试.\n\t</p>\n"
 ]
 }
 ],
@@ -74,7 +74,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"\n ['https://morvanzhou.github.io/', 'https://morvanzhou.github.io/tutorials/scraping']\n"
+"\n ['https://mofanpy.com/', 'https://mofanpy.com/tutorials/scraping']\n"
 ]
 }
 ],

notebook/2-2-beautifulsoup-css.ipynb

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 "metadata": {},
 "source": [
 "# Beautifulsoup: find by CSS class \n",
-"**First we still need to open a [page](https://morvanzhou.github.io/static/scraping/list.html), then we can apply beautifulsoup on this page's html.**"
+"**First we still need to open a [page](https://mofanpy.com/static/scraping/list.html), then we can apply beautifulsoup on this page's html.**"
 ]
 },
 {
@@ -19,7 +19,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>爬虫练习 列表 class | 莫烦 Python</title>\n\t<style>\n\t.jan {\n\t\tbackground-color: yellow;\n\t}\n\t.feb {\n\t\tfont-size: 25px;\n\t}\n\t.month {\n\t\tcolor: red;\n\t}\n\t</style>\n</head>\n\n<body>\n\n<h1>列表 爬虫练习</h1>\n\n<p>这是一个在 <a href=\"https://morvanzhou.github.io/\" >莫烦 Python</a> 的 <a href=\"https://morvanzhou.github.io/tutorials/scraping\" >爬虫教程</a>\n\t里无敌简单的网页, 所有的 code 让你一目了然, 清晰无比.</p>\n\n<ul>\n\t<li class=\"month\">一月</li>\n\t<ul class=\"jan\">\n\t\t<li>一月一号</li>\n\t\t<li>一月二号</li>\n\t\t<li>一月三号</li>\n\t</ul>\n\t<li class=\"feb month\">二月</li>\n\t<li class=\"month\">三月</li>\n\t<li class=\"month\">四月</li>\n\t<li class=\"month\">五月</li>\n</ul>\n\n</body>\n</html>\n"
+"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>爬虫练习 列表 class | 莫烦 Python</title>\n\t<style>\n\t.jan {\n\t\tbackground-color: yellow;\n\t}\n\t.feb {\n\t\tfont-size: 25px;\n\t}\n\t.month {\n\t\tcolor: red;\n\t}\n\t</style>\n</head>\n\n<body>\n\n<h1>列表 爬虫练习</h1>\n\n<p>这是一个在 <a href=\"https://mofanpy.com/\" >莫烦 Python</a> 的 <a href=\"https://mofanpy.com/tutorials/scraping\" >爬虫教程</a>\n\t里无敌简单的网页, 所有的 code 让你一目了然, 清晰无比.</p>\n\n<ul>\n\t<li class=\"month\">一月</li>\n\t<ul class=\"jan\">\n\t\t<li>一月一号</li>\n\t\t<li>一月二号</li>\n\t\t<li>一月三号</li>\n\t</ul>\n\t<li class=\"feb month\">二月</li>\n\t<li class=\"month\">三月</li>\n\t<li class=\"month\">四月</li>\n\t<li class=\"month\">五月</li>\n</ul>\n\n</body>\n</html>\n"
 ]
 }
 ],
@@ -28,7 +28,7 @@
 "from urllib.request import urlopen\n",
 "\n",
 "# if has Chinese, apply decode()\n",
-"html = urlopen(\"https://morvanzhou.github.io/static/scraping/list.html\").read().decode('utf-8')\n",
+"html = urlopen(\"https://mofanpy.com/static/scraping/list.html\").read().decode('utf-8')\n",
 "print(html)"
 ]
 },

notebook/2-3-beautifulsoup-regex.ipynb

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@
 "metadata": {},
 "source": [
 "# Beautifulsoup: find by regual expression\n",
-"**First we import re for regex. Then, open a [page](https://morvanzhou.github.io/static/scraping/table.html), then we can apply beautifulsoup on this page's html.**"
+"**First we import re for regex. Then, open a [page](https://mofanpy.com/static/scraping/table.html), then we can apply beautifulsoup on this page's html.**"
 ]
 },
 {
@@ -19,7 +19,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>爬虫练习 表格 table | 莫烦 Python</title>\n\n\t<style>\n\timg {\n\t\twidth: 250px;\n\t}\n\ttable{\n\t\twidth:50%;\n\t}\n\ttd{\n\t\tmargin:10px;\n\t\tpadding:15px;\n\t}\n\t</style>\n</head>\n<body>\n\n<h1>表格 爬虫练习</h1>\n\n<p>这是一个在 <a href=\"https://morvanzhou.github.io/\" >莫烦 Python</a> 的 <a href=\"https://morvanzhou.github.io/tutorials/scraping\" >爬虫教程</a>\n\t里无敌简单的网页, 所有的 code 让你一目了然, 清晰无比.</p>\n\n<br>\n<table id=\"course-list\">\n\t<tr>\n\t\t<th>\n\t\t\t分类\n\t\t</th><th>\n\t\t\t名字\n\t\t</th><th>\n\t\t\t时长\n\t\t</th><th>\n\t\t\t预览\n\t\t</th>\n\t</tr>\n\n\t<tr id=\"course1\" class=\"ml\">\n\t\t<td>\n\t\t\t机器学习\n\t\t</td><td>\n\t\t\t<a href=\"https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/\">\n\t\t\t\tTensorflow 神经网络</a>\n\t\t</td><td>\n\t\t\t2:00\n\t\t</td><td>\n\t\t\t<img src=\"https://morvanzhou.github.io/static/img/course_cover/tf.jpg\">\n\t\t</td>\n\t</tr>\n\n\t<tr id=\"course2\" class=\"ml\">\n\t\t<td>\n\t\t\t机器学习\n\t\t</td><td>\n\t\t\t<a href=\"https://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/\">\n\t\t\t\t强化学习</a>\n\t\t</td><td>\n\t\t\t5:00\n\t\t</td><td>\n\t\t\t<img src=\"https://morvanzhou.github.io/static/img/course_cover/rl.jpg\">\n\t\t</td>\n\t</tr>\n\n\t<tr id=\"course3\" class=\"data\">\n\t\t<td>\n\t\t\t数据处理\n\t\t</td><td>\n\t\t\t<a href=\"https://morvanzhou.github.io/tutorials/data-manipulation/scraping/\">\n\t\t\t\t爬虫</a>\n\t\t</td><td>\n\t\t\t3:00\n\t\t</td><td>\n\t\t\t<img src=\"https://morvanzhou.github.io/static/img/course_cover/scraping.jpg\">\n\t\t</td>\n\t</tr>\n\n</table>\n\n</body>\n</html>\n"
+"<!DOCTYPE html>\n<html lang=\"cn\">\n<head>\n\t<meta charset=\"UTF-8\">\n\t<title>爬虫练习 表格 table | 莫烦 Python</title>\n\n\t<style>\n\timg {\n\t\twidth: 250px;\n\t}\n\ttable{\n\t\twidth:50%;\n\t}\n\ttd{\n\t\tmargin:10px;\n\t\tpadding:15px;\n\t}\n\t</style>\n</head>\n<body>\n\n<h1>表格 爬虫练习</h1>\n\n<p>这是一个在 <a href=\"https://mofanpy.com/\" >莫烦 Python</a> 的 <a href=\"https://mofanpy.com/tutorials/scraping\" >爬虫教程</a>\n\t里无敌简单的网页, 所有的 code 让你一目了然, 清晰无比.</p>\n\n<br>\n<table id=\"course-list\">\n\t<tr>\n\t\t<th>\n\t\t\t分类\n\t\t</th><th>\n\t\t\t名字\n\t\t</th><th>\n\t\t\t时长\n\t\t</th><th>\n\t\t\t预览\n\t\t</th>\n\t</tr>\n\n\t<tr id=\"course1\" class=\"ml\">\n\t\t<td>\n\t\t\t机器学习\n\t\t</td><td>\n\t\t\t<a href=\"https://mofanpy.com/tutorials/machine-learning/tensorflow/\">\n\t\t\t\tTensorflow 神经网络</a>\n\t\t</td><td>\n\t\t\t2:00\n\t\t</td><td>\n\t\t\t<img src=\"https://mofanpy.com/static/img/course_cover/tf.jpg\">\n\t\t</td>\n\t</tr>\n\n\t<tr id=\"course2\" class=\"ml\">\n\t\t<td>\n\t\t\t机器学习\n\t\t</td><td>\n\t\t\t<a href=\"https://mofanpy.com/tutorials/machine-learning/reinforcement-learning/\">\n\t\t\t\t强化学习</a>\n\t\t</td><td>\n\t\t\t5:00\n\t\t</td><td>\n\t\t\t<img src=\"https://mofanpy.com/static/img/course_cover/rl.jpg\">\n\t\t</td>\n\t</tr>\n\n\t<tr id=\"course3\" class=\"data\">\n\t\t<td>\n\t\t\t数据处理\n\t\t</td><td>\n\t\t\t<a href=\"https://mofanpy.com/tutorials/data-manipulation/scraping/\">\n\t\t\t\t爬虫</a>\n\t\t</td><td>\n\t\t\t3:00\n\t\t</td><td>\n\t\t\t<img src=\"https://mofanpy.com/static/img/course_cover/scraping.jpg\">\n\t\t</td>\n\t</tr>\n\n</table>\n\n</body>\n</html>\n"
 ]
 }
 ],
@@ -29,7 +29,7 @@
 "import re\n",
 "\n",
 "# if has Chinese, apply decode()\n",
-"html = urlopen(\"https://morvanzhou.github.io/static/scraping/table.html\").read().decode('utf-8')\n",
+"html = urlopen(\"https://mofanpy.com/static/scraping/table.html\").read().decode('utf-8')\n",
 "print(html)"
 ]
 },
@@ -51,7 +51,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"https://morvanzhou.github.io/static/img/course_cover/tf.jpg\nhttps://morvanzhou.github.io/static/img/course_cover/rl.jpg\nhttps://morvanzhou.github.io/static/img/course_cover/scraping.jpg\n"
+"https://mofanpy.com/static/img/course_cover/tf.jpg\nhttps://mofanpy.com/static/img/course_cover/rl.jpg\nhttps://mofanpy.com/static/img/course_cover/scraping.jpg\n"
 ]
 }
 ],
@@ -79,7 +79,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"https://morvanzhou.github.io/\nhttps://morvanzhou.github.io/tutorials/scraping\nhttps://morvanzhou.github.io/tutorials/machine-learning/tensorflow/\nhttps://morvanzhou.github.io/tutorials/machine-learning/reinforcement-learning/\nhttps://morvanzhou.github.io/tutorials/data-manipulation/scraping/\n"
+"https://mofanpy.com/\nhttps://mofanpy.com/tutorials/scraping\nhttps://mofanpy.com/tutorials/machine-learning/tensorflow/\nhttps://mofanpy.com/tutorials/machine-learning/reinforcement-learning/\nhttps://mofanpy.com/tutorials/data-manipulation/scraping/\n"
 ]
 }
 ],

notebook/3-2-download.ipynb

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 "import os\n",
 "os.makedirs('./img/', exist_ok=True)\n",
 "\n",
-"IMAGE_URL = \"https://morvanzhou.github.io/static/img/description/learning_step_flowchart.png\""
+"IMAGE_URL = \"https://mofanpy.com/static/img/description/learning_step_flowchart.png\""
 ]
 },
 {

notebook/4-1-distributed-scraping.ipynb

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
 "source": [
 "# Distributed scraping: multiprocessing\n",
 "\n",
-"**Speed up scraping by distributed crawling and parsing. I'm going to scrape [my website](https://morvanzhou.github.io/) but in a local server \"http://127.0.0.1:4000/\" to eliminate different downloading speed. This test is more accurate in time measuring. You can use \"https://morvanzhou.github.io/\" instead, because you cannot access \"http://127.0.0.1:4000/\".**\n",
+"**Speed up scraping by distributed crawling and parsing. I'm going to scrape [my website](https://mofanpy.com/) but in a local server \"http://127.0.0.1:4000/\" to eliminate different downloading speed. This test is more accurate in time measuring. You can use \"https://mofanpy.com/\" instead, because you cannot access \"http://127.0.0.1:4000/\".**\n",
 "\n",
 "**We gonna scrape all web pages in my website and reture the title and url for each page.**"
 ]
@@ -26,7 +26,7 @@
 "import re\n",
 "\n",
 "base_url = \"http://127.0.0.1:4000/\"\n",
-"# base_url = 'https://morvanzhou.github.io/'\n",
+"# base_url = 'https://mofanpy.com/'\n",
 "\n",
 "# DON'T OVER CRAWL THE WEBSITE OR YOU MAY NEVER VISIT AGAIN\n",
 "if base_url != \"http://127.0.0.1:4000/\":\n",

notebook/4-2-asyncio.ipynb

Lines changed: 5 additions & 5 deletions
@@ -129,14 +129,14 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"https://morvanzhou.github.io/\nhttps://morvanzhou.github.io/\nNormal total time: 0.3869960308074951\n"
+"https://mofanpy.com/\nhttps://mofanpy.com/\nNormal total time: 0.3869960308074951\n"
 ]
 }
 ],
 "source": [
 "import requests\n",
 "\n",
-"URL = 'https://morvanzhou.github.io/'\n",
+"URL = 'https://mofanpy.com/'\n",
 "\n",
 "\n",
 "def normal(): \n",
@@ -167,7 +167,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"['https://morvanzhou.github.io/', 'https://morvanzhou.github.io/']\nAsync total time: 0.11447715759277344\n"
+"['https://mofanpy.com/', 'https://mofanpy.com/']\nAsync total time: 0.11447715759277344\n"
 ]
 }
 ],
@@ -252,7 +252,7 @@
 "import re\n",
 "import multiprocessing as mp\n",
 "\n",
-"# base_url = \"https://morvanzhou.github.io/\"\n",
+"# base_url = \"https://mofanpy.com/\"\n",
 "base_url = \"http://127.0.0.1:4000/\"\n",
 "\n",
 "# DON'T OVER CRAWL THE WEBSITE OR YOU MAY NEVER VISIT AGAIN\n",
@@ -384,7 +384,7 @@
 "\n",
 "\n",
 "if __name__ == '__main__':\n",
-" # base_url = 'https://morvanzhou.github.io/'\n",
+" # base_url = 'https://mofanpy.com/'\n",
 " base_url = \"http://127.0.0.1:4000/\"\n",
 " \n",
 " # DON'T OVER CRAWL THE WEBSITE OR YOU MAY NEVER VISIT AGAIN\n",

notebook/5-1-selenium.ipynb

Lines changed: 2 additions & 2 deletions
@@ -39,7 +39,7 @@
 "from selenium import webdriver\n",
 "\n",
 "driver = webdriver.Chrome()\n",
-"driver.get(\"https://morvanzhou.github.io/\")\n",
+"driver.get(\"https://mofanpy.com/\")\n",
 "driver.find_element_by_xpath(u\"//img[@alt='强化学习 (Reinforcement Learning)']\").click()\n",
 "driver.find_element_by_link_text(\"About\").click()\n",
 "driver.find_element_by_link_text(u\"赞助\").click()\n",
@@ -82,7 +82,7 @@
 "\n",
 "# add the option when creating driver\n",
 "driver = webdriver.Chrome(chrome_options=chrome_options) \n",
-"driver.get(\"https://morvanzhou.github.io/\")\n",
+"driver.get(\"https://mofanpy.com/\")\n",
 "driver.find_element_by_xpath(u\"//img[@alt='强化学习 (Reinforcement Learning)']\").click()\n",
 "driver.find_element_by_link_text(\"About\").click()\n",
 "driver.find_element_by_link_text(u\"赞助\").click()\n",

notebook/5-2-scrapy.ipynb

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
 "\n",
 "**[Scrapy](https://scrapy.org/) is a very good web scrape project. If you have a large project, this one may help you a lot.**\n",
 "\n",
-"**We demonstrate it by an example showed last time, to scrape [my website](https://morvanzhou.github.io/').This is a simple demo, if you want to dig into it, here is the [official tutorials](https://docs.scrapy.org/en/latest/intro/overview.html).**"
+"**We demonstrate it by an example showed last time, to scrape [my website](https://mofanpy.com/').This is a simple demo, if you want to dig into it, here is the [official tutorials](https://docs.scrapy.org/en/latest/intro/overview.html).**"
 ]
 },
 {
@@ -23,7 +23,7 @@
 "class MofanSpider(scrapy.Spider):\n",
 " name = \"mofan\"\n",
 " start_urls = [\n",
-" 'https://morvanzhou.github.io/',\n",
+" 'https://mofanpy.com/',\n",
 " ]\n",
 " # unseen = set()\n",
 " # seen = set() # we don't need these two as scrapy will deal with them automatically\n",
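After a bulk substitution like the one in this commit, a quick scan can confirm that no old-domain links survive. This is a hedged sketch under the same `.md`/`.ipynb` scope assumption. `leftover_lines` and `scan_tree` are hypothetical helpers, not repository code. For notebooks, it reports line numbers in the raw JSON file rather than in the cell sources. It deliberately matches only the old `https://morvanzhou.github.io/` prefix, so the local-server `http://127.0.0.1:4000/` URLs that this diff leaves in place are not flagged.

```python
from pathlib import Path

# Hypothetical: the prefix this commit removes.
OLD_PREFIX = "https://morvanzhou.github.io/"


def leftover_lines(text: str, old: str = OLD_PREFIX) -> list[int]:
    """Return the 1-based line numbers of `text` that still mention `old`."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if old in line]


def scan_tree(root: Path, suffixes: tuple[str, ...] = (".md", ".ipynb")) -> dict[Path, list[int]]:
    """Map each file under `root` that still has an old-domain link to its line numbers."""
    hits: dict[Path, list[int]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = leftover_lines(path.read_text(encoding="utf-8"))
            if lines:
                hits[path] = lines
    return hits
```

If `scan_tree(Path("."))` returns an empty dict, nothing was missed within the scanned file types.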
