Commit f5267bb

Fix url encoding error and update starting instructions
- Some file names contain Unicode characters, which cause errors when the names are used to build URLs.
- Using an ampersand to put the Python process in the background leaves it tied to the SSH session, so the process dies when the connection is broken. Use setsid instead so that the process is not tied to the SSH session.
1 parent ab29926 commit f5267bb

2 files changed: +7 -3 lines changed


Diff for: corrected_change_crawler.py

+4 -1
@@ -43,7 +43,7 @@ def find_interesting_file_paths(self, change_id):
 paths = []
 for f in interesting_file_paths:
     path = self.crawler.build_diff_path(change_id, last_revision_no, f)
-    print path
+    # print path
     paths.append(path)

 self.logger.info("Successfully done with %s" % change_id)
@@ -68,6 +68,9 @@ def crawl(self, start_change_id, end_change_id):
 except urllib2.URLError as e:
     self.logger.error("We failed to reach a server for %s. Reason: %s" % (change_id, e.reason))
     self.crawler.insert_status(change_id, "ERROR", e.reason)
+except UnicodeEncodeError as e:
+    self.logger.error("We failed to process a filepath with the unicode format for %s. Reason: %s" % (change_id, e.reason))
+    self.crawler.insert_status(change_id, "ERROR", e.reason)
 else:
     self.crawler.insert_status(change_id, "DONE", len(paths))
     for path in paths:
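
The change to corrected_change_crawler.py records a UnicodeEncodeError as an ERROR status rather than avoiding it. As a possible complement, a file name can be percent-encoded as UTF-8 before it is placed in a URL. The sketch below is written for Python 3 (the crawler itself uses Python 2's urllib2), and `build_diff_url` is a hypothetical helper, not the repository's `build_diff_path`. The URL layout follows the Gerrit REST endpoint for file diffs; it is an assumption that the crawler targets the same endpoint.

```python
from urllib.parse import quote


def build_diff_url(service, change_id, revision_no, file_path):
    """Hypothetical helper: build a diff URL with a percent-encoded file path.

    quote() encodes the text as UTF-8 and escapes every byte that is not an
    unreserved character. safe="" also escapes "/", so the whole file path
    stays a single URL path segment.
    """
    encoded_path = quote(file_path, safe="")
    return "%s/changes/%s/revisions/%s/files/%s/diff" % (
        service.rstrip("/"),
        change_id,
        revision_no,
        encoded_path,
    )
```

With this approach a non-ASCII name such as `docs/Überblick.md` becomes `docs%2F%C3%9Cberblick.md`, so the resulting URL contains only ASCII characters.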

Diff for: instructions.md

+3 -2
@@ -1,2 +1,3 @@
-python corrected_change_crawler_runner.py --start 300406 --end 300410 --db db/db.sqlite3 --service https://gerrit.wikimedia.org/r
-
+```
+setsid python corrected_change_crawler_runner.py --start 300406 --end 300410 --db db/db.sqlite3 --service https://gerrit.wikimedia.org/r
+```
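
For comparison, the same detachment that `setsid` provides on the command line can be requested from Python when one script launches another. This is a minimal sketch for Python 3 on a POSIX system; `launch_detached` is a hypothetical name and is not part of this repository. `start_new_session=True` makes the child call `setsid()` before it executes the command, so the child leads its own session instead of belonging to the session of the shell that started it.

```python
import subprocess


def launch_detached(argv):
    """Hypothetical helper: start argv in a new session (POSIX only).

    The standard streams are redirected to the null device so the child
    holds no reference to the terminal of the launching shell.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid() before exec
    )


# Example call (not executed here), mirroring the command in instructions.md:
# launch_detached(["python", "corrected_change_crawler_runner.py",
#                  "--start", "300406", "--end", "300410",
#                  "--db", "db/db.sqlite3",
#                  "--service", "https://gerrit.wikimedia.org/r"])
```

Because the output is discarded in this sketch, the runner's own log file would be the place to follow progress.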
