feat: Adds optional support for giving a complete URL as the ID field (#22)
With this feature, the complete URL can be passed as the `id` field, and the application will make a best-effort attempt to infer the ID from it.
This may not work for all URLs, since the ID can be stored in different ways. If the ID cannot be parsed from the URL, fall back to extracting the ID manually and passing it directly.
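To illustrate the best-effort behaviour described above, here is a minimal sketch that reuses the `extract_id_from_url` logic from this commit. The two URLs are illustrative assumptions, not taken from the commit: one carries the ID in the `id` query parameter, the other carries it in the path.

```python
from urllib.parse import urlparse, parse_qs


def extract_id_from_url(url):
    """Return the value of the `id` query parameter, or None if absent."""
    query_params = parse_qs(urlparse(url).query)
    return query_params.get('id', [None])[0]


# Assumed example URL: the ID sits in the query string, so it is found.
query_style = "https://babel.hathitrust.org/cgi/pt?id=mdp.39015027794331&seq=7"
print(extract_id_from_url(query_style))  # mdp.39015027794331

# Assumed example URL: the ID sits in the path, so there is no `id`
# parameter to read and the function returns None.
path_style = "https://hdl.handle.net/2027/mdp.39015027794331"
print(extract_id_from_url(path_style))  # None
```

The second case is the kind of URL for which the ID has to be supplied directly.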
 > ^^^^^^^^^^^^^^^^^^ This demarks the ID of this book
 > ```
+> Alternatively, you can provide the complete URL as the ID argument, and the tool will attempt to parse the ID from the URL. Note that this feature is best-effort, and for optimal stability, it is still recommended to provide the specific ID directly.
hathitrustdownloader/cli.py (+23 -5)
@@ -3,17 +3,35 @@
 import requests
 import time
 import argparse
+from urllib.parse import urlparse, parse_qs
+
+def extract_id_from_url(url):
+    """
+    Extracts the ID parameter from a HathiTrust URL.
+
+    Args:
+        url (str): The complete URL containing the ID parameter.
+
+    Returns:
+        str: The extracted ID value or None if not found.
+    """
+    parsed_url = urlparse(url)
+    query_params = parse_qs(parsed_url.query)
+    return query_params.get('id', [None])[0]
 
 def main():
     parser = argparse.ArgumentParser(description='Book downloader for HathiTrust')
 
-    parser.add_argument('id', type=str, help="The ID of the book, e.g 'mdp.39015027794331'.")
+    parser.add_argument('id', type=str, help="The ID of the book, e.g 'mdp.39015027794331' or a complete URL.")
     parser.add_argument('start_page', type=int, help="The page number of the first page to be downloaded.")
     parser.add_argument('end_page', type=int, help="The last number of the last page to be downloaded (inclusive).")
     parser.add_argument('--name', dest='name', type=str, help="The start of the filename. Defaults to using the id. This can also be used to change the path.")
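The visible part of the diff does not show how `main()` decides whether the `id` argument is a plain ID or a URL. One possible approach is sketched below, assuming a hypothetical helper named `resolve_book_id` that is not part of this commit. It treats the argument as a URL only when it has an `http` or `https` scheme, and otherwise passes it through unchanged.

```python
from urllib.parse import urlparse, parse_qs


def extract_id_from_url(url):
    """Return the value of the `id` query parameter, or None if absent."""
    query_params = parse_qs(urlparse(url).query)
    return query_params.get('id', [None])[0]


def resolve_book_id(raw):
    """Hypothetical helper (not in the commit): accept a plain ID or a URL.

    Only http/https inputs are treated as URLs, so a plain ID that happens
    to contain a colon is never mistaken for one.
    """
    if urlparse(raw).scheme in ("http", "https"):
        book_id = extract_id_from_url(raw)
        if book_id is None:
            # Best-effort parsing failed; ask the user for the ID itself.
            raise ValueError(f"Could not infer an ID from URL: {raw}")
        return book_id
    return raw


# A plain ID is returned unchanged.
print(resolve_book_id("mdp.39015027794331"))
# A URL with an `id` query parameter is reduced to that ID.
print(resolve_book_id("https://babel.hathitrust.org/cgi/pt?id=mdp.39015027794331"))
```

Raising an error when a URL yields no ID, rather than silently passing the URL on, is a design choice of this sketch. It surfaces the failure early so the user can fall back to providing the ID directly.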