extract_names_pseudocode.txt
task: retrieve musician names from one of several possible metadata fields and write them to a csv along with discography info
1. open the csv that contains the links
2. loop through the rows and open each link
3. for each link, extract the musician name from one of the following metadata fields
   1. 'Other Author(s):'
   2. 'Participant/Performer:'
      e.g. <span class="fieldLabelSpan">Participant/Performer:</span><span class="subfieldData Location" id="Participant/Performer:">American Jazz Quintet (Ellis Marsalis, Jr., piano ; Harold Battiste, tenor sax ; Alvin Batiste, clarinet ; Ed Blackwell, drums ; Richard Payne and William Swanson, bass).<br>
      </span>
      Solution:
          # soup: the opened page parsed with BeautifulSoup
          # 'tiste' matches both spellings, Batiste and Battiste
          for span in soup.find_all('span', id='Participant/Performer:'):
              if 'tiste' in span.text:
                  print(span.text)
   3. 'Contents:'
   4. 'Author:'
      e.g. <span class="subfieldData Location" id="Author:"><a href="search?searchArg=Batiste, Alvin.&searchCode=NAME&searchType=4">Batiste, Alvin.</a>
      <br>
      </span>
      Solution:
          # same pattern as above, keyed on the 'Author:' span id
          for span in soup.find_all('span', id='Author:'):
              if 'tiste' in span.text:
                  print(span.text)
   5. 'Title:'
4. for each link, extract the 'Title:' metadata field
5. write the musician name and recording title as one row of the output csv
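
sketch of steps 1-5 as one script. This is a rough outline under several assumptions, not a finished implementation: the input column name "link", the output header "musician"/"title", and the function names are all made up for illustration; the 'Title:' span is assumed to carry id="Title:" like the two examples above; and the fields are tried in the order listed, which the notes do not actually specify. To keep it free of third-party packages it swaps BeautifulSoup for the standard library's html.parser.

```python
import csv
import urllib.request
from html.parser import HTMLParser

# Fields that may hold the musician name, tried in the order listed in step 3.
NAME_FIELDS = ["Other Author(s):", "Participant/Performer:",
               "Contents:", "Author:", "Title:"]


class SpanText(HTMLParser):
    """Collect the text inside each <span id="..."> element, keyed by id."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # span id -> list of text chunks
        self._open = []   # ids of the spans currently open (None if no id)

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._open.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        for span_id in self._open:
            if span_id is not None:
                self.fields.setdefault(span_id, []).append(data)


def parse_fields(html):
    """Return {span id: whitespace-normalised text} for one record page."""
    parser = SpanText()
    parser.feed(html)
    parser.close()
    return {k: " ".join("".join(v).split()) for k, v in parser.fields.items()}


def find_name(fields, needle="tiste"):
    """Return the text of the first listed field containing `needle`, else ''."""
    for field in NAME_FIELDS:
        text = fields.get(field, "")
        if needle in text:
            return text
    return ""


def fetch_html(url):
    """Download one page (assumes UTF-8, which may not hold for the real site)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_names(in_path, out_path, fetch=fetch_html, link_column="link"):
    """Read links, fetch each page, and write one name/title row per link."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["musician", "title"])  # hypothetical header
        for row in csv.DictReader(src):
            fields = parse_fields(fetch(row[link_column]))
            writer.writerow([find_name(fields), fields.get("Title:", "")])
```

usage would be something like extract_names("links.csv", "names.csv"), where both file names are placeholders. Like the snippets above, find_name returns the whole text of the matching field (for 'Participant/Performer:' that is the full performer list), so isolating a single name would still need a further step.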