Importing repository materials
Much like the corpus texts, repository materials include a file with metadata in header tags as well as the plaintext of the repository material; they additionally include the original repository resource, in .pdf format. The filename of the .pdf must match the filename of the associated .txt file so that the importer can find the .pdf and import it.
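Because the importer relies on matching filenames, it can help to check a downloaded directory for unpaired files before running the import. The following is a minimal sketch, not part of the Crow codebase: the function name find_missing_pdfs is hypothetical, and it assumes each .pdf sits in the same folder as its .txt file, which may not match how the prepared materials are actually laid out.

```shell
# Hypothetical helper (not part of the Crow importer).
# Prints every .txt file under the given directory that has no .pdf
# with the same base filename next to it.
# Assumption: the .pdf lives in the same folder as its .txt file.
find_missing_pdfs() {
  find "$1" -type f -name '*.txt' | while IFS= read -r txt; do
    # Swap the .txt extension for .pdf to get the expected filename.
    pdf="${txt%.txt}.pdf"
    if [ ! -f "$pdf" ]; then
      printf '%s\n' "$txt"
    fi
  done
}
```

Calling find_missing_pdfs path/to/directory prints nothing when every .txt file has a matching .pdf.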
Periodically, new assignment types, topic types, and college codes are added to the lists of these categories that Crow curates. Before importing new repository materials, check for any changes to these maps at https://drive.google.com/drive/folders/1B0giVT429xQdiY35M__om1lTo-5WuB2t
If there are changes, copy them manually into the mapping that is in code at profiles/corpus/modules/corpus_importer/src/ImporterMap.php
Currently we use a shared cloud file hosting service to distribute the prepared repository materials. Access to the files is currently provided by Shelley Staples. The files can be downloaded and placed in a directory anywhere in the codebase root.
If all repository materials need to be reimported (for example, if existing files' data has been updated or a change has been made to the schema), run the following command:
lando drush repository-wipe
The same command for importing corpus materials is used for importing repository materials; the code inspects the header file and determines whether it is a repository or corpus text based on the presence of the File ID or Student ID header.
lando drush corpus-import path/to/directory
You should see output in the terminal similar to the following:
106_RR_AS_1299_UA
Importing original file New Repository files 3_13_20/filenames/ENGL106/Fall 2019/1025/Language_Awareness/106_RR_RU_1300_UA.pdf
106_RR_RU_1300_UA
Importing original file New Repository files 3_13_20/filenames/ENGL106/Fall 2018/1024/NA/106_RR_AS_1288_UA.pdf
106_RR_AS_1288_UA
Importing original file New Repository files 3_13_20/filenames/ENGL106/Fall 2018/1024/Peer_Review/106_RR_AC_1287_UA.pdf
...
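Since the importer chooses between repository and corpus handling based on the File ID or Student ID header, it may be useful to see which header a prepared file carries before importing it. The sketch below is an illustration only and is not how the importer itself is implemented: the function name header_kind is hypothetical, and it assumes the header name appears literally within the first 40 lines of the .txt file.

```shell
# Hypothetical helper (not part of the Crow importer).
# Reports which identifying header a prepared .txt file carries.
# Assumption: the header name appears literally as "File ID" or
# "Student ID" somewhere in the first 40 lines of the file.
header_kind() {
  if head -n 40 "$1" | grep -q 'File ID'; then
    echo 'File ID'
  elif head -n 40 "$1" | grep -q 'Student ID'; then
    echo 'Student ID'
  else
    echo 'none'
  fi
}
```

Running header_kind on a single file prints File ID, Student ID, or none; a result of none suggests the file is missing the header the importer looks for.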
If the importer was unable to find any of the .pdfs, a message will be printed in the terminal.
- Go to /admin/config/search/search-api/index/resource_index or run lando drush sapi-i
- Spin up the frontend locally:
  cd ~/Sites/crow_frontend && ng serve
- Interact with the frontend and verify that the corpus numbers and metadata numbers look correct.
Follow the steps outlined at https://github.com/writecrow/crow_backend/wiki/Deploying-to-the-server