Description
Currently, the scraper data is keyed by course code (e.g. CSCI 1100). While this is fine in most cases, there are some courses offered at RPI whose details differ significantly depending on section.
A good example is ADMN 1030 - Arch Exploration & Planning. As of Fall 2025, there are 7 sections of the course, each one for a different major. Each of these sections has a different course title (e.g. Arch Exploration & Planning Architecture, Arch Exploration & Planning ITWS/Undecided) as well as different restrictions (e.g. Architecture, ITWS).
This causes a problem in the scraped data: a course like ADMN 1030 ends up with only one course title and one set of restrictions, prerequisites, corequisites, etc., all taken from whichever section our SIS scraper happened to process first.
For example, the output data may show that ADMN 1030 is just Arch Exploration & Planning Architecture and is restricted only to Architecture majors. However, this is very misleading and omits all of the other variants of the course.
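To make the failure mode concrete, here is a minimal sketch of how keying by course code can drop sections. It is not the scraper's actual code: the function `key_by_course_code`, the record field names, and the sample restriction values are all assumptions chosen for illustration.

```python
# Hypothetical section records; field names and values are illustrative only.
sections = [
    {
        "code": "ADMN 1030",
        "title": "Arch Exploration & Planning Architecture",
        "restrictions": ["Architecture"],
    },
    {
        "code": "ADMN 1030",
        "title": "Arch Exploration & Planning ITWS/Undecided",
        "restrictions": ["ITWS"],
    },
]


def key_by_course_code(sections):
    """Key records by course code, keeping only the first section seen."""
    courses = {}
    for section in sections:
        # setdefault leaves an existing entry untouched, so every later
        # section with the same course code is silently ignored.
        courses.setdefault(
            section["code"],
            {"title": section["title"], "restrictions": section["restrictions"]},
        )
    return courses


courses = key_by_course_code(sections)
```

With this input, `courses` holds a single `ADMN 1030` entry carrying the Architecture title and restrictions; the ITWS/Undecided section is gone.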
Overall, I think the current approach loses too much section-level information. By scraping details for every section of each course, the scraper could offer much more accurate and granular data, which is especially important for general use.
A possible new general JSON structure discussed by Jack and me is shown below:
{
  "CSCI": {
    "subjectName": "Computer Science",
    "courses": {
      "1100": [
        {
          "courseReferenceNumber": "12345",
          "title": "Introduction to Computer Science",
          "description": "An introductory course to computer science concepts.",
          "prerequisites": [],
          "corequisites": [],
          "crosslists": [],
          "restrictions": {},
          "creditMin": 4,
          "creditMax": 4,
          "instructors": [
            "Dr. Smith",
            "Prof. Johnson"
          ],
          "seatsCapacity": 100,
          "seatsRegistered": 95,
          "seatsOpen": 5
        }
      ]
    }
  }
}
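As a rough sketch of how flat per-section records could be nested into the structure above, something like the following might work. The function `group_by_section`, the `courseCode` input field, the sample CRNs, and the subject name are assumptions for illustration, not part of the existing scraper.

```python
def group_by_section(records, subject_names):
    """Nest flat section records as subject -> course number -> list of sections."""
    grouped = {}
    for record in records:
        # Assumes codes look like "ADMN 1030": subject, a space, then the number.
        subject, number = record["courseCode"].split()
        subject_entry = grouped.setdefault(
            subject,
            {"subjectName": subject_names.get(subject, subject), "courses": {}},
        )
        # Keep every field except the code, which is now encoded in the keys.
        section = {key: value for key, value in record.items() if key != "courseCode"}
        # Append rather than overwrite, so every section of a course survives.
        subject_entry["courses"].setdefault(number, []).append(section)
    return grouped


# Hypothetical input; the CRNs and the subject name are placeholders.
records = [
    {
        "courseCode": "ADMN 1030",
        "courseReferenceNumber": "00001",
        "title": "Arch Exploration & Planning Architecture",
    },
    {
        "courseCode": "ADMN 1030",
        "courseReferenceNumber": "00002",
        "title": "Arch Exploration & Planning ITWS/Undecided",
    },
]
data = group_by_section(records, {"ADMN": "Example Subject"})
```

Here `data["ADMN"]["courses"]["1030"]` is a list with one entry per section, so both titles are preserved instead of only the first one processed.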