This is an app where Rohan's students can submit their ongoing projects and get tailored feedback from an LLM according to his rubric. It uses the GitHub API and Claude Opus to crawl and evaluate repository content (markdown files, code, repo structure, etc.) and provide detailed feedback.
I used Claude for a lot of the debugging in this project, and also for navigating the previously uncharted waters of JavaScript/Flask/Render.
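At its core, the flow is: pull file contents from a student's repo via the GitHub API, put them in a prompt alongside the rubric, and ask Claude for feedback. The snippet below is a minimal sketch of that idea rather than the app's actual code; the GitHub and Anthropic calls are real APIs, but the prompt, model choice, and function names are illustrative.

```python
# Minimal sketch of the core idea (not the app's actual code):
# fetch a file from a repo via the GitHub API, then ask Claude to grade it.
import base64
import requests
import anthropic

GITHUB_API_TOKEN = "your_github_token_here"

def fetch_readme(owner: str, repo: str) -> str:
    """Fetch a repo's README via the GitHub contents API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/readme",
        headers={"Authorization": f"token {GITHUB_API_TOKEN}"},
    )
    resp.raise_for_status()
    return base64.b64decode(resp.json()["content"]).decode("utf-8")

def grade(text: str, criteria: str) -> str:
    """Ask Claude Opus for feedback on one rubric criterion."""
    client = anthropic.Anthropic(api_key="your_anthropic_key_here")
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Grade the following file against this criterion:\n"
                       f"{criteria}\n\n---\n{text}",
        }],
    )
    return message.content[0].text
```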
- Anthropic account for Claude API access
- Render account for deployment
- Clone the repository:

  ```bash
  git clone https://github.com/michaeladrouillard/grader.git
  cd grader
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate  # on Windows, use: venv\Scripts\activate
  pip install -r requirements.txt
  ```
- GitHub API key:
  - Go to GitHub Settings > Developer settings > Personal access tokens > Tokens (classic)
  - Generate a new token with the `repo` scope
- Anthropic API key:
  - Go to the Anthropic Console
  - Generate an API key
- Create a `config.py` file in the `src` directory:

  ```python
  GITHUB_API_TOKEN = 'your_github_token_here'
  ANTHROPIC_API_KEY = 'your_anthropic_key_here'
  ```

  You can also read these from environment variables via `os` so you don't have to hardcode them here (see the sketch just after this list). Either way, make sure `config.py` is in your `.gitignore`.
- Test locally:

  ```bash
  python src/repo_grader.py https://github.com/username/repositoryThatYouWantToEvaluateUsingThisApp
  ```
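If you prefer not to hardcode keys, a `config.py` along these lines reads them from environment variables instead; this fallback pattern is a suggestion, not what the repo ships with. The variable names match the ones you will add in the Render dashboard below, so the same code works locally and when deployed.

```python
# src/config.py -- sketch that prefers environment variables and falls back
# to placeholder values for quick local testing (replace or export real keys).
import os

GITHUB_API_TOKEN = os.environ.get("GITHUB_API_TOKEN", "your_github_token_here")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "your_anthropic_key_here")
```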
- Create a new Web Service on Render:
  - Go to the Render Dashboard
  - Click "New +" and select "Web Service"
  - Connect your GitHub repository
- Configure the Web Service:
  - Name: choose a name (e.g., "paper-grader")
  - Environment: Python 3
  - Build Command: `pip install -r requirements.txt`
  - Start Command: `python app.py`
  - Environment Variables: `GITHUB_API_TOKEN` (your GitHub token) and `ANTHROPIC_API_KEY` (your Anthropic API key)
- Deploy your service:
  - Click "Create Web Service"
  - Wait for deployment to complete
- Update the API endpoint in `docs/js/grader.js`:

  ```javascript
  const API_URL = 'https://your-render-service-name.onrender.com/api/grade';
  ```

- Deploy the frontend to GitHub Pages:
  - Go to repository Settings > Pages
  - Set the source to GitHub Actions
  - Commit and push your changes
  - Wait for the GitHub Action to complete
The grader should now be accessible at https://yourusername.github.io/grader!
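Before pointing students at the page, it can help to smoke-test the deployed backend directly. The request shape below is an assumption (I am guessing the endpoint takes a JSON body with the repo URL; check `app.py` for the actual field name and method):

```python
# Hypothetical smoke test for the deployed backend.
# The "repo_url" payload key is an assumption -- check app.py for the real schema.
import requests

API_URL = "https://your-render-service-name.onrender.com/api/grade"

resp = requests.post(API_URL, json={"repo_url": "https://github.com/username/some-repo"})
print(resp.status_code)
print(resp.json())
```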
The app uses Claude 3 Opus, and when I tested it on repos students submitted for the election forecasting assignment, it came out to about $0.70 per run. YMMV based on the repo/prompt/token count being ingested by the model, and you can track costs in the Anthropic Console.
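For a rough sense of where that number comes from: at the time of writing, Claude 3 Opus list pricing was $15 per million input tokens and $75 per million output tokens, so a run that ingests a few tens of thousands of tokens of repo content and returns a page or two of feedback lands in that ballpark. The token counts below are made-up examples, not measurements from the app:

```python
# Back-of-envelope cost estimate for one grading run.
# Prices are Claude 3 Opus list prices (USD per million tokens) at time of writing;
# the token counts are illustrative, not measured from the app.
INPUT_PRICE_PER_MTOK = 15.00
OUTPUT_PRICE_PER_MTOK = 75.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

print(f"${estimate_cost(40_000, 1_500):.2f}")  # ~$0.71 for this example
```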
Use an LLM for all of this; it will go much faster lol.
- Modify `rubric.json`:
  - Locate `src/data/rubric.json`
  - Follow this structure for each rubric item (`range.max` is the maximum possible points; set `critical` to true if failing the item should result in a zero overall):

    ```json
    {
      "rubric_items": [
        {
          "title": "Item Name",
          "range": {
            "min": 0,
            "max": 10
          },
          "values": {
            "0": "Poor or not done",
            "2": "Some issues",
            "4": "Acceptable",
            "6": "Exceeds expectations",
            "8": "Exceptional"
          },
          "criteria": "Detailed description of what to look for when grading this item",
          "critical": false
        }
      ]
    }
    ```
- Important Notes About Rubric Structure:
  - Each item MUST have a UNIQUE `title`
  - The `values` object must include all possible scores
  - Scores must be within `range.min` and `range.max`
  - `critical` items should use a max score of 1 (pass/fail). These are items where, if the student doesn't do them, they fail the assignment entirely.
- Modify `maxGrades` in `docs/js/grader.js`:

  ```javascript
  const maxGrades = {
    'Your Item Name': maximum_points,
    // Add all your rubric items here
  };
  ```

- Update the item categories:
  ```javascript
  const itemCategories = {
    critical: [
      // Your critical items
    ],
    documentation: [
      // Your documentation items
    ],
    // Add other categories as needed
  };
  ```

- Modify the category assignments in `src/repo_grader.py`:
  ```python
  # In the batch_grade_rubric method
  doc_items = [item for item in self.rubric
               if not item.get('critical', False) and
               item['title'].lower() in ['your', 'document', 'items']]
  tech_items = [item for item in self.rubric
                if not item.get('critical', False) and
                item['title'].lower() in ['your', 'technical', 'items']]
  ```

  The backend groups rubric items into batches based on which files the LLM should look at, which drastically reduces the number of tokens and LLM calls needed to grade a repo. The tech items, for example, are graded against .py, .ipynb, and similar files. Depending on your rubric you may have to play around with this, but it is the main juncture where you can reduce cost and latency.
Once you set this up, test locally.
- Inconsistent item names between rubric and code
- Forgetting to update both frontend and backend
- Not properly setting up the backend batching (e.g., during a test run the LLM wasn't grading a rubric item that checked whether the student included their name and date; that information would have been found in the docs (.qmd, .pdf, etc.), but the item ended up in a batch that was fed repo structure and metadata instead)
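A small pre-flight check can catch the first two of these issues before you burn API credits. This is a sketch, not part of the repo; it assumes the rubric structure documented above and that you copy the frontend's `maxGrades` mapping into the `max_grades` argument by hand:

```python
# Sketch of a pre-flight consistency check (not part of the repo).
# Assumes src/data/rubric.json follows the structure documented above and that
# max_grades is copied by hand from maxGrades in docs/js/grader.js.
import json

def check_rubric(rubric_path: str, max_grades: dict) -> list[str]:
    problems = []
    with open(rubric_path) as f:
        items = json.load(f)["rubric_items"]

    titles = [item["title"] for item in items]
    if len(titles) != len(set(titles)):
        problems.append("Duplicate rubric item titles")

    for item in items:
        lo, hi = item["range"]["min"], item["range"]["max"]
        for score in item["values"]:
            if not lo <= int(score) <= hi:
                problems.append(f"{item['title']}: score {score} outside [{lo}, {hi}]")
        if item["title"] not in max_grades:
            problems.append(f"{item['title']}: missing from maxGrades in docs/js/grader.js")
        elif max_grades[item["title"]] != hi:
            problems.append(f"{item['title']}: maxGrades value != range.max")

    return problems
```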
Feel free to submit issues and pull requests for any improvements :-) I only tested this for the election forecasting assignment, so there may be hiccups for other kinds of rubrics.