DRAFT: Add project ingestion to the DB to the project sync cron job#98

Draft
alasdairwilson wants to merge 4 commits into main from project-ingestion


@alasdairwilson
Member

Adds syncing and ingestion for external project repos into the existing auth/access DB, so deployed static projects get registered automatically and linked to real users.

One behaviour I am not sure about: at the moment, if a new project is inserted and its owner does not exist in the Users table, we raise a RuntimeError. You would then have to create the account and wait for the hourly resync, I guess?

Maybe that is fine...

@alasdairwilson alasdairwilson marked this pull request as draft May 28, 2026 15:50
@alasdairwilson
Member Author

I have marked this as a draft because I haven't really let this change percolate in my head for very long, so I don't know if it is fully "right". I am on AL, though, so I figured that if I at least opened the PR, people could have a look.

@codecov

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 73.05699% with 52 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vertex/project_ingestion.py | 73.05% | 52 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| vertex/project_ingestion.py | 73.05% <73.05%> (ø) |

@sr-murthy changed the title from "Add project ingestion to the DB to the project sync cron job" to "DRAFT: Add project ingestion to the DB to the project sync cron job" May 29, 2026
Member

@sr-murthy sr-murthy left a comment


Looks OK, but I had a few questions below.

git clone --branch "${PROJECTS_REPO_BRANCH}" "${PROJECTS_REPO_URL}" "${PROJECTS_DIR}"
fi
end_ts="$(date -u --iso-8601=seconds)"
echo "[${end_ts}] Sync finished successfully"
Member

@sr-murthy sr-murthy May 29, 2026


It might be useful for debugging to include the time difference end_ts - start_ts in the echo message, to get an idea of sync times. Something like:

time_diff=$(( $(date -u --date="${end_ts}" +%s) - $(date -u --date="${start_ts}" +%s) ))
echo "[${end_ts}] Sync finished successfully in ${time_diff} seconds"

I had to install GNU coreutils on macOS and use gdate to test this (the stock date errors), but it should work with GNU date.
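If portability between GNU and BSD date turns out to matter, the same difference could be computed in Python instead. This is only a minimal sketch, assuming the timestamps have the format produced by `date -u --iso-8601=seconds`; the helper name `elapsed_seconds` and the sample timestamps are hypothetical and not part of this PR.

```python
from datetime import datetime


def elapsed_seconds(start_ts: str, end_ts: str) -> int:
    """Return the whole seconds between two ISO-8601 timestamps."""
    # fromisoformat accepts offsets such as "+00:00", which is what
    # `date -u --iso-8601=seconds` emits.
    start = datetime.fromisoformat(start_ts)
    end = datetime.fromisoformat(end_ts)
    return int((end - start).total_seconds())


# Hypothetical sample values, for illustration only.
start_ts = "2026-05-29T10:00:00+00:00"
end_ts = "2026-05-29T10:02:05+00:00"
message = (
    f"[{end_ts}] Sync finished successfully "
    f"in {elapsed_seconds(start_ts, end_ts)} seconds"
)
print(message)
```

This avoids depending on which `date` implementation is installed, at the cost of spawning a Python process from the sync script.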

@@ -0,0 +1,353 @@
import argparse
Member


We now have a VERTEX CLI. I'm wondering whether it would be worth adding this functionality, for example, in an admin command subgroup.

Obviously, the CLI is user-facing, but this particular script of yours requires certain credentials and information only admins would have, so I don't see a problem with it.

I think this looks fine in its current form, but it would be nice to have a uniform common CLI that can be used by users, developers and admins.
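To make the idea concrete, here is a rough sketch of what an admin subgroup could look like. It uses plain argparse, which may not be what the VERTEX CLI is actually built on; the program name `vertex`, the group name `admin` and the command name `ingest-projects` are all assumptions for illustration. Only the `--projects-dir` and `--database-url` options mirror the existing script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical `admin ingest-projects` command."""
    parser = argparse.ArgumentParser(prog="vertex")
    groups = parser.add_subparsers(dest="group", required=True)

    # Hypothetical admin-only command group.
    admin = groups.add_parser("admin", help="Commands that need admin credentials")
    admin_commands = admin.add_subparsers(dest="command", required=True)

    # Hypothetical command wrapping the ingestion script's two options.
    ingest = admin_commands.add_parser(
        "ingest-projects", help="Register static projects in the database"
    )
    ingest.add_argument("--projects-dir", required=True)
    ingest.add_argument("--database-url", required=True)
    return parser


# Parse an example argument list instead of sys.argv so the sketch is runnable.
args = build_parser().parse_args(
    [
        "admin",
        "ingest-projects",
        "--projects-dir", "/tmp/projects",
        "--database-url", "sqlite:///dev.db",
    ]
)
print(args.group, args.command, args.projects_dir, args.database_url)
```

With this layout, user-facing commands would sit next to `admin` as sibling groups, so the admin-only options never show up in the help text for ordinary users.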

Comment thread README.md
@sr-murthy force-pushed the project-ingestion branch from 8a678b3 to d0fce03 June 1, 2026 09:26
@read-the-docs-community

Documentation build overview

📚 ISARIC VERTEX | 🛠️ Build #32932662 | 📁 Comparing d0fce03 against latest (8e78bf1)

  🔍 Preview build  

7 files changed · + 5 added · ± 2 modified

@lithomson
Collaborator

The code looks good to me. I tested it using my local database and test projects for ISARICAccount. Nothing loaded when one of the emails didn't exist. Should this one just be skipped?

% python -m vertex.project_ingestion \
    --projects-dir /Users/laurat/PycharmProjects/IsaricAccount/dev/vertex \
    --database-url sqlite:////Users/laurat/PycharmProjects/IsaricAccount/instance/dev.db

2026-06-01 11:34:33 [ERROR] main: Static project ingestion failed: Cannot insert project: owner user does not exist for no_a_user@gmail.com. Parsed config={"is_public": true, "name": "project_3", "project_dir": "/Users/laurat/PycharmProjects/IsaricAccount/dev/vertex/project_3", "project_id": "this-wont-be-loaded", "project_owner": "no_a_user@gmail.com"}

I changed to an existing owner and they all loaded (as expected).

2026-06-01 11:35:42 [INFO] main: Static project ingestion summary: seen=5 inserted=5 existing=0 owner_links_inserted=5 owner_links_existing=0 owner_pending_users=0 owner_immutable_skipped=0

And then no reload attempt for the third run (as expected).

2026-06-01 11:38:20 [INFO] main: Static project ingestion summary: seen=5 inserted=0 existing=5 owner_links_inserted=0 owner_links_existing=0 owner_pending_users=0 owner_immutable_skipped=5
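For discussion, the "just skip it" option could look roughly like the sketch below. It is not the code in this PR: in-memory sets stand in for the Users and projects tables, `ingest` and `Summary` are hypothetical names, and the counter names are borrowed from the summary log line, where their real semantics may differ.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    seen: int = 0
    inserted: int = 0
    existing: int = 0
    owner_pending_users: int = 0


def ingest(projects, known_users, known_projects):
    """Insert new projects, skipping those whose owner is not a known user."""
    summary = Summary()
    for project in projects:
        summary.seen += 1
        if project["project_id"] in known_projects:
            # Already registered on an earlier run; nothing to do.
            summary.existing += 1
            continue
        if project["project_owner"] not in known_users:
            # Assumption: count and skip instead of raising, so one bad
            # owner does not block the other projects.
            summary.owner_pending_users += 1
            continue
        known_projects.add(project["project_id"])
        summary.inserted += 1
    return summary


# Hypothetical sample data, for illustration only.
projects = [
    {"project_id": "p1", "project_owner": "real_user@example.org"},
    {"project_id": "p2", "project_owner": "real_user@example.org"},
    {"project_id": "this-wont-be-loaded", "project_owner": "no_a_user@gmail.com"},
]
known_users = {"real_user@example.org"}
known_projects = set()

first_run = ingest(projects, known_users, known_projects)
second_run = ingest(projects, known_users, known_projects)
print(first_run)
print(second_run)
```

A skipped project would be picked up by a later sync once the owner account exists, which also covers the "create the account and wait for the hourly resync" case from the PR description without failing the whole run.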
