Skip to content

Conversation

@nbowen3
Copy link

@nbowen3 nbowen3 commented Dec 9, 2025

Relevant issue or PR

Description of changes

If auth is present in MLFlow, we need to pass the creds in via environment variables (https://mlflow.org/docs/latest/self-hosting/security/basic-http-auth/#using-environment-variables). While this works for when we use the mlflow package, the reachability check just uses requests. This adds the username/password to the requests.

Note: I thought about using the /health endpoint of mlflow instead (which is un-authed on mlflow) but it is (a) undocumented so might change and (b) doesn't replicate actual usage as well.
We also don't use the traditional tesseract config package (prepand config with TESSERACT) for the same reason (want to replicate the actual process the mlflow package will use when it makes requests).

Testing done

Local testing

@PasteurBot
Copy link
Contributor

PasteurBot commented Dec 9, 2025

CLA signatures confirmed

All contributors have signed the Contributor License Agreement.
Posted by the CLA Assistant Lite bot.

Copy link
Contributor

@xalelax xalelax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle LGTM, but

  • fix linting (you can use pre-commit for it)
  • the comments here are superfluous, I'd delete them

@pasteurlabs pasteurlabs deleted a comment from codecov bot Dec 9, 2025
@nbowen3
Copy link
Author

nbowen3 commented Dec 9, 2025

@PasteurBot I have read the CLA Document and I hereby sign the CLA

@nbowen3 nbowen3 requested a review from xalelax December 9, 2025 17:02
@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.80%. Comparing base (2147b14) to head (6440621).

Files with missing lines Patch % Lines
tesseract_core/runtime/mpa.py 69.23% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #416      +/-   ##
==========================================
+ Coverage   76.69%   76.80%   +0.10%     
==========================================
  Files          29       29              
  Lines        3420     3427       +7     
  Branches      533      533              
==========================================
+ Hits         2623     2632       +9     
- Misses        565      566       +1     
+ Partials      232      229       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@xalelax
Copy link
Contributor

xalelax commented Dec 9, 2025

I approved, but maybe let's also document this somewhere in the docs if you find a nice spot 🙂

@nbowen3 nbowen3 enabled auto-merge (squash) December 9, 2025 17:11
Comment on lines 166 to 167
username = os.environ.get("MLFLOW_TRACKING_USERNAME")
password = os.environ.get("MLFLOW_TRACKING_PASSWORD")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we integrate these envars with the config, cc @xalelax?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sakoush The mlflow package expects those specific env variables so we have to set those no matter what. We could add these to config but then either need to set them twice or take the config version and apply it to these variables somewhere within the Tesseract. This felt cleanest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the reason of why this was done that @nbowen3 wrote in the description of the pr (and here) is satisfying, not all env variables need to be there, only stuff super associated with Tesseracts (like "mount these volumes", "use a gpu", and so on). These ones are associated with mlflow, and with the exact same name that one would use with its client, so it seems to me like the best choice.

Maybe @dionhaefner can also chime in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, to add to my comment above, maybe we could also consider renaming TESSERACT_MLFLOW_TRACKING_URI to just MLFLOW_TRACKING_URI, as the former might be a bit confusing)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to have MLFLOW_TRACKING_URI but it interfered with people rolling their own MLFlow solution within Tesseracts. The same logic may apply here, but I'm currently not seeing how all the dots connect, so will leave it to @xalelax to decide.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nbowen3 let's add it to runtime config (which also implies having the TESSERACT_ prefix).

The main reason is to keep the vars very separate when one is using the mpa module of tesseract vs a custom client-defined mlflow client. In principle we could use the same var names, given that likely one would only do one of the two options described above, but this is safer for now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xalelax I'm not sure I entirely follow here. We have to have these variables (MLFLOW_TRACKING_USERNAME/MLFLOW_TRACKING_PASSWORD) set for the package to work properly. In your solution, where do those get set? Does the user need to set both environment variables?

Copy link
Contributor

@xalelax xalelax Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to have these variables (MLFLOW_TRACKING_USERNAME/MLFLOW_TRACKING_PASSWORD) set for the package to work properly

Just tested it, and they are not necessary -- although mlflow docs suck for this, but the source is relatively clear.

It should just be basic http auth, so we can just get from the environment whatever we want (like TESSERACT_MLFLOW_TRACKING_USERNAME), and use them in the uri as follows:

mlflow.set_tracking_uri("http://user:[email protected]:5000")

(and also in the auth parameters in the get you added). mlflow will parse that uri properly.

To try it locally, one can start a container via

$ docker run -d -p 5000:5000 --name mlflow-server -e MLFLOW_FLASK_SERVER_SECRET_KEY="secret-key" --entrypoint bash ghcr.io/mlflow/mlflow:latest -c "pip install mlflow[auth] && mlflow server --host 0.0.0.0 --port 5000 --app-name basic-auth --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlartifacts"

(by default this will create a user called admin with pass password1234).

Then use this python script

import mlflow

mlflow.set_tracking_uri("http://admin:password1234@localhost:5000")
mlflow.set_experiment("test-creds")
with mlflow.start_run():
    mlflow.log_param("method", "uri-creds")
    mlflow.log_metric("test", 1)
print("Success!")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, one could also not change anything at all in this pr and just add those creds in the tracking uri (it's an env var anyway), but having this might be convenient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xalelax Thanks for the pointer on this! I wasn't aware that worked. I updated the PR to account for this.

@jpbrodrick89
Copy link
Contributor

jpbrodrick89 commented Dec 9, 2025

Note that mlflow also accepts a credentials file, we should support mounting that as an option. Mlflow should automatically read this file if environment variable aren't set.

@xalelax
Copy link
Contributor

xalelax commented Dec 10, 2025

@nbowen3 feel free to implement @jpbrodrick89 's suggestion if you can -- or leave it open, and I will implement in a follow-up pr 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants