-
Notifications
You must be signed in to change notification settings - Fork 4
fix: reachability check uses mlflow username/password if provided #416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
CLA signatures confirmedAll contributors have signed the Contributor License Agreement. |
xalelax
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle LGTM, but
- fix linting (you can use pre-commit for it)
- the comments here are superfluous, I'd delete them
|
@PasteurBot I have read the CLA Document and I hereby sign the CLA |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #416 +/- ##
==========================================
+ Coverage 76.69% 76.80% +0.10%
==========================================
Files 29 29
Lines 3420 3427 +7
Branches 533 533
==========================================
+ Hits 2623 2632 +9
- Misses 565 566 +1
+ Partials 232 229 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I approved, but maybe let's also document this somewhere in the docs if you find a nice spot 🙂 |
tesseract_core/runtime/mpa.py
Outdated
| username = os.environ.get("MLFLOW_TRACKING_USERNAME") | ||
| password = os.environ.get("MLFLOW_TRACKING_PASSWORD") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we integrate these envars with the config, cc @xalelax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sakoush The mlflow package expects those specific env variables so we have to set those no matter what. We could add these to config but then either need to set them twice or take the config version and apply it to these variables somewhere within the Tesseract. This felt cleanest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the reason of why this was done that @nbowen3 wrote in the description of the pr (and here) is satisfying, not all env variables need to be there, only stuff super associated with Tesseracts (like "mount these volumes", "use a gpu", and so on). These ones are associated with mlflow, and with the exact same name that one would use with its client, so it seems to me like the best choice.
Maybe @dionhaefner can also chime in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also, to add to my comment above, maybe we could also consider renaming TESSERACT_MLFLOW_TRACKING_URI to just MLFLOW_TRACKING_URI, as the former might be a bit confusing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to have MLFLOW_TRACKING_URI but it interfered with people rolling their own MLFlow solution within Tesseracts. The same logic may apply here, but I'm currently not seeing how all the dots connect, so will leave it to @xalelax to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nbowen3 let's add it to runtime config (which also implies having the TESSERACT_ prefix).
The main reason is to keep the vars very separate when one is using the mpa module of tesseract vs a custom client-defined mlflow client. In principle we could use the same var names, given that likely one would only do one of the two options described above, but this is safer for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xalelax I'm not sure I entirely follow here. We have to have these variables (MLFLOW_TRACKING_USERNAME/MLFLOW_TRACKING_PASSWORD) set for the package to work properly. In your solution, where do those get set? Does the user need to set both environment variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to have these variables (MLFLOW_TRACKING_USERNAME/MLFLOW_TRACKING_PASSWORD) set for the package to work properly
Just tested it, and they are not necessary -- although mlflow docs suck for this, but the source is relatively clear.
It should just be basic http auth, so we can just get from the environment whatever we want (like TESSERACT_MLFLOW_TRACKING_USERNAME), and use them in the uri as follows:
mlflow.set_tracking_uri("http://user:[email protected]:5000")(and also in the auth parameters in the get you added). mlflow will parse that uri properly.
To try it locally, one can start a container via
$ docker run -d -p 5000:5000 --name mlflow-server -e MLFLOW_FLASK_SERVER_SECRET_KEY="secret-key" --entrypoint bash ghcr.io/mlflow/mlflow:latest -c "pip install mlflow[auth] && mlflow server --host 0.0.0.0 --port 5000 --app-name basic-auth --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlartifacts"
(by default this will create a user called admin with pass password1234).
Then use this python script
import mlflow
mlflow.set_tracking_uri("http://admin:password1234@localhost:5000")
mlflow.set_experiment("test-creds")
with mlflow.start_run():
mlflow.log_param("method", "uri-creds")
mlflow.log_metric("test", 1)
print("Success!")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, one could also not change anything at all in this pr and just add those creds in the tracking uri (it's an env var anyway), but having this might be convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xalelax Thanks for the pointer on this! I wasn't aware that worked. I updated the PR to account for this.
|
Note that mlflow also accepts a credentials file, we should support mounting that as an option. Mlflow should automatically read this file if environment variable aren't set. |
|
@nbowen3 feel free to implement @jpbrodrick89 's suggestion if you can -- or leave it open, and I will implement in a follow-up pr 👍 |
Relevant issue or PR
Description of changes
If auth is present in MLFlow, we need to pass the creds in via environment variables (https://mlflow.org/docs/latest/self-hosting/security/basic-http-auth/#using-environment-variables). While this works for when we use the mlflow package, the reachability check just uses
requests. This adds the username/password to the requests.Note: I thought about using the
/healthendpoint of mlflow instead (which is un-authed on mlflow) but it is (a) undocumented so might change and (b) doesn't replicate actual usage as well.We also don't use the traditional tesseract config package (prepand config with
TESSERACT) for the same reason (want to replicate the actual process the mlflow package will use when it makes requests).Testing done
Local testing