feat(kaggle-cli): Add SDK for benchmark notebook #90
Conversation
def normalize_status(status: object) -> str:
Might want a narrower type here. `object` seems not what you intended; see `Any` vs `object`.
Suggested change:
- def normalize_status(status: object) -> str:
+ def normalize_status(status: Any) -> str:
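The distinction the reviewer is pointing at can be sketched as follows (both function names here are hypothetical illustrations, not code from the PR): with `object`, a type checker forces you to narrow the value before using it, while `Any` opts out of checking entirely.

```python
from typing import Any

def normalize_with_object(status: object) -> str:
    # With `object`, the type checker rejects operations like
    # status.lower() until the value is narrowed, e.g. via isinstance.
    if isinstance(status, str):
        return status.lower()
    return str(status).lower()

def normalize_with_any(status: Any) -> str:
    # With `Any`, the checker permits any operation on `status`,
    # effectively disabling type checking for this value.
    return str(status).lower()
```

Both behave the same at runtime; the difference is only in what the type checker enforces at call sites and inside the function body.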
from kaggle_benchmarks.kaggle_client.utils import (
    KAGGLE_METADATA_MAP,
    build_local_metadata,
    convert_ipynb_to_py,
    convert_py_to_ipynb,
    normalize_status,
    parse_remote_metadata,
)
Might be clearer to import as a module; it usually makes the code easier to follow. See the style guide.
Suggested change:
- from kaggle_benchmarks.kaggle_client.utils import (
-     KAGGLE_METADATA_MAP,
-     build_local_metadata,
-     convert_ipynb_to_py,
-     convert_py_to_ipynb,
-     normalize_status,
-     parse_remote_metadata,
- )
+ from kaggle_benchmarks.kaggle_client import utils as kaggle_utils
Should we move most of this to the kagglesdk? It seems out of scope for the benchmark package and may overlap with https://github.com/Kaggle/kaggle-sdk-python/tree/main/kagglesdk/benchmarks
Yes, that's another option.
- Ideally, I think we want most of the heavy work implemented in kagglesdk, with thin wrappers here that users can call directly without jumping around.
- It also feels more natural to have all kinds of extensions like kaggle-client and vscode-extension together here for community contributions.

We could move part of the code to kagglesdk later, if people find this useful?
meta = MagicMock()
meta.ref = "alice/my-benchmark"
meta.title = "My Benchmark"
meta.language = "python"
meta.kernel_type = "notebook"
meta.is_private = False
meta.enable_gpu = True
meta.enable_internet = True
meta.enable_tpu = False
meta.dataset_data_sources = ["alice/dataset"]
meta.competition_data_sources = ["comp1"]
meta.kernel_data_sources = ["alice/kernel"]
meta.model_data_sources = ["alice/model"]
meta.category_ids = ["personal-benchmark", "nlp"]
Here and in other places I would advise against mocking when you can use dependency injection [go/python-tips/013].
Yes, here `meta` is a mocked data object, and `parse_remote_metadata` takes it as a parameter, so isn't this already dependency injection?
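One way to follow the reviewer's advice while keeping the injected-parameter shape is to replace `MagicMock` with a plain data object. This is a sketch, not code from the PR; `types.SimpleNamespace` stands in for whatever metadata type the SDK actually uses.

```python
from types import SimpleNamespace

# A plain data object carrying only the attributes the code under test reads.
# Unlike MagicMock, a typo such as meta.titel raises AttributeError instead of
# silently returning another mock, so tests fail loudly on stale attributes.
meta = SimpleNamespace(
    ref="alice/my-benchmark",
    title="My Benchmark",
    language="python",
    kernel_type="notebook",
    is_private=False,
    enable_gpu=True,
    enable_internet=True,
    enable_tpu=False,
    dataset_data_sources=["alice/dataset"],
    competition_data_sources=["comp1"],
    kernel_data_sources=["alice/kernel"],
    model_data_sources=["alice/model"],
    category_ids=["personal-benchmark", "nlp"],
)
```

The object can be passed to `parse_remote_metadata` exactly as the mock is now; only the construction changes.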
rosbo left a comment
Link to my comment thread about adding this to our existing CLI rather than having a separate CLI tool. Let's discuss on that thread: https://docs.google.com/document/d/1xvOIzSAyYVNtff4S7aqELPNEpbAwmiem16jM68pzqJI/edit?disco=AAAB2hylHpA
Adds a kaggle-bench CLI with two subcommands:
- run: publish and run a local benchmark script on Kaggle
- fork: pull an existing Kaggle benchmark notebook for local editing

Extends the BenchmarkNotebookClient SDK from PR Kaggle#90 with a command-line interface so users can trigger benchmark runs directly from the terminal without writing Python boilerplate.

Tests: 5 new unit tests covering help output, argument parsing, and correct delegation to BenchmarkNotebookClient.
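The run/fork delegation described above could be wired up roughly as follows. This is a hypothetical sketch: the real kaggle-bench flags and argument names are not shown in this PR, so only the two documented subcommands are modeled.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Minimal argparse layout for the two subcommands; the actual CLI
    # likely accepts additional options (e.g. force, output paths).
    parser = argparse.ArgumentParser(prog="kaggle-bench")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser(
        "run", help="publish and run a local benchmark script on Kaggle"
    )
    run.add_argument("script", help="path to the local benchmark script")

    fork = sub.add_parser(
        "fork", help="pull an existing Kaggle benchmark notebook for local editing"
    )
    fork.add_argument("ref", help="notebook reference, e.g. alice/my-benchmark")
    return parser
```

Each subcommand handler would then call the corresponding BenchmarkNotebookClient method, which is what the delegation tests described above would verify.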
Kaggle Benchmark Client
This PR introduces the `BenchmarkNotebookClient` SDK to manage Kaggle benchmark tasks. These tasks execute as Kaggle notebooks tagged with the `personal-benchmark` keyword.

APIs
- Converts the local script to .ipynb format, generates the kernel-metadata.json file (allowing users to persistently save and edit kernel configuration overrides), and pushes the payload to Kaggle. Implements concurrent run guards to block overlapping executions of the same benchmark (bypassable via force=True).
- Polls the run and downloads .run.json artifacts upon completion. Supports thread-based polling cancellation via threading.Event.

Testing
The code is validated by two test suites, including end-to-end golden tests (golden_tests) to exercise the full execution lifecycle.
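The thread-based polling cancellation mentioned above can be sketched with a self-contained stand-in. The poll loop, status callback, and status strings here are hypothetical, not the SDK's actual code; the point is the `threading.Event` pattern.

```python
import threading

def poll_until_complete(get_status, cancel: threading.Event,
                        interval: float = 0.01) -> str:
    # Hypothetical stand-in for the SDK's poll loop: repeatedly checks the
    # run status until it completes, or returns early when `cancel` is set.
    while not cancel.is_set():
        status = get_status()
        if status == "complete":
            return status
        # Event.wait doubles as an interruptible sleep: it returns
        # immediately if another thread sets the event.
        cancel.wait(interval)
    return "cancelled"
```

A caller cancels by calling `cancel.set()` from another thread (or before starting), and the loop exits at the next check instead of sleeping out the full interval.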