Add Makefile and enhance docs for processing / development thoughts #3
base: main
Conversation
An additional comment: the way this repo is set up does not mimic most of the lab's analysis repos.
Ideally, we have a structure that looks like the following:
```
project/
├── src/
│   ├── a_dir/
│   │   └── file1.py
│   └── b_dir/
│       └── file2.py
└── analysis1/
    ├── a_dir/
    │   └── analysis_file1.py
    └── b_dir/
        └── call_module.sh
```
Here `analysis_file1.py` shares only the `project/` parent with the source files, and we need a way to import functions as modules from `/project/src/a_dir/file1.py` and `/project/src/b_dir/file2.py`.

The question becomes: from where are we invoking the Python file? If invoking from `project/analysis1/b_dir/call_module.sh`, how do you write the imports? Or is invoking the Python file from the shell script's directory wrong, and do we need to move the shell script to the base directory? Or does it not matter where the shell script lives, because we will `cd` into the directory holding the Python script to call it?
I do not mind how we call the Python script (make, shell, conda, uv, etc.), but we need to understand where we are calling it from in order to know how to write the script.
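One pattern that sidesteps the invocation-directory question entirely is to have `analysis_file1.py` resolve the repo root from its own location before importing (a sketch, not this repo's code; the imported function names are hypothetical):

```python
import sys
from pathlib import Path

# analysis1/a_dir/analysis_file1.py -> parents[2] is project/
PROJECT_ROOT = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(PROJECT_ROOT))

# These resolve regardless of the directory the script was invoked from:
from src.a_dir.file1 import some_function      # hypothetical name
from src.b_dir.file2 import another_function   # hypothetical name

some_function()
another_function()
```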
```jsonc
{
  // Make notebooks resolve relative paths from the repo root
  "jupyter.notebookFileRoot": "${workspaceFolder}"
}
```
Consider not using this file, as it would inhibit testing the notebooks as if we were not using VS Code.
We suggest programming from the perspective that both `file1.py` and `file2.py` have access starting from the directory `project/`.
This means `file1.py` may access `file2.py`, and vice versa.
We suggest that paths found within `file1.py` and `file2.py` include relative pathing from the perspective of the directory `project/`.
For example, `file1.py` might reference `file2.py` using the path `src/b_dir/file2.py`.
These are not laws; they are a suggested philosophy to help you deduplicate and simplify your code.
If you need to break these suggestions, we recommend encapsulating the processor's execution-directory context for each isolated occasion.
For example, consider the following:
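As a rough sketch of such encapsulation (the `working_directory` helper below is hypothetical, not code from this repo; Python 3.11+ ships `contextlib.chdir` for the same purpose):

```python
import os
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)

# Usage: isolate a processor that assumes a different root.
with working_directory("analysis1"):
    ...  # run the processor that expects analysis1/ as its current directory
```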
This seems inconsistent with the Python script: `from cow.say import personalized_cowsay`, if programmed from the perspective of the `project/` directory, should actually be `from src.cow.say import personalized_cowsay`. The latter form is displayed in this README but not carried out in the notebook/script.
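To check which spelling a given interpreter actually resolves, a small diagnostic sketch (`cow.say` is the module from the comment above):

```python
import importlib.util

# Which spelling resolves depends on sys.path, not on how the script is launched:
for name in ("cow.say", "src.cow.say"):
    try:
        found = importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        found = False
    print(f"{name}: {'importable' if found else 'not importable'}")
```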
```yaml
  - ipykernel
  - papermill
  - pip:
      - -e .
```
Is this something that needs to be run on all projects that have util scripts? Should utils actually be in a `src` dir and installed? Without this line, do imports fail? Are we actually importing modules from their path or from the env?
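One way to answer the last question is to ask the interpreter where an import is served from (a sketch; the package name `cow` is assumed from the earlier comment):

```python
import importlib.util

# With `pip: [-e .]`, an editable install points the module's origin back at
# the source tree, which would explain imports working only inside that env.
spec = importlib.util.find_spec("cow")
print(spec.origin if spec else "cow is not importable in this interpreter")
```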
```makefile
run_code_uv: env_install_or_update_uv ## Run code via uv
	uv run --python '$(ENV_UV_VER)' python src/notebooks/example.py

run_code_conda: env_install_or_update_conda ## Run code via conda
	conda run -n '$(ENV_NAME)' python src/notebooks/example.py

run_code_uv_papermill: env_install_or_update_uv ## Run jupyter notebook via uv
	uv run --python '$(ENV_UV_VER)' papermill src/notebooks/example.ipynb src/notebooks/example_output.ipynb

run_code_conda_papermill: env_install_or_update_conda ## Run jupyter notebook via conda
	conda run -n '$(ENV_NAME)' python -m papermill src/notebooks/example.ipynb src/notebooks/example_output.ipynb
```
These only run for me when using the conda or uv env, and not outside of the environment. This leads me to believe that the env is installing the `src` modules.
While I like the Makefile, using it on an analysis repo, where a shell script within a module needs to call scripts, would get entirely too messy (given my assumption that each repo should only have a single Makefile that lives at the repo root).

I also do not understand what is inherently different about this Makefile compared to a shell script. The Makefile does not change how the Python code is run; the environments do, and those can also be activated from a shell script.

The Makefile, IMO, is great for managing these workflows, but it does not help solve the problem initially identified: how to reduce redundancy and antipatterns in Python code while retaining a way for the Python interpreter to know where it is, so it can import modules and utils.
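For what it's worth, the launcher really is orthogonal to import resolution: make, a shell script, `conda run`, and `uv run` all end up starting the same interpreter, and imports are resolved from `sys.path` either way. A sketch of the equivalent module-style invocation from the repo root (assuming `src/` and `src/notebooks/` are importable as packages, which this repo may not guarantee):

```python
import runpy

# Execute the example as a module rooted at project/, independent of launcher.
runpy.run_module("src.notebooks.example", run_name="__main__")
```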
This PR adds content based on a conversation with @MikeLippincott, expanding the work to use a `Makefile` which uses either `conda` or `uv` in isolation from the same codebase. The README has also been updated to provide some guiding principles / philosophy which ground the work.