feat: Add constrained search tutorial#37

Open
jdickerson95 wants to merge 22 commits into docs_main from constrained_tutorial

Conversation

@jdickerson95
Contributor

Added notebook for constrained search tutorial.
Added a utility script to process successive constrained search results.

@jdickerson95 jdickerson95 added the documentation Improvements or additions to documentation label May 9, 2025
Comment on lines +43 to +60
def get_micrograph_id(filename: str) -> str:
    """
    Extract micrograph ID from filename.

    Parameters
    ----------
    filename : str
        Filename to extract micrograph ID from

    Returns
    -------
    micrograph_id : str
        Micrograph ID
    """
    base_name = os.path.basename(filename)
    # Extract the part before _results.csv
    parts = base_name.split("_results.csv")[0]
    return parts
Member

This can actually be read from the micrograph column in the CSV file rather than assuming some relative naming scheme between the results file and the micrograph.
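
A minimal sketch of that approach, assuming the results file has a column literally named `micrograph` (the real header may differ) and that every row refers to the same micrograph. The function name `micrograph_from_results` and the sample data are hypothetical, for illustration only.

```python
import csv
import io
import os
from typing import TextIO


def micrograph_from_results(handle: TextIO, column: str = "micrograph") -> str:
    """Return the micrograph name stored in the first data row of a results CSV.

    NOTE: the column name is an assumption; adjust it to match the real header.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in results file")
    first_row = next(reader, None)
    if first_row is None:
        raise ValueError("results file contains no data rows")
    # Strip the directory and extension so the value can serve as an ID.
    return os.path.splitext(os.path.basename(first_row[column]))[0]


# Made-up example data, not output from a real run.
example = io.StringIO("micrograph,score\n/data/mgraph_001.mrc,9.5\n")
micrograph_name = micrograph_from_results(example)
```

Reading the value from the file contents removes any dependence on how the results file itself is named.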

Contributor Author

By 'micrograph_id' I don't mean the micrograph itself. It's the base name shared by all the results files. Because it is extracted directly from a results filename, the results file does not need to share a base name with the micrograph. It does, however, assume that the results filename ends in _results.csv.

I could add an option to specify the suffix.
I could also rename the variable for more clarity.
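
Both suggestions could be combined roughly as follows. This is only a sketch: the name `get_results_basename` and the `suffix` parameter are hypothetical, not the names used in the PR. Unlike the `split(...)[0]` version, it raises an error when the suffix is missing instead of silently returning the whole filename.

```python
import os


def get_results_basename(filename: str, suffix: str = "_results.csv") -> str:
    """Return the part of a results filename that precedes `suffix`.

    Hypothetical replacement for get_micrograph_id with a configurable suffix.
    """
    base_name = os.path.basename(filename)
    if not base_name.endswith(suffix):
        raise ValueError(f"{base_name!r} does not end with {suffix!r}")
    # str.removesuffix requires Python 3.9 or newer.
    return base_name.removesuffix(suffix)
```

For example, `get_results_basename("out/run1_results.csv")` yields `"run1"`, and a different convention can be handled with `suffix="_refined.csv"`.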

Comment on lines +63 to +64
def process_directories_sequentially(
    directory_list: list[str],
Member

This function is fairly complex, and it is difficult to follow what each part is doing. It should be split into functional blocks (which actually process the data) and parts which print out helpful information.
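
One possible shape for that split is sketched below. The body of the real function is not shown in this thread, so the per-directory file count here is only a stand-in for the actual processing, and `DirectorySummary`, `summarize_directories`, and `format_report` are hypothetical names.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class DirectorySummary:
    directory: str
    n_results_files: int


def summarize_directories(
    directory_list: list[str], suffix: str = "_results.csv"
) -> list[DirectorySummary]:
    """Data step only: count results files per directory, with no printing."""
    summaries = []
    for directory in directory_list:
        names = [n for n in os.listdir(directory) if n.endswith(suffix)]
        summaries.append(DirectorySummary(directory, len(names)))
    return summaries


def format_report(summaries: list[DirectorySummary]) -> str:
    """Reporting step only: turn the summaries into human-readable text."""
    return "\n".join(
        f"{s.directory}: {s.n_results_files} results file(s)" for s in summaries
    )


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a_results.csv"), "w").close()
    open(os.path.join(tmp, "notes.txt"), "w").close()
    demo_summaries = summarize_directories([tmp])
    demo_report = format_report(demo_summaries)
```

Keeping the data step free of `print` calls makes it testable on its own, and the caller decides whether to print the report at all.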

@mgiammar
Member

Some other things I found when running through the tutorial:

  1. The match template runs (for cropped, 60S full, and 40S full) each write to the same results paths. These need to be updated in the configs and uploaded to Zenodo. @jdickerson95 I believe these results exist on one of the computers, but correct me if I'm wrong.
  2. The constrained search YAML files reference the RefineTemplateManager object in their header comment.
  3. tqdm writes a large number of lines to the output cells during both match template and refine template. We should add an option to either turn the progress bars off or have each of them update less frequently. Otherwise notebook performance starts to suffer.
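
For the progress bar point, tqdm itself already accepts `disable` and `mininterval` keyword arguments. How these would be passed through the match template and refine template programs is not decided here; the helpers `progress_kwargs` and `track` below are hypothetical and only sketch one option.

```python
from typing import Any, Iterable


def progress_kwargs(quiet: bool = False, min_interval_s: float = 5.0) -> dict[str, Any]:
    """Build tqdm keyword arguments that reduce notebook output.

    quiet=True turns the bar off entirely; otherwise the bar is redrawn
    at most once every `min_interval_s` seconds.
    """
    if quiet:
        return {"disable": True}
    return {"mininterval": min_interval_s}


def track(iterable: Iterable, **kwargs: Any):
    """Wrap an iterable in a tqdm progress bar (tqdm is imported lazily)."""
    from tqdm import tqdm  # third-party dependency

    return tqdm(iterable, **kwargs)
```

A loop would then read `for item in track(items, **progress_kwargs(min_interval_s=10.0)): ...`, or pass `quiet=True` to silence the bar in notebooks.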

Member

Once the example notebook under docs is completed, this file should be deleted.

@mgiammar mgiammar changed the base branch from main to docs_main August 25, 2025 20:24
Labels

documentation Improvements or additions to documentation

2 participants