Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several fixes and improvements enabling greater applicability of the algorithm for very large datasets #355

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

hdaan
Copy link
Contributor

@hdaan hdaan commented Nov 1, 2024

This pull requests contains several commits containing the following changes:

  • If there are already precalculated objects in the SpatiotemporalAnalysis object, the algorithm still goes on to detect more objects instead of only returning precalculated. This is done here
  • Adds intermediate saving of 4D-OBCs. A parameter is added (intermediate_saving) which allows to save objects every _n_th seed. This also enables the splitting of the 4D-OBC computation between batches of seeds. These can be defined by the new parameters resume_from_seed and stop_at_seed. Thus, the computation starts and stops at the given seed numbers, starting at 1.
  • Adds an extra parameter as input to enable writing the number of seeds detected after seed detection to a text file. This allows for the automated subsetting of the seeds into smaller batches. 4D-OBC computation on smaller batches of seeds can then be done in sequence, overcoming time constraints on some HPCs.
  • Adds a check with respect to the height_threshold parameter. This parameter was previously given but never used. Now, only seeds are added where the absolute difference between minimum and maximum of a seed is larger than height_threshold. This is done here
  • Adds a new parameter use_unfinished. If this is False, unfinished seeds, i.e., seeds that have not returned to their initial elevation are not considered as seeds for 4D-OBC segmentation, else, they are considered, as previously done.
  • Adds a check in the seed_sorting_scorefunction for seeds without neighbors. Previously these would give an error through division by zero here. Now they get assigned a very high value, giving them a very low ranking. As they don't have neighbors they won't be growing into 4D-OBCs either way. This is a workaround, which should be reconsidered, in the future.

hdaan and others added 27 commits April 3, 2024 09:43
Add an height_threshold check in seed detection, line 945. Fixes issue 3dgeo-heidelberg#324
added parameter and check to either use or not use seed candidates that are unfinished by the end of the time series
added check to ensure algorithm continues when only 1 neighbor is found during seed scoring
Added a psutil virtual memory usage tracker
fixed wrong printing statement of percentage virtual mem used
remove intermediate saving, too much memory used
added missing parenthesis
added new commits from py4dgeo
added intermediate saving and resuming from seed index in segmentatio…
added maximum seed index parameter, defining at which seed to terminate the region growing algorithm.
removed unnecessary print statements
fixed intermediate saving by not putting it at the end but at beginning of loop
…tomatically subsetting the computation, due to time constraints on clsuters
Previous `height_threshold` check in seed detection checked the full time series instead of the interval used as seed for determining if `height_threshold` was met
…ferencing the analysis.objects to copying the analysis.objects
changed write to number_of_seeds.txt to write a string
Made the writing of the number of seeds optional as parameter.
This allows for the automated computation of the sizes of subsets of seeds to be run sequentially, to workaround time constraints of HPCs.
@hdaan
Copy link
Contributor Author

hdaan commented Nov 1, 2024

Potentially solving issues #218, #314, #324

@kathapand kathapand self-assigned this Nov 8, 2024
@kathapand
Copy link
Collaborator

Thanks for these valuable contributions to the 4D-OBC algorithm, @hdaan!
I would like to merge all the suggested changes. But I cannot review all of them and maybe they are missing in the code changes. At the moment, only the precalculated is visible, the other parameters are not found in the file. Could you please check if these are contained or something needs to be added to the PR?

@hdaan
Copy link
Contributor Author

hdaan commented Nov 11, 2024

That is interesting. When I look to review the changes it does add them in the code.

Do you only see the result from this one commit then? Can you maybe show me what you see when you review?

@hdaan
Copy link
Contributor Author

hdaan commented Nov 12, 2024

There is one issue I just found in the seed_sorting_scorefunction().
Here if len(neighbors < 2): has to be changed to if len(neighbors) < 2:.

Can you update this before merging? Or should I open a new pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants