Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

episode sample project #349

Merged
merged 29 commits into from
Jan 20, 2025
Merged

Conversation

elmbeech
Copy link
Contributor

This is the physicell episode sample project, written to run multiple consecutive episodes in a single runtime.
This project is linked to pull request #346 , #277 , and #266 .

Copy link
Collaborator

@drbergman drbergman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad that we're starting to see some of what you're planning with these additions! As this PR requires some of the others that are not yet merged, can you merge them into your elmbeech:patch1141_episode so that we can test this project?

Since this project's name does not immediately suggest what it is modeling, could you add a sample_projects/episode/README.md to describe what users should be expecting from this? I am not sure what part of this highlights the power of the episodes. To me, all I see is that this runs the model 4 times. Is there more to this that I'm missing? If that's the power (avoiding the overhead of relaunching the program), could you benchmark and provide a simple bar plot, maybe in that readme? Just running n sims with and without episodes and comparing wall time.

@elmbeech
Copy link
Contributor Author

elmbeech commented Jan 14, 2025

Hi Daniel,

Sure, I can merge all the needed PR's together.

You are correct, all this sample project does is running consecutive episodes of a physicell model in one runtime.
This is an essential feature if you want to do learning. You simply cannot learn, if you have to destroy the runtime after each episode to reset to the initial condition because you will at the same time erase everything you learned from the episode.
We are on the way to releasing a PhysiCell project that does reinforcement learning, which simply can be installed under user_projects, but to have it work with the regular PhysiCell releases, we first have to get the changes from this PR into the code base.
The sample project is there for other developers, who want to do similar implementations and, most importantly, as a unit test, to make sure that this feature persists in future PhysiCell releases.

I can add a README with this explanation.

Best, Elmar

@elmbeech
Copy link
Contributor Author

looks like windows cases some troubles.
I will look into it.

@elmbeech
Copy link
Contributor Author

@elmbeech
Copy link
Contributor Author

elmbeech commented Jan 15, 2025

ok @drbergman, now everything is merged, modified as requested, and all tests are passing!
could you please have another look?

@drbergman
Copy link
Collaborator

Final question (no promises though 🙃): Does this sample project show the full use of the proposed updates to initialize_microenvironment and load_PhysiCell_config_file? In other words, will the reinforcement learning project rely on additional calls/functionality of those functions beyond what is in sample_projects/episode/main.cpp?

@rheiland
Copy link
Collaborator

rheiland commented Jan 15, 2025

I'd probably prefer to stick with just the #if defined(_WIN32) and not use the MINGW-related directives. I can imagine allowing other C++ compilers on Windows someday (I've done past tests using cmake and native Windows C++).

I also confirmed that this sample project compiles, runs, and generates the output<episode#> directories on my Windows machine (in Command Prompt and Powershell). But I did not check that the output was "valid". One nitpick might be to consider using leading 0s in your output directory filenames (output_00001, etc) for easier post-processing.

Also, when running, I see the WARNING related to <random_seed> on episodes >= 1 which may be concerning to users, e.g.,

run episode: 1 !
(re)set global variables ...
load setting xml config/PhysiCell_settings.xml ...
Using config file config/PhysiCell_settings.xml ...
WARNING: Setting the random seed again.
        You probably have set a user parameter called random_seed.
        Here, we will use the random seed set in user parameters.
        HOWEVER, as of PhysiCell 1.14.0, you should set the random seed in the <options><random_seed> element in the config file.
        Future versions of PhysiCell may throw an error here. Kindly remove the user parameter and just use the <options><random_seed> element.

(to be clear, there is no random_seed in user params; the warning is printed due to the logic in the C++ code)

and related to that, do you really want to use the same random_seed for every episode and do you want to use just 1 core (omp_num_threads)?

@elmbeech
Copy link
Contributor Author

Final question (no promises though 🙃): Does this sample project show the full use of the proposed updates to initialize_microenvironment and load_PhysiCell_config_file? In other words, will the reinforcement learning project rely on additional calls/functionality of those functions beyond what is in sample_projects/episode/main.cpp?

hmm ...
the main.cpp shows the full use of the proposed update to initialize_microenvironment and load_PhysiCell_config_file (and reset_max_basic_agent_ID and load_PhysiCell_config_file from earlier already accepted pull requests).
but the reinforcement learning will not run with this main.cpp. it is a bit more complicated what we had to implement for making reinforcement leading possible.
the episode sample project can be regarded as a totally boiled down version of what is absolutely needed to make (reinforcement) learning in physicell possible at all.

is this answering your question?

@elmbeech
Copy link
Contributor Author

elmbeech commented Jan 15, 2025

I'd probably prefer to stick with just the #if defined(_WIN32) and not use the MINGW-related directives. I can imagine allowing other C++ compilers on Windows someday (I've done past tests using cmake and native Windows C++).

cool! same conclusion on my side.

I also confirmed that this sample project compiles, runs, and generates the output<episode#> directories on my Windows machine (in Command Prompt and Powershell). But I did not check that the output was "valid". One nitpick might be to consider using leading 0s in your output directory filenames (output_00001, etc) for easier post-processing.

the leading 0 would be nice. could you please give me a hint, how to do this easy in C++?

I have a python script to check if the output is correct (exactly the same when running with one thread). it uses pcdl and the output is at this moment a bit cryptic. i tested heavily on linux. and trust that the output is the same on windows. this pcdl script will eventually become a unit test for the reinforcement learning project and run on github workflow. so i am not too worried that we not will catch such an error.
drift.py.txt

Also, when running, I see the WARNING related to <random_seed> on episodes >= 1 which may be concerning to users, e.g.,

run episode: 1 !
(re)set global variables ...
load setting xml config/PhysiCell_settings.xml ...
Using config file config/PhysiCell_settings.xml ...
WARNING: Setting the random seed again.
        You probably have set a user parameter called random_seed.
        Here, we will use the random seed set in user parameters.
        HOWEVER, as of PhysiCell 1.14.0, you should set the random seed in the <options><random_seed> element in the config file.
        Future versions of PhysiCell may throw an error here. Kindly remove the user parameter and just use the <options><random_seed> element.

(to be clear, there is no random_seed in user params; the warning is printed due to the logic in the C++ code)

I am aware of this warning, but this not really in my hand.

and related to that, do you really want to use the same random_seed for every episode and do you want to use just 1 core (omp_num_threads)?

yes, if you wanna check that you have no drift in consecutive episodes, you wanna run all episodes with the same seed and one thread.
and if you wanna do learning, you will have to run totally random anyway. so I think this is ok.
also, it is a sample project. if someone wants to change this behavior, he can simply modify it.

@drbergman
Copy link
Collaborator

hmm ... the main.cpp shows the full use of the proposed update to initialize_microenvironment and load_PhysiCell_config_file (and reset_max_basic_agent_ID and load_PhysiCell_config_file from earlier already accepted pull requests).

In this case, we can choose to refactor these functions rather than redefine them. This avoids introducing in these functions what I still find to be confusing delineations between what is and is not reloadable. See #346 for what is reloadable. My first pass replaces parts of your main.cpp with

if ( i_episode == 0 )
{
	XML_status = load_PhysiCell_config_file( settingxml );
}
else
{
	XML_status = read_PhysiCell_config_file( settingxml ); // NEWLY DEFINED (SEE BELOW)
	if (XML_status)
	{
		PhysiCell_settings.read_from_pugixml();
		create_output_directory( PhysiCell_settings.folder ); // could omit since elsewhere in main.cpp the output folder is created (or could omit that line instead)
	}
}
// reset microenvironment and mechanics voxel size and match the data structure to BioFVM
std::cout << "reset densities ..." << std::endl;
set_microenvironment_initial_condition(); // NEWLY DEFINED (does everything in set_microenvironment below the `if (!reload)` block)
microenvironment.display_information( std::cout );

where we define

bool read_PhysiCell_config_file( std::string filename )
{
	physicell_config_dom_initialized = false; 

	std::cout << "Using config file " << filename << " ... " << std::endl ; 
	pugi::xml_parse_result result = physicell_config_doc.load_file( filename.c_str()  );
	
	if( result.status != pugi::xml_parse_status::status_ok )
	{
		std::cout << "Error loading " << filename << "!" << std::endl; 
		return false;
	}
	
	physicell_config_root = physicell_config_doc.child("PhysiCell_settings");
	physicell_config_dom_initialized = true; 
	return true
}

I opened a PR showing the full set of changes for this refactor.

@drbergman
Copy link
Collaborator

Also, when running, I see the WARNING related to <random_seed> on episodes >= 1 which may be concerning to users, e.g.,

It only shows up on i_episode==1 for me. the SeedRandom has a static bool warned to make sure the user only gets warned once. If we want to avoid even that, we could move that (or probably the other static bool setup_done variable) into global scope and write a function to toggle its state.

refactor to simplify edits to core
@drbergman
Copy link
Collaborator

still need to either include test svgs in tests/cases/output_episode-sample/ or set the output_folder: "" in tests.yml

@elmbeech
Copy link
Contributor Author

still need to either include test svgs in tests/cases/output_episode-sample/ or set the output_folder: "" in tests.yml

I thought had already updated the yml files.
please check for folder output3 then we are sure that really consecutive episodes were run!

@drbergman
Copy link
Collaborator

please check for folder output3 then we are sure that really consecutive episodes were run!

That works manually, but the gh actions will want to test the svgs in output3 against what is in tests/cases/output_episode-sample/ because of the line in the tests.yml matrix for episode-sample. See the failed actions.

@elmbeech
Copy link
Contributor Author

this code passes all the tests.
i think we are pretty much there.
thank you!

@drbergman drbergman self-requested a review January 16, 2025 00:15
Copy link
Collaborator

@drbergman drbergman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now. Refactors in the code improve code readability and make the necessary internals exposed for this project. A sample project that gives the framework a skeleton upon which to build will make this easier for users to pick up. Also, put the new project in the testing suite for CI.

@elmbeech
Copy link
Contributor Author

Thank you, Daniel!

@MathCancer
Copy link
Owner

NICE WORK!!

@elmbeech
Copy link
Contributor Author

Thank you, Paul!

@elmbeech
Copy link
Contributor Author

elmbeech commented Jan 16, 2025

I could successfully compile the reinforcement learning user_project.
The only thing I cannot make compatible (without going here into the details) with the current episode sample project is:

std::string settingxml = "config/PhysiCell_settings.xml";

to make it compatible, i might have to make it in C style like

char *settingxml = "config/PhysiCell_settings.xml";

i will check in deep right now.

@elmbeech
Copy link
Contributor Author

ok! that's definitely it.
all ready for the 1.14.2 release.

@MathCancer
Copy link
Owner

Elmar thank you for the OUTSTANDING work on this, as well as for being so responsive on edits.

Daniel, thank you for all the mentoring and QC!!!

@MathCancer MathCancer merged commit cffa248 into MathCancer:development Jan 20, 2025
113 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants