Skip to content

Conversation

@Biomerene
Copy link
Collaborator

I finally got around to fixing some of the last items of the features I added. As far as I can tell this should not break anything.

New Features:

  • A command line interface which allows you to automatically convert a folder of genomes to models (assuming modelseedpy does not break).
  • A black and grey list which allows you to limit or ban reactions from the database
  • A function to automatically load a medium file (needed for the CLI)
  • Partially updated example notebooks
  • Fixed some things related to BiGG

It misses some of the later commits of the main branch (mainly related to what @hariszaf did last week I think). Hopefully we can fix this using GitHub. I also have not yet written anything about the CLI in the example notebook or readme as this will take some time. The medium and medium_file arguments might be redundant but I did not want to remove something.

Let me know if I missed something or do something differently

Copy link
Collaborator

@hariszaf hariszaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Biomerene !
Very useful PR!
I have a couple of minor comments.
Once we resolve those, we could merge the PR to main and then check what if anything from #19 is still needed

elif modeltype:
if modeltype == 'ModelSEED':
self.path = TRAINED_NN_MSEED
elif modeltype == 'BiGG':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Biomerene do we know support BiGG all the way through? 😃

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gap-filling BiGG models should work now yes, the models do not get refined based on the aliases (which are part of the ModelSEED database). See also my comment on the refinement in the model build (#26 (comment)).

return o

def generate_training_set(data,nuplo, min_con, max_con, min_for, max_for, del_p, con_p):
def generate_feature(data,nuplo, min_con, max_con, min_for, max_for, del_p, con_p):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after data,.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a space (49beebc)


import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '5'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is tensorflow now a strong dependency ? if it is optional, then i think this may be better to be under an if..else statement because I guess it would lead to an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this documentation to be loaded when the tensorflow package is loaded (442d04b)

# 1) Add metabolite info

if metabName in compounds_aliases_dict:
metab.annotation = compounds_aliases_dict[metabName].copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is related to #24 right?
If yes, then it would be a good thing to edit your PR message and mention that the issue is addressed in this PR, to be able to track it! 😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is related, I think, but I am not entirely sure if this issue is entirely resolved now, we would need to retest the models with memote but I have not done this


reac.name = reac.id
elif dbType=='BiGG':
print("skipping refinement, not currently supported for BiGG models")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..
I am a bit confused with the BiGG case.
Would you like to make clear what parts of the dnngior workflow are supported for BiGG models and what are not, so we implement those in the fututre?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, this function makes use of some additional information and functions from ModelSEED biochemistry to improve the annotations, I think @danielriosgarza build this part. This is not there for the BiGG database so right now this step is just skipped. I am not sure if we mention the refinement of the ModelSEED models.

else:
sys.exit('# ERROR: your output_folder should have parents')

def main():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is your main function for the CLI, I suggest you add some documentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some documentation (with some help from collab) see commit c98534a

gapfill_model_wrapper(args)
print('# Done')
else:
sys.exit('I dont think this message can show, if it does, you (or more likely me) did something weird')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider editing this message returning something like
"either a fasta_folder or a model_folder needs to be provided"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be already be caught by the argument parser, but this is more informative in case that did not work correctly, I updated it.

side = -1
else:
mets[i] = stoc
reactions[reaction]['metabolites'] = eval(react_d[reaction][0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why using eval here? if there is a chance for an error, would not be better to have some catches for them ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the metabolites are a dictionary that are saved as a string in the bigg_reactions.tsv database, arguably it would be better to save it as a dictionary in the file instead. For now I added a comment in the code (d7bf238).

Input:
list of reactions to remove from all reactions
Output:
None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bit funny.
Consider having something like "trimmed list of candidate reactions "

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to read "list with reactions removed" (d7bf238) though technically the function does not output anything but modifies the list in place

@Biomerene Biomerene merged commit 17e08a8 into main Feb 28, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants