don't write fold-level structural prior files #290

@skinnider

Currently, the structural_prior/ output directory for a given enum_factor will look something like this:

$ ls
smiles_SMILES_0_CV_ranks_structure.csv.gz  smiles_SMILES_7_CV_ranks_structure.csv.gz
smiles_SMILES_0_CV_tc.csv.gz               smiles_SMILES_7_CV_tc.csv.gz
smiles_SMILES_1_CV_ranks_structure.csv.gz  smiles_SMILES_8_CV_ranks_structure.csv.gz
smiles_SMILES_1_CV_tc.csv.gz               smiles_SMILES_8_CV_tc.csv.gz
smiles_SMILES_2_CV_ranks_structure.csv.gz  smiles_SMILES_9_CV_ranks_structure.csv.gz
smiles_SMILES_2_CV_tc.csv.gz               smiles_SMILES_9_CV_tc.csv.gz
smiles_SMILES_3_CV_ranks_structure.csv.gz  smiles_SMILES_min1_all_freq-avg_CV_ranks_structure.csv.gz
smiles_SMILES_3_CV_tc.csv.gz               smiles_SMILES_min1_all_freq-avg_CV_tc.csv.gz
smiles_SMILES_4_CV_ranks_structure.csv.gz  smiles_SMILES_min2_all_freq-avg_CV_ranks_structure.csv.gz
smiles_SMILES_4_CV_tc.csv.gz               smiles_SMILES_min2_all_freq-avg_CV_tc.csv.gz
smiles_SMILES_5_CV_ranks_structure.csv.gz  smiles_SMILES_min3_all_freq-avg_CV_ranks_structure.csv.gz
smiles_SMILES_5_CV_tc.csv.gz               smiles_SMILES_min3_all_freq-avg_CV_tc.csv.gz
smiles_SMILES_6_CV_ranks_structure.csv.gz  smiles_SMILES_min4_all_freq-avg_CV_ranks_structure.csv.gz
smiles_SMILES_6_CV_tc.csv.gz               smiles_SMILES_min4_all_freq-avg_CV_tc.csv.gz

This can mislead users. The fold-specific rank files are the main problem: any structure present in the training folds is excluded from the sampled SMILES, so the ranking task is made artificially easier. In effect, every structure is implicitly assumed to be novel, i.e. not found in the training folds.
The fix is simply not to write these files. All plots should be made from the min1 file anyway.
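
As a rough illustration of the proposed fix, the sketch below filters a list of candidate output names so that only the aggregated files are kept. It is not taken from the codebase: the helper names `is_fold_level` and `files_to_write` are hypothetical, and the naming pattern (a bare fold index before `_CV_` for fold-level files, `min<k>_all_freq-avg` for aggregated ones) is inferred solely from the listing above.

```python
import re

# Assumption: fold-level files carry a bare fold index right before `_CV_`
# (e.g. `smiles_SMILES_3_CV_tc.csv.gz`), whereas aggregated files use
# `min<k>_all_freq-avg` in that position. Inferred from the listing only.
FOLD_LEVEL = re.compile(r"_\d+_CV_(ranks_structure|tc)\.csv\.gz$")


def is_fold_level(filename: str) -> bool:
    """Return True if the name looks like a per-fold structural prior file."""
    return FOLD_LEVEL.search(filename) is not None


def files_to_write(candidates):
    """Keep only the outputs that are not fold-level (hypothetical helper)."""
    return [name for name in candidates if not is_fold_level(name)]


if __name__ == "__main__":
    names = [
        "smiles_SMILES_0_CV_ranks_structure.csv.gz",
        "smiles_SMILES_0_CV_tc.csv.gz",
        "smiles_SMILES_min1_all_freq-avg_CV_ranks_structure.csv.gz",
        "smiles_SMILES_min1_all_freq-avg_CV_tc.csv.gz",
    ]
    for name in files_to_write(names):
        print(name)
```

In practice the cleaner change is probably to skip the write at the point where the fold-level files are produced, rather than filtering names afterwards; the filter is only meant to make the intended split between the two groups of files explicit.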
