
33 transforms group #51

Merged: philswatton merged 17 commits into develop from 33-transforms-group on May 15, 2023

Conversation

@philswatton (Contributor) commented May 5, 2023

This PR:

  • Resolves New Experiment Group With Transforms #33, although we may wish to add more experiment groups in the future. I've raised Consider Adding Further Experiment Groups #52 to cover this
  • Adds a new experiment group called 'none-vs-transforms' to the datasets config, which compares an untransformed dataset A against a transformed dataset B
    • There are 4 transforms:
      • Little blur (kernel 3x3, sigma 1)
      • Big blur (kernel 3x3, sigma 3)
      • Grayscale
      • Flipping the image 180 degrees
    • There are 6 different drop % pairs: (0,0), (0,0.5), (0.5,0), (0.5,0.5), (0.25,0.75), (0.75,0.25)
      • These correspond to: no data dropping (0,0); imbalanced data with overlap, dropping 50% from one side only ((0,0.5) and (0.5,0)); balanced data without overlap, dropping 50% from both (0.5,0.5); and imbalanced data without overlap, dropping 75% from one side and 25% from the other ((0.25,0.75) and (0.75,0.25))
    • This creates a total of 4x6 = 24 dataset pairs. Once the three seeds are considered, this brings us up to 72 total additional dataset pairs
  • Adds a compose_transforms function to modsim2.utils.config, which draws on the previous code for loading transforms but also allows None to be supplied as an input (returning None in that case); a sketch of this behaviour follows this list
    • This 1) preserves backwards compatibility: the previous experiment group will work the same as before
    • And 2) means that we can read in the transforms from the dataset config
  • Uses this in the opts2dmpairArgs function in scripts/utils.py
  • For transforms to be applied, DMPair.A.setup() and DMPair.B.setup() may need to be called. DMPair.compute_similarity() will therefore throw an error if the transforms on the datamodules do not match those supplied to DMPair() (when the supplied transforms are not None)
  • Some fixes to a couple of files to reflect changes in the code
  • Develop was merged in to get the changes from 7 implement ot #47
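
For illustration, here's a minimal sketch of the compose_transforms behaviour described above. The config schema and the torchvision transform names below are assumptions for the sake of the example, not the exact code in modsim2.utils.config:

```python
# Illustrative sketch only -- the real compose_transforms in
# modsim2.utils.config may use a different config schema. The key behaviour
# from this PR: a None input returns None, so the previous experiment group
# (which supplies no transforms) keeps working unchanged.
from typing import Optional

from torchvision import transforms


def compose_transforms(transform_list: Optional[list]) -> Optional[transforms.Compose]:
    if transform_list is None:
        return None  # backwards compatibility: no transforms configured
    # Assumed schema: each entry names a torchvision transform plus kwargs,
    # e.g. {"name": "GaussianBlur", "kwargs": {"kernel_size": 3, "sigma": 1}}
    built = [
        getattr(transforms, spec["name"])(**spec.get("kwargs", {}))
        for spec in transform_list
    ]
    return transforms.Compose(built)


# The four transforms above, expressed in this assumed schema:
little_blur = compose_transforms([{"name": "GaussianBlur", "kwargs": {"kernel_size": 3, "sigma": 1}}])
big_blur = compose_transforms([{"name": "GaussianBlur", "kwargs": {"kernel_size": 3, "sigma": 3}}])
grayscale = compose_transforms([{"name": "Grayscale"}])
# A 180-degree flip can be written as a fixed rotation:
flip_180 = compose_transforms([{"name": "RandomRotation", "kwargs": {"degrees": (180, 180)}}])

assert compose_transforms(None) is None  # the None passthrough
```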

@philswatton philswatton linked an issue May 5, 2023 that may be closed by this pull request
@philswatton philswatton marked this pull request as ready for review May 9, 2023 12:56
@joannacknight (Contributor) left a comment

I've added one suggestion.
The tests all passed on my machine.

@lannelin (Contributor) left a comment

looks good so far, a couple of suggested changes. Finishing for now and will pick up again!

Contributor

thinking about this more in the interest of cutting down experiment space

> There are 6 different drop % pairs: (0,0), (0,0.5), (0.5,0), (0.5,0.5), (0.25,0.75), (0.75,0.25)

what purpose do we think each of these serves, and do we need all of them?
do we get anything further from the last two that we don't already have from (0.5,0.5)?

Contributor Author

I think my rough line of thinking on this is:

  • I'd like to test how much of an effect having no vs. some overlapping data has, given our theory of data similarity
  • I'd like to also account for the fact that data imbalance may have an effect on transfer success
  • We should make sure that in our experiment these two things are fully independent of each other. If we got rid of the last two groups, we'd only be testing data imbalance in the case where there is overlap between the observations in both datasets

That said, this isn't a particularly strong justification, so I'm open to removing them if we have a need to cut down the experiment space!
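
To make that concrete, here's a toy sketch of how a (drop_A, drop_B) pair controls both overlap and balance. This is illustrative only, not the repo's actual dropping code, and the front/back split scheme is an assumption:

```python
# Toy illustration (not the actual modsim2 dropping code) of how a
# (drop_A, drop_B) pair controls overlap and balance. A drops from the
# front of a shared shuffled index list and B drops from the back, so
# (0.5, 0.5) leaves the two halves disjoint, while (0, 0.5) leaves B
# fully contained in A.
import numpy as np


def split_indices(n: int, drop_a: float, drop_b: float, seed: int = 0):
    idx = np.random.default_rng(seed).permutation(n)
    a = idx[int(n * drop_a):]       # A keeps everything after its dropped front slice
    b = idx[: n - int(n * drop_b)]  # B keeps everything before its dropped back slice
    return a, b


for drop_a, drop_b in [(0, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.75)]:
    a, b = split_indices(1000, drop_a, drop_b)
    print(f"drop=({drop_a}, {drop_b}): |A|={len(a)}, |B|={len(b)}, overlap={len(set(a) & set(b))}")

# drop=(0, 0):       balanced, full overlap
# drop=(0, 0.5):     imbalanced, with overlap (B is a subset of A)
# drop=(0.5, 0.5):   balanced, no overlap
# drop=(0.25, 0.75): imbalanced, no overlap -- the case only the last two pairs cover
```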

Contributor

let's see how long things take to run but sounds reasonable to keep for now, thanks!

@philswatton (Contributor Author) commented

I've added some new commits that somewhat change the PR, in line with in-person discussion. We now have:

  • several smaller experiment groups, each with its own file in a subfolder of the config folder
  • code updated to reflect the new structure
  • some unused source code removed
  • tests updated
  • an extra test added for the compute_similarity error logic (a toy sketch of this guard follows below)
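
For reference, here's a self-contained toy sketch of the guard that the new test exercises. The real DMPair lives in this repo and its constructor signature and error type may differ; everything below is illustrative:

```python
# Toy sketch of the compute_similarity error logic: raise if the transforms
# on a datamodule no longer match those supplied to the pair's constructor,
# but only when the supplied transforms are not None. Names, signatures, and
# the error type here are assumptions, not the repo's confirmed API.
import pytest


class _ToyDataModule:
    def __init__(self, transforms):
        self.transforms = transforms


class ToyDMPair:
    def __init__(self, transforms_a=None, transforms_b=None):
        self._supplied = (transforms_a, transforms_b)
        self.A = _ToyDataModule(transforms_a)
        self.B = _ToyDataModule(transforms_b)

    def compute_similarity(self):
        for supplied, module in zip(self._supplied, (self.A, self.B)):
            # Guard only applies when transforms were supplied (not None)
            if supplied is not None and module.transforms is not supplied:
                raise ValueError("datamodule transforms do not match those supplied to the pair")
        return 0.0  # placeholder for the real similarity computation


def test_mismatched_transforms_raise():
    pair = ToyDMPair(transforms_a=object())
    pair.A.transforms = object()  # simulate a mismatch after setup()
    with pytest.raises(ValueError):
        pair.compute_similarity()
```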

@joannacknight (Contributor) left a comment

The pytest tests all passed for me. I also had a look at the generate-scripts files and think I found one tiny issue there. I only ran the generate_metrics_scripts.py file, but I think the issue is across all three files.

@philswatton philswatton requested a review from joannacknight May 15, 2023 08:37
@joannacknight (Contributor) left a comment

The tests all pass for me

@lannelin (Contributor) left a comment

looks good to me! tests pass and debug made sense.

minor comment on removal of a file, then good to go

Contributor

I think we can also remove transforms.yaml?

Contributor Author

Have removed!

@philswatton philswatton merged commit 54559a1 into develop May 15, 2023
@philswatton philswatton deleted the 33-transforms-group branch May 15, 2023 11:36