Skip to content

amd and pdd functions and tests #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

tinatn29
Copy link

  • amd.py file in src/diffpy/similarity/metrics (functions to compute amd and pdd)
  • tests (test_amd.py)
  • test_data

Copy link
Contributor

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the __init__.py file from the src directory. We don't want it there.

Let's discuss a bit the package structure. Is there a reason for the choice of creating an amd.py module and putting pdd in it? As things stand, I think a better structure might be to make a metrics.py module in similarity and then we would have imports that look more like:

from diffpy.similarity.metrics import amd, pdd

and so on.

@sbillinge
Copy link
Contributor

btw, this is failing tests. Things are missing from the requirements (and possibly pyproject.toml)

Actually, early commits I want to fail tests because I prefer if we write tests that capture the behavior we want before we implement the functionality in the functions, but we don't want them to fail because of things missing from the package structure.

Reach out if you have questions. This is a great start, but I prefer to go a bit more slowly initially.

@tinatn29
Copy link
Author

tinatn29 commented Apr 24, 2025

I can separate amd and pdd. I agree that it makes more sense to import them separately. I'll fix the tests and the news accordingly.

@tinatn29
Copy link
Author

Tests are failing because amd and pandas aren't listed in requirements. Should I add both of them to pip.txt?

@sbillinge
Copy link
Contributor

Tests are failing because amd and pandas aren't listed in requirements. Should I add both of them to pip.txt?

both conda.txt and pip.txt, but could we start a new PR from a clean main and work on the behavior? What Use Cases (UCs) do these address?

@tinatn29
Copy link
Author

tinatn29 commented Apr 24, 2025

Tests are failing because amd and pandas aren't listed in requirements. Should I add both of them to pip.txt?

both conda.txt and pip.txt, but could we start a new PR from a clean main and work on the behavior? What Use Cases (UCs) do these address?

The tests are passing locally btw! (when I run pytest), so I think the only issue is the requirements. I have 3 test cases for each.

  • amd_compare(cif1, cif1) returns 0 because two structures are the same
  • amd_compare(cif1, cif2) returns amd distance between two structures
  • amd_compare([cif1, cif2], [cif1, cif2]) returns a distance matrix (DataFrame) between the two lists (2 x 2 matrix for the test case)

same thing for pdd_compare
cif1 and cif2 are the files I put in test_data

I'll start from a clean main and do this one by one. Sorry for the messy branch!

@sbillinge
Copy link
Contributor

Tests are failing because amd and pandas aren't listed in requirements. Should I add both of them to pip.txt?

both conda.txt and pip.txt, but could we start a new PR from a clean main and work on the behavior? What Use Cases (UCs) do these address?

The tests are passing locally btw! (when I run pytest), so I think the only issue is the requirements. I have 3 test cases for each.

* amd_compare(cif1, cif1) returns 0 because two structures are the same

* amd_compare(cif1, cif2) returns amd distance between two structures

* amd_compare([cif1, cif2], [cif1, cif2]) returns a distance matrix (DataFrame) between the two lists (2 x 2 matrix for the test case)

same thing for pdd_compare cif1 and cif2 are the files I put in test_data

I'll start from a clean main and do this one by one. Sorry for the messy branch!

sounds good. The preferred approach for me is to figure out the UC (behavior someone using the code wants) then the code architecture and tests that implement hte behavior, and only then the functions themselves (though we generally define the functions and their API (inputs and outputs) and write the docstring as part of the architecture discussion.

I am not 100% sure who our target audience is and what they want to do, so I need guidance from you, but I could image the following UCs:

UC 1 - compute amd from a structure

  1. Simon wants to compute the amd from a structure as he is working in diffpy-cmi
  2. Simon passes the structure to diffpy.similarity (ds)
  3. ds computes the amd and returns it simon

UC 2 - pdd

  1. as UC1.-1.3 but Simon wants to compute the pdd

UC 3 - amd-similarity

  1. Tina wants to compute the amd similarity (amd-s) between two structures
  2. Tina gives both structures to amd_s()
  3. amd_s computes the amd-s and returns it to Tina

UC4 - pdd

  1. As UC3.1 - 3.3 but Tina wants the pdd-s

UC 5 - to scale

  1. As UC3 but a pairwise computation of amd or pdd (or whatever similarity measure we have in the future) over a large set of structures with good performance

@sbillinge
Copy link
Contributor

btw, please don't put the cif files in the PR yet. I would prefer that we pass in structures as diffpy structure objects so we are not wedded to a particular back-end (i.e., a file-system), so in general we want to separate the I/O and hte functionality to make functions more reusable. This means we will code up the test structures in the test function itself. If we test with files we prefer to build a pytest fixture that creates a files system at test time to test the I/O. But in this case I am not sure we even want to do that.

@sbillinge
Copy link
Contributor

sbillinge commented Apr 24, 2025

I copy-pasted the UCs as an issue. Let's leave that issue open and use it to collect our UCs. Feel free to add UCs (try and use the same pattern) if you can think of any. It is like a story-board. Just because we make a UC doesn't mean that we will implement the functionality, so don't hold back....this is a chance to capture everyone's ideas of what we want to be able to do with this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants