-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Category for manipulating categories using boolean operators #218
Conversation
Codecov Report
@@ Coverage Diff @@
## master #218 +/- ##
==========================================
- Coverage 85.10% 84.29% -0.82%
==========================================
Files 124 129 +5
Lines 20264 20623 +359
==========================================
+ Hits 17246 17384 +138
- Misses 3018 3239 +221
Continue to review full report at Codecov.
|
Seems like could be useful to save the user some for loops! I still have to read the code more to understand some things, but: don't you think it could be useful that # return a list of either NullCategory or `n2` categories
cat = n2.categorize(hBN)
# retrieve all indices where the category is not null
idx_n2 = np.array(cat != NullCategory()).nonzero()[0] you could do idx_2 = n2.filter(hBN) Maybe the filtered_geom = n2.sub(hBN)
#
idx_2 = n2.in(hBN) (I'm proposing And maybe using it on the reverse order # For receiving geometries
filtered_geom = hBN.sub(n2) # _sanitize_atoms accepts this categories?
# or
filtered_geom = hBN.sub(!n2)
filtered_geom = hBN.without(n2)
idx_2 = hBN.where(n2)
# or
idx_opposite = hBN.where(!n2) |
By the way, this kind of functionality is very similar to what a pandas dataframe would do isn't it? This is a wild idea, I know, but maybe it could be useful to generate dataframes containing some attributes of the geometry: df = geom.to_df(cols=["Z", "neighs", "angles", "force"])
|
Something like this would be an ok idea. I am not too sure about the method name... Filter don't quite cut it ... hmm... Probably your
I think the category shouldn't do too many things that it isn't intended to do. It should do categorization, and that's it. I.e.
Yes, this was actually my intent ;)
I still need to figure out how to do Thanks for the feedback!! |
Yeah, this could perhaps be a good idea. :) |
I guess each dataframe column should be build by the corresponding attribute (#196), not (there are lots of references to that issue haha) |
So I got it ;) Now one can do: ~n2 which is the opposite ( |
Overall I think it looks promising :) My advice is just to focus on making application as easy/easier as the example with Z>5, while making sure that powerful expressions are still nearly equally as easy to type out and apply to a geometry. |
I don't know if this belongs to this discussion, but since I already said something about it, I'm going to comment it here. It's regarding this:
I just thought that, apart from filtering, this feature in sisl objects could be extremely useful for visualizing. Here's why: For the visualization module I am using plotly. Plotly has a high-level API under import plotly.express as px
px.scatter3d(df, x="The x column", y="The y column", z="The z column",
color="Column that defines color", animation_frame="The column that defines the frames",
symbol=..., etc ) So, if sisl objects had a import sisl
import sisl.viz.express as sx
geom = sisl.Geometry()
sx.scatter(geom, x="z", y="neighs", color="species")
# or
sx.histogram(geom, x="z", color="species",
...other very useful kwargs of plotly express like marginal="violin") And really all that would be happening would be that this line: sx.*(sisl_obj, x="z", y="neighs", color="species") is converted into this other line: px.*(sisl_obj.to_df(cols=["z", "neighs", "species"]), x="z", y="neighs", color="species") I don't know, seems pretty exciting to me :) |
Thanks Jonas for your insights!
Yes, this is a problem I had since the inception of sisl. I don't know if I should change the One could have a global switch for this, e.g.
I don't think the typing issue is really bad. You build your properties and combine. prop = AtomCat(Z=3, norb=4)
# vs.
prop = AtomZ(3) & AtomOrbs(4) that's very little difference in typing, also I think the latter is more clear. Generally I also think these categories will be mildly used. So users with experience may easily create wrappers. AtomCat.Z(3).and.NOrbs(4).xor.Z(4) I like your idea about
Clarity is better than anything else ;)
Yeah, I can see that, I don't get why this is. I'll try and fix it... Thanks for pointing this out!
Correct, thanks!
Yes, this is because I am not really sure what to do. n2 = AtomNeighbours(2)
C_n2 = n2 & AtomZ(6)
notC_n2 = n2 ^ AtomZ(6)
...
cats = (C_n2 ^ notC_n2).categorize(hBN)
for cat in cats:
if cat == C_n2:
# for sure this is both n2 and Z=6
elif cat == n2:
# this will be returned if it has 2 neighbours but Z!=6
elif cat == AtomZ(6):
# neighbours not 2 but Z=6 this is of course a contrived example.
Thanks for your comments! |
Maybe there could be two categories, an "overlaps" for the overlap situation, and a "neighborhood" where R then should be specified? I think this would cover the relevant use cases. About which is clearer, keywords or multiple objects, maybe it is a matter of what one is used to. Perhaps at a later point there can be a Regarding the or-category -- you know how the |
On a scale from 1 to 10, how unpythonic would it be that So that you could do: geom.sub(AtomZ > 3) I don't know if that's even possible if class AtomZSingleton:
def __gt__(self, val):
return AtomZCat(gt=val)
def __call__(self, *args, **kwargs):
return AtomZCat(*args, **kwargs)
class AtomZCat:
def __init__(self, **kwargs):
self.gt = kwargs["gt"]
AtomZ = AtomZSingleton() and then you could use it in both ways: geom.sub(AtomZ(gt=3))
geom.sub(AtomZ > 3) I don't know, it looks good to me since it maintains all the previous functionality and adds that extra possibility of using it in an even simpler way if suitable. What do you think? |
By the way, regarding what Jonas said:
I also think that something like this should exist. I always think that having objects is much more convenient and clear for regular usage. However, there are some cases where you need to define things dinamically and your happiness is infinite when you discover that somebody has thought about it to make your life easier. Just as an example, it's obvious that the most convenient and clean way to import modules is: import sisl but if you need to import a package dinamically based on user input importlib.import_module(package) is priceless. |
I was more thinking globally so
Actually I agree, it was a bad example ;)
Thanks! I didn't think about it this way. I'll do this :) |
I think it would be doable, but you'd have to use metaclasses for this (or your work-around). Hmm. I don't particularly like the complexity it adds, I don't know if metaclasses for this would give unwanted consequences. ;)
Hmm.. Yes, this could work. But it would require many classes, 2 classes per Category?
I don't think we should go there yet, having 1 solution which works good is ideal. Also, your example works good for single elements, but for AtomNeighbours == {"min": 3, ...} which I don't think improves readability... ;) |
I have now enabled the kwfilter thingy by this: AtomCategory.kw(Z=5, neighbours={"max": 2, "min": 1})
AtomZ(5) & AtomNeighbours(1, 2) where the two are equivalent. Perhaps the |
But this could be easily automathized, couldn't it? With a "singleton builder": def category_singleton_factory(Cat):
class CatSingleton:
def __call__(self, *args, **kwargs):
return Cat(*args, **kwargs)
def __gt__(self, val):
return Cat(gt=val)
def __lt__(self, val):
return Cat(lt=val)
def __eq__(self,val):
return Cat(val)
def __ne__(self, val):
return ~Cat(val)
def __repr__(self):
return Cat().__repr__()
#CatSingleton.__doc__ += Cat.__doc__ # Add the documentation of the category
CatSingleton.__name__ = Cat.__name__
return CatSingleton() And then for each class you would just need to: class AtomZCat:
def __init__(self, **kwargs):
print(kwargs)
AtomZ = category_singleton_factory(AtomZCat)
Yes, of course this is only helpful for simple cases, but I think it already gives good advantages in usability. For your example of a range there could be shortcuts like: AtomZ == [3,5] #But maybe that shouldn't be a range
AtomZ == range(3,5) # This is really not helpful
# Maybe something like (you don't need the singleton implementation for this)
AtomZ.range(3,5) Or But still, my idea was just for simple comparisons, then it starts getting less advantageous relative to the current approach. |
I must admit I don't particularly like this. So I will say this is a no-go. I'll have to think about the other things. There are many things one can do. But it should also be maintainable. ;) |
This could be extended for more elaborate uses. It is currently not that fast but performance could be improved using the InstanceCache class that wraps a class and lru_caches methods. It seems to work rather nice. One can do complicated things with very little effort. However, there are things that is not taking into account. For instance we need to handle "not Category" to do the opposite category.
One cannot override ! (easily), but ~ is also used substantially.
The OrCategory follows regular Python semantics, 1 or 2 == 1. Now one can do AtomCategory.kw(Z=5, neighbours={"min":2}) Thanks to Jonas for these suggestions.
NullCategory now compares to None for easier comparison. Moved AtomCategory to sisl/geometry.py since this is required for _sanitize_atom. Added categorize to _sanitize_atom.
Sorry, I didn't find the time yet to explore this development, but I am sure it is great! I will revisit #202 to make use of this functionality. |
@tfrederiksen no worries ;) If you don't mind, I would like some feedback on this first. Then we adjust categories, and finally we can do the bond-completion. :) Then I think it could be rather cool ;) |
This will allow users to generate boolean expressions and select, say, atoms based on simple arithmetic formulas.
This is better described with examples.
This was inspired by #202 to make it more general. However, this may also be used to get indices of atoms for neighbouring sites to holes etc. This would allow easier extraction of DOS for the given atoms.
@tfrederiksen @jonaslb @pfebrer96 I would like your thoughts on this line of progress. And also if you have some ideas! ;)
The relevant files are in
sisl/category/base.py
(will probably be moved tosisl/category.py
)and
sisl/geom/category/*.py
.