-
Notifications
You must be signed in to change notification settings - Fork 12
WIP:DataScan
refactor to expose data and statistics consistently
#94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@tylerriccio33 this is great, thanks for working on this key piece! Could you try merging with main and see if there are any conflicts? I did some recent work within this file (merging #93) just two days ago. All of that work didn't touch the Also, is it okay if I release a new version of the package while you're working on this? Given that |
I will push what I have right now for reference but it'll come with a ton of conflicts. I'm about 90% of the way there, I just have to unify the html rendering interfaces for each of the columns. To be clear I'm not changing functionality just unifying the way the comparison data is exposed so i can build on it for Compare. So, i'll push what i have now but I still have maybe a day or two to go until I think it's in a good spot. Sorry for the giant pr |
Sorry this is like a comically large refactor, we might need to meet in person again to figure some stuff out |
This is great! Don’t worry (I’m not worried) and keep going :) Send me an email anytime if you want to discuss anything through Zoom (we can schedule easily that way). |
Don’t mean to hold you hostage on this refactor, it’s virtually all complete, I just need to reroute all the html logic and add comprehensive test cases. Been busy last few nights, tonight I should have this done and we’d be off to the races. |
This is there, I think it's a start and can be improved in the future. More bugs will be found and ways to optimize the interface once we add new statistics.
There's no more branching, the function takes something Narwhals compliant and returns as such.
This was easy, thankfully, but I need to improve the test coverage as it relates to lazy and eager narwhals compliant frames. The code can cover lazy frames somewhat gracefully but it isn't battle tested. Currently, the big thing I'm still reconnecting is the HTML generation. The pr does generate the table, but it's absent a few of the amazing formatting features you had. I'm pretty close, I just need to reconnect and reapply the logic. |
@tylerriccio33 Take your time, you got this! Honestly, no rush as I'm not touching DataScan-related things for a while anyway. |
Do you think the datetime (and date) formatting could be handled by https://posit-dev.github.io/great-tables/reference/GT.fmt_datetime.html |
This is great! I will look at this closely tonight/tomorrow. (And I really do like the simple footnote compared to the tiny |
@tylerriccio33 This looks really good to me. As far as styling goes, we could always make little fixes along the way after this is merged. I think the last thing for you to do is to handle the conflicts, and change any tests in Not sure why CI isn't running here (I guess has to do with the conflicts?) but we'll probably understand why sooner or later. Also, I accidentally performed a merge commit that was pushed here (thought it was being pushed to a fork). I reverted that, but I hope it didn't do any harm to the PR. Apologies for that, and please let me know if it affected anything here. |
cac0f09
to
134a39b
Compare
I have no idea why these would fail, they're not failing on my machine and it's code i haven't touched. Probably some dependency problem, I'll look into it |
@rich-iannone I cannot for the life of me reproduce this... The if you look at the 2 tests failing are failing on this ("date_time", "timestamp(6)") assertion. I'm syncing the ci test run using uv, so confident there aren't some pip weirdness going on... Any thoughts? |
Fixed it, realized main had an xfail. How I wasn't able to reproduce that I have no clue. My solution was to do a procedural check to see what the schema type would be. Something still feels wrong, but it feels non deterministic... |
Also this would close #168 . I added uv to the test runner and dep installer in my quest to debug why the ci env could be different from my local dev. |
I think this is looking good, still a lot of little TODOs I left myself, which i'll have to tackle eventually :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the huge amount of work you put into this!
This is going to be a big one... I'm working to expose the statistics logic in
DataScan
in a more object oriented way. I'm planning to do 3 major things:This is all a draft as of now I just wanted to clue you in.
Also, I don't plan on changing any tests and the interface will remain untouched!