@snazy
Member

@snazy snazy commented Dec 4, 2025

Add the NoSQL-specific metastore persistence types, including the mapping from and to `*Polaris*Entity`.
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Dec 8, 2025
@dimas-b dimas-b merged commit 90cff77 into apache:main Dec 8, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Dec 8, 2025
@snazy snazy deleted the nosql-types branch December 8, 2025 14:07
@singhpk234
Contributor

Apologies @dimas-b, but I am unable to understand the rush to get these huge NoSQL PRs in without giving the community appropriate time to review the code. I am strongly concerned about this: this PR is ~7k LOC, and the diff is so large that it does not even render in the GitHub UI.

This PR was raised ~4 days ago, of which 2 days were a weekend, and I was reviewing these changes on the assumption that I had appropriate time. But I just opened GitHub to see that it was approved and 7k LOC were merged less than a minute after approval, even though it had been called out that the NoSQL implementation is not following the practice we already have in Polaris:

[Screenshot 2025-12-08 at 9:06:23 AM]

I request that we please get more sets of eyes on code changes this massive before merging them.

cc @jbonofre @dennishuo @flyrain @collado-mike

@dimas-b
Contributor

dimas-b commented Dec 8, 2025

@singhpk234: I welcome your interest in the NoSQL Persistence code. The more developers get involved, the better the code quality will be, I'm sure.

As to your specific points, the current guidelines indicate "two working days" as a reasonable time period for providing initial PR comments.

As for "rush", recent NoSQL PRs basically chip off small code chunks from #1189, which has been available for interested parties to review for many months now (and some community members do have hands-on experience with it). It is in "draft" only to indicate that it is not meant to be merged whole, but it was mentioned in the original NoSQL proposal in March 2025 and in multiple online meetings (IIRC). AFAIK, the community is in general agreement on accepting the NoSQL Persistence contribution.

These changes are isolated from the rest of the Polaris codebase and do not affect any existing code paths, as far as I can tell.

However, if you or someone else has specific concerns about the new code after merging, by all means, let's discuss them on the dev ML.

@dimas-b
Contributor

dimas-b commented Dec 8, 2025

@singhpk234: On second reading, I'm not sure I actually understand whether your comment above applies to this PR or to #3135... If it's the latter, please note that your screenshot shows your comments as "pending": no one but you was able to see them in GH before.

@singhpk234
Contributor

@dimas-b, the current guidelines only speak about the first round of comments, not about when to merge. Orthogonally, we all agree that the current guidelines need some refinement, and hence we as a community are working towards defining new ones: #3067. I strongly think LOC should be a factor in this.

> AFAIK, the community is in general agreement on accepting the NoSQL Persistence contribution

The community recently expressed its concerns in the NoSQL meeting (12/2), where I believe we were all present:
dev list : https://lists.apache.org/thread/t6ddtgk0wt92opphvy0o6lvx8pjk0go8
video ref : https://drive.google.com/file/d/1r_7bPtQEp7jdB1gtP15KvC_qLE6271Rf/view

  • the main questions: why do we need to support all backends, and why do we need to build yet another Iceberg on top of a database?
  • severe performance considerations, and a request to benchmark 1 catalog with 100k tables with concurrent updates happening randomly in any of the tables

But instead of addressing them, we are adding more and more code. I am unclear what happened to the things we discussed in the meeting: are they just lost, or are they not considered concerns? At this point in time I don't think there is a consensus on the NoSQL approach.

Second, the project mentor @fpapon additionally suggested that we need a minimum of 2 approvers, in this thread: https://lists.apache.org/thread/hzxds729v5r68togbfx76l14f9m4bfj4

I am unsure whether starting a new ML thread to call this out is going to make any difference if the above ones didn't.

I request again: please give appropriate time for review. What I wanted to share with the screenshot is that I was indeed reviewing (I know it is only visible to me), but the review was not given appropriate time before I could publish. If we had just allowed some time after your approval, I would have gotten a notification that this PR was about to merge and known that I needed to publish my review, even if it was half-baked. But it got merged within one minute.

@snazy
Member Author

snazy commented Dec 8, 2025

@singhpk234 I appreciate you taking the time to review the code!
I totally understand that everybody has a ton of things on their plate and that time is an extremely scarce resource for everybody.

Regarding the two points you and @collado-mike mentioned, may I refer you both to @pingtimeout's email to the dev-ML from March 20th this year, where he mentioned this doc with a very detailed performance analysis? I admit that it's been quite a while since that email was sent.
Let me note that the benchmark was a very important foundational piece, and that a bunch of optimizations addressing his findings have been implemented in the (NoSQL) code base since then.
If you have different, reproducible results for such benchmarks, I would be glad to collaborate on the findings!

Let me also reply to your other point about "having to support all backends": I think this is driven by demand from the whole community, for example #844 from January this year, which has recently gotten even more interest. In the recording you mentioned (I assume it is the one from Adam's presentation), I clearly stated that there is no intent to implement support for every conceivable database. What I clarified is that it is doable.

We have been discussing the whole NoSQL approach for quite a while, mentioning it repeatedly on the dev-ML, in community syncs, and in docs, including this one from last year, which has been open for comments since then.

There were also public presentations held by @adam-christian-software quite some time ago, which he repeated recently.

The "big illustrative PR #1189", open for review for ~266 days, was explicitly intended to collect feedback. But it hasn't gotten many comments, despite our asking for them repeatedly. Any feedback is more than welcome!

I apologize that I fail to see what else we could have done besides all the things that were already mentioned. Everybody's open to suggestions for streamlining the process for such big changes!

@singhpk234
Contributor

singhpk234 commented Dec 8, 2025

@snazy I appreciate your response!

I have myself run these benchmarks when I was doing the relational JDBC PG work, but I am pretty confused about this being mentioned when the ask was very specific: please run a benchmark with 100k tables and 1 catalog at 100 TPS. The benchmark mentioned by Pierre is 50 TPS with 500 catalogs, 65K namespaces, and 65K tables. I don't think it even remotely addresses the case that is being requested and the concerns folks have raised about this architecture (Concurrent Benchmark configurations), so I would still consider this a very open question that has not been addressed.

"having to support all backends"

I haven't seen any DISCUSS thread or agreement on this. I clearly see, and you mention this too, that a PPMC member (@collado-mike) is questioning the same. I am unclear at this point where this decision was made. My understanding is that this effort started as an effort to add MongoDB support.

> There were also public presentations held by @adam-christian-software quite some time ago, which he repeated recently.

Precisely. I happened to attend one of those; I really appreciate you all doing this, and I have started to contribute reviews! I understand the design better now.

> Any feedback is more than welcome!

Precisely, and I am happy to! :) Let's please just make sure we give other people enough time to give feedback.
