Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SharedStore provider #1147

Closed
wants to merge 15 commits into from
Closed

Conversation

danrspencer
Copy link
Contributor

Motivation

For additional context see: #1128

In large controllers you can end up with multiple instances of the same store, this has the potential to case a split brain problem where different bits of the application have a different view of the cluster.

This can be solved by creating stores at the top level of the application and passing them down; but the definition of the stores is then required to be at a distance from the usage sites, smearing responsibilities around the codebase.

Solution

The SharedStore struct will always return a clone of the same Store for a given scope and ListParams. The aim is to minimise the knowledge that must exist at a distance (only the types must be known, which can be cleanly asked for in controller interfaces), while still allowing separate parts of the application to share stores.

danrspencer and others added 14 commits February 17, 2023 13:25
1. Remove expensive cloning of Kubernetes objects
2. Allow propogation of err events

Signed-off-by: Dan Spencer <[email protected]>
Co-authored-by: Eirik A <[email protected]>
Signed-off-by: Dan Spencer <[email protected]>
Also renamed WatchStreamExt::subscribe to WatchStreamExt::stream_subscribe. The compiler was unable to tell if we were trying to call WatchStreamExt::subscribe or StreamSubscribe::subscribe when they were named the same.

e.g. this code would not compile:

let stream_subscribe = stream.subscribe();
let subscription = stream_subscribe.subscribe();

Signed-off-by: Dan Spencer <[email protected]>
Co-authored-by: Eirik A <[email protected]>
Signed-off-by: Dan Spencer <[email protected]>
…or the same list params for a given type. Also allows subscriptions to those stores.

Signed-off-by: Dan Spencer <[email protected]>
@danrspencer
Copy link
Contributor Author

@clux I've pulled out just the SharedStore bit of my bigger PR (selfishly because this is the most useful bit for us). Obviously we're missing docs and some tests here but before I spend more time on it I wanted to discuss if this is a direction worth pursuing.

@codecov-commenter
Copy link

Codecov Report

Merging #1147 (7c1cee1) into main (53ad9ee) will decrease coverage by 1.00%.
The diff coverage is 0.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1147      +/-   ##
==========================================
- Coverage   72.79%   71.80%   -1.00%     
==========================================
  Files          66       68       +2     
  Lines        5066     5136      +70     
==========================================
  Hits         3688     3688              
- Misses       1378     1448      +70     
Impacted Files Coverage Δ
kube-core/src/params.rs 78.52% <ø> (ø)
kube-runtime/src/reflector/mod.rs 100.00% <ø> (ø)
kube-runtime/src/reflector/shared_store.rs 0.00% <0.00%> (ø)
kube-runtime/src/utils/mod.rs 64.15% <ø> (ø)
kube-runtime/src/utils/stream_subscribe.rs 0.00% <0.00%> (ø)
kube-runtime/src/utils/watch_ext.rs 0.00% <0.00%> (ø)

Copy link
Member

@clux clux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments. i do really like the exploration here and the map you've set up definitely has potential. you'll have to excuse me while I think long term about this for a little bit.

kube-runtime/src/reflector/shared_store.rs Show resolved Hide resolved
kube-runtime/src/reflector/shared_store.rs Show resolved Hide resolved
Comment on lines +16 to +22
pub struct SharedStore<K, W = WatcherFactory<K>>
where
K: 'static + Resource + Clone + DeserializeOwned + Debug + Send,
K::DynamicType: Hash + Eq,
W: CreateWatcher<K>,
{
watcher_provider: W,
Copy link
Member

@clux clux Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub struct SharedStore<K, W = WatcherFactory<K>>
where
K: 'static + Resource + Clone + DeserializeOwned + Debug + Send,
K::DynamicType: Hash + Eq,
W: CreateWatcher<K>,
{
watcher_provider: W,
pub struct SharedStore<K>
where
K: 'static + Resource + Clone + DeserializeOwned + Debug + Send,
K::DynamicType: Hash + Eq,
{

I feel like we can get away with not having a second generic parameter for this. The WatcherFactory's impls have exactly the same constraints as the SharedStore methods that construct them. Did you have a particular use case in mind for the WatcherFactory, or is it mainly for tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's purely for the tests. I don't particularly like test induced damage to non-test code like this ... but I also don't like not unit testing stuff 😅. Do you have a better suggestion on how to test something like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly worth noting, users cannot create an instance of the SharedStore where W is anything other than a WatcherFactory since non of the properties are public and the only constructor we expose is for the variation using a WatcherFactory. I don't think end users ever need to know or care about the W generic 🤔.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, yeah, it's probably not that bad, I would also ideally have the damage confined to the tests if possible. was thinking there might be some ways to mock just one of the functions (like setup_reflector) used in this module, but maybe it's only mocktopus that lets you do functions (and that's nightly only).

i think it probably is OK. mocking watch streams is hard to do raw. i would otherwise suggest the mocked client with tower_test as is used in client/api.

watcher: BoxStream<'static, watcher::Result<Event<K>>>,
) -> (Store<K>, BoxStreamSubscribe<K>)
where
K: Resource + Clone + Send + Sync,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the Sync requirement come from? It's propagated everywhere here, and neither watcher nor reflector should have more than Send required AFAIKT.

Comment on lines +292 to +298
async fn spawn(self) {
tokio::spawn(async move {
self.run().for_each(|_| async {}).await;
});

tokio::task::yield_now().await;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this call should be explicitly awaited in tests without spawning (and can probably be done in a one-liner) because there's nothing we actually wait for having mocked the provider - so there shouldn't be a need for a task.

Comment on lines +16 to +22
pub struct SharedStore<K, W = WatcherFactory<K>>
where
K: 'static + Resource + Clone + DeserializeOwned + Debug + Send,
K::DynamicType: Hash + Eq,
W: CreateWatcher<K>,
{
watcher_provider: W,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, yeah, it's probably not that bad, I would also ideally have the damage confined to the tests if possible. was thinking there might be some ways to mock just one of the functions (like setup_reflector) used in this module, but maybe it's only mocktopus that lets you do functions (and that's nightly only).

i think it probably is OK. mocking watch streams is hard to do raw. i would otherwise suggest the mocked client with tower_test as is used in client/api.

Comment on lines +301 to +304
#[derive(Clone, Debug)]
struct TestContext {
reconciled: Arc<Mutex<bool>>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestContext is unused

Comment on lines +360 to +366
stream::unfold(events, |mut events| async move {
match events.pop_front() {
Some(event) => Some((Ok(event), events)),
// if there's nothing left we block to simulate waiting for a change
None => futures::future::pending().await,
}
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can just convert the vec to a stream here and return that without doing an unfold

Comment on lines +370 to +386
enum TestError {
TestError,
}

impl Debug for TestError {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
unimplemented!()
}
}

impl Display for TestError {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
unimplemented!()
}
}

impl std::error::Error for TestError {}
Copy link
Member

@clux clux Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestError also unused ^
EDIT: also ClonedState

@clux
Copy link
Member

clux commented Feb 24, 2023

I think the basic idea of this struct is good, but it's harder to see where to go optimally here without seeing how you hook it up to a controller (because it doesn't help much to generate stores up front when Controller ultimately creates their own stores).

The potential setup you have here can work, but I can also see it having potential as a more global stream manager that maintains watcher streams for all types regardless of kind. The latter might be an easier thing to integrate into Controller as an endgame for #1080. At any rate, would be interested in hearing your thoughts on integration here.

@danrspencer danrspencer reopened this Feb 24, 2023
@danrspencer
Copy link
Contributor Author

danrspencer commented Feb 24, 2023

I think the basic idea of this struct is good, but it's harder to see where to go optimally here without seeing how you hook it up to a controller (because it doesn't help much to generate stores up front when Controller ultimately creates their own stores).

I haven't had chance to play around with this yet, but the idea in my head is something along the lines of:

let lp = ListParams::Default();
// I'm still in two minds if the SharedStore functions should just always return a store and a stream
// and you just drop one if you don't care about it
let store = deploy_shared_store.all(lp);
let stream = shared_store.subscribe_all(lp);

let pod_stream = pod_shared_store.subscribe_all(lp);

Controller::new_from_store(store, stream)
     .watches_stream(pod_stream, |pod| {
            Some(ObjectRef::new(&pod.name_unchecked()))
     });

The potential setup you have here can work, but I can also see it having potential as a more global stream manager that maintains watcher streams for all types regardless of kind. The latter might be an easier thing to integrate into Controller as an endgame for #1080. At any rate, would be interested in hearing your thoughts on integration here.

I did have a go at creating an "all types" version of this, but I couldn't think of a way of creating it without losing compile time type safety. The only way I can think of would be to change the reflectors property to be indexed by something like (ApiResource, Option<String>, ListParams) then strongly typing them on the way out to the user. That feels horrible to me, but I guess the streams will always come into the Rust untyped and must be cast to a known type at some point.

@danrspencer
Copy link
Contributor Author

Just wanted to drive by and say this isn't forgotten about, I just haven't had time to work on it recently. Hopefully I'll be able to bring some focus back to it soon.

@clux
Copy link
Member

clux commented Mar 30, 2023

Appreciate it! We are early-stage exploring the other side of the coin here with the controller interface. See #1163 for some context.

@clux
Copy link
Member

clux commented Jun 7, 2023

Hey @danrspencer , I know this might not be on your radar anymore, just wondering what you ended up doing or if you still have interest in this.

It is certain that we still need at least two more things to close out #1080;

  • waiting for reflectors to fully init before starting the controller
  • hooking up your streamsubscribe (that has made it in under unstable in your other PR) through some arc-wrapping

if you have solutions elsewhere it'd be good to know about, and i'll open other issues

@danrspencer
Copy link
Contributor Author

Hey,

Yep it’s still on my radar. The work I was doing that I could consider controller improvements part of got punted down the backlog but we should be picking it up again soon. We’ve got at least a couple of concrete cases in our codebase where not having shared stores is causing race condition bugs so I should be able to find some time to get back to this as part of solving those.

@danrspencer
Copy link
Contributor Author

@clux I'm going to close this PR. I'm moving jobs and realistically I'm probably not going to get chance to pick this up again. Thanks for your help and feedback getting it this far. Hopefully some of the ideas and discussions will help solve the problem.

@danrspencer danrspencer closed this Sep 8, 2023
@clux
Copy link
Member

clux commented Sep 8, 2023

Ok 👍

Thanks for your efforts and best of luck!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants