-
Notifications
You must be signed in to change notification settings - Fork 58
Add a user provision type to silo user and group #8981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add a user provision type to silo user and group #8981
Conversation
Silo users and groups have traditionally defined "external id" as whatever the silo's identity provider (or in the case of ApiOnly silos, an operator) assigns for them: this could be username, email, a random id, etc. SCIM throws one of many wrenches at us because the specification defines an external id attribute for users and groups that is distinct from other related attributes that could be sent, including username and email. Importantly the SCIM client that is creating a user or group could send a NULL external ID, which is not supported by the current field's database type. The first step to supporting SCIM's extra attributes is to add the silo's user provision field to our user and group models, and use this field to change the semantics of the external id field depending on what provision type is used. For the existing ApiOnly and JIT silos this will stay the same. The second step that will help with type safety is to use this new user provision type field to separate the silo user and group database models into distinct silo user and group types at the datastore level, and have separate enum variants for each type that wrap an inner typed object: ```rust pub enum SiloUser { /// A User created via the API ApiOnly(SiloUserApiOnly), /// A User created during an authenticated SAML login Jit(SiloUserJit), } ``` Call sites now have to use either the ApiOnly or Jit variant of this new higher level silo user type. With the addition of user provision type to the silo user model, Nexus can also now check that the provision type on the user matches the silo, or (like with some of the local idp functions) accept only one specific variant. The third step that will also help with type safety is to have typed lookups for silo users and groups, as different user provision types will have different fields that they enforce uniqueness for: for example, only search for ApiOnly and JIT users using external_id, but for SCIM use username. One possible bug that this commit addresses is that the creation of the recovery silo did not have a check to ensure that the created silo was a "LocalOnly" identity type. The code prior to this commit could create a SamlJit silo, which is not ideal for a recovery silo! Another possible bug potential addressed is a mismatch in the silo's user provision type, and the users within that silo. This can be enforced via functions that only accept one variant of the new higher level enum, and is checked at the lower layers. Another bug addressed: in `silo_group_ensure` there was a small window between `silo_group_ensure_query` and unwrapping the result of `silo_group_optional_lookup` where, if it were possible to delete groups, a racing delete would cause the unwrap to panic. Also! Fill in `test_ensure_same_silo_group`, why that was left truncated is beyond me, but don't look at the git blame please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a pass through this and found some nits. It might be worth having a second set of eyes on the diesel bits from someone who touches crdb more often than I do.
pub struct SiloGroup { | ||
#[diesel(embed)] | ||
identity: SiloGroupIdentity, | ||
pub identity: SiloGroupIdentity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be fully pub or can we make them pub(crate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, these need to be fully pub as the DataStore structs need to access those fields.
match &recovery_silo.identity_mode { | ||
shared::SiloIdentityMode::LocalOnly => { | ||
// Ok | ||
} | ||
|
||
shared::SiloIdentityMode::SamlJit => { | ||
return Err(RackInitError::Silo(Error::invalid_request( | ||
"recovery silo should only use identity mode LocalOnly", | ||
))); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this match statement will grow with the addition of SCIM we can probably distill the logic down to just:
match &recovery_silo.identity_mode { | |
shared::SiloIdentityMode::LocalOnly => { | |
// Ok | |
} | |
shared::SiloIdentityMode::SamlJit => { | |
return Err(RackInitError::Silo(Error::invalid_request( | |
"recovery silo should only use identity mode LocalOnly", | |
))); | |
} | |
} | |
if !matches!( | |
&recovery_silo.identity_mode, | |
shared::SiloIdentityMode::LocalOnly | |
) { | |
return Err(RackInitError::Silo(Error::invalid_request( | |
"recovery silo should only use identity mode LocalOnly", | |
))); | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, done in e5b315b
silo_id: record.silo_id, | ||
// SAFETY: there is a database constraint that prevents a group | ||
// with provision type 'api_only' from having a null external id | ||
external_id: record.external_id.unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this .expect("database constraint exists")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5690907
silo_id: record.silo_id, | ||
// SAFETY: there is a database constraint that prevents a group | ||
// with provision type 'jit' from having a null external id | ||
external_id: record.external_id.unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this .expect("database constraint exists")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5690907
// This function is _not_ transactional, meaning a delete could race | ||
// between the ensure query and the lookup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay if the delete occurs after the lookup and we return silo_group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok - it's unlikely in the first place as the lookup occurs by ID, but it's better not to panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the thing that would be deleted, if that happened? is it the silo group we just created, or the silo itself? because i think "not found" is definitely correct if the whole silo was being torn down...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this function it's the silo group. I've opened #9145. I think there's a bunch more work to do here related to silo user and group requests.
silo_id: record.silo_id, | ||
// SAFETY: there is a database constraint that prevents a user | ||
// with provision type 'api_only' from having a null external id | ||
external_id: record.external_id.unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above with the expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5690907
silo_id: record.silo_id, | ||
// SAFETY: there is a database constraint that prevents a user | ||
// with provision type 'jit' from having a null external id | ||
external_id: record.external_id.unwrap(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above with the expect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5690907
CHECK ( | ||
CASE user_provision_type | ||
WHEN 'api_only' THEN external_id IS NOT NULL | ||
WHEN 'jit' THEN external_id IS NOT NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future SCIM will allow this to be NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right :)
Build on oxidecomputer#8981 and add the SCIM user provision type to users and groups. Users must have a user name, and optionally have an external id field and/or an active field. Groups must have a display name and optionally have an external id field. Also add a new SiloIdentityMode named SamlScim. A few functions have to change to account for these new variants, but not many. Silos using the SamlScim identity mode will use all the same SAML code that exists today.
Discussed briefly in our chat, is it worth exposing the external ID on the |
I'd say not useful at this point. For these user types, external id matches the user name, and for SCIM user types, external id is the provisioning client's unique ID. |
|
||
return Err(Error::invalid_request( | ||
"user provision type of silo does not match silo group", | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be helpful to say the two types here like in the above error log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, is the user expected to know what "user provision type" means? given that this isn't an internal error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
87d5a66 changes this to an internal error and adds the two types
nexus/db-model/src/silo_group.rs
Outdated
/// the identity provider's ID for this group. There is a database | ||
/// constraint that this field must be non-null for those provision types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicky, feel free to ignore: it might be worth saying the name of the constraint here so that the reader can search for it if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no I like this, added in 1aee591
pub fn new_api_only_group( | ||
id: SiloGroupUuid, | ||
silo_id: Uuid, | ||
external_id: String, | ||
) -> Self { | ||
Self { | ||
identity: SiloGroupIdentity::new(id), | ||
time_deleted: None, | ||
silo_id, | ||
user_provision_type: UserProvisionType::ApiOnly, | ||
external_id: Some(external_id), | ||
} | ||
} | ||
|
||
pub fn new_jit_group( | ||
id: SiloGroupUuid, | ||
silo_id: Uuid, | ||
external_id: String, | ||
) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why two separate constructors instead of taking UserProvisionType
as an argument? is that because we're eventually going to have a SCIM UserProvisionType
that will be constructed differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right yeah, and specifically the new_scim_group
method will require different fields to be non-null (and additional fields that could be null)
let silo_admin_group = | ||
match new_silo_params.identity_mode.user_provision_type() { | ||
shared::UserProvisionType::ApiOnly => { | ||
db::model::SiloGroup::new_api_only_group( | ||
silo_group_id, | ||
silo_id, | ||
admin_group_name.clone(), | ||
) | ||
} | ||
|
||
shared::UserProvisionType::Jit => { | ||
db::model::SiloGroup::new_jit_group( | ||
silo_group_id, | ||
silo_id, | ||
admin_group_name.clone(), | ||
) | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sure seems like it would be a lot simpler if we just had one constructor that you pass the user_provision_type
into, but i'm guessing a subsequent change will make it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's additional behaviour changes required in this function (and others) based on the user provision type. Hopefully the additional "add the scim user provision type" PR will make this make more sense, I just wanted to break out the changes in the smallest parts that made sense (maybe this one was too small!).
// SAFETY: there is a database constraint that prevents a group | ||
// with provision type 'api_only' from having a null external id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I would really prefer to follow the conventions that all-caps // SAFETY:
comments are specifically for unsafe
blocks and not for other correctness-critical things (you could reasonably argue that, if a // SAFETY:
comment is for "anything that you need to know for the program to do the right thing", then all comments should have SAFETY in them...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good - addressed this in 1aee591
// SAFETY: there is a database constraint that prevents a group | ||
// with provision type 'jit' from having a null external id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed this also in 1aee591
fn from(u: SiloGroupApiOnly) -> views::Group { | ||
views::Group { | ||
id: u.id, | ||
// TODO the use of external_id as display_name is temporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something that will be done in this PR or subsequently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah this was copied over from the existing From
, I'm not sure the long term plan here.
|
||
return Err(Error::invalid_request( | ||
"user provision type of silo does not match silo group", | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, is the user expected to know what "user provision type" means? given that this isn't an internal error
if silo.user_provision_type != lookup_user_provision_type { | ||
return Err(Error::invalid_request(format!( | ||
"silo provision type {:?} does not match lookup {:?}", | ||
silo.user_provision_type, lookup_user_provision_type, | ||
))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above, we also log an error for this, should we do that here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a log in 87d5a66
// This function is _not_ transactional, meaning a delete could race | ||
// between the ensure query and the lookup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the thing that would be deleted, if that happened? is it the silo group we just created, or the silo itself? because i think "not found" is definitely correct if the whole silo was being torn down...
); | ||
|
||
return Err(Error::invalid_request( | ||
"user provision type of silo does not match silo user", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly to silo group, should we include the types here? also is a user expected to understand this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was too focused on "invalid request" that some part of the code was making to the relevant functions, and that didn't connect with the fact that this is an external Error - I've changed this to a 500 in 87d5a66
Silo users and groups have traditionally defined "external id" as whatever the silo's identity provider (or in the case of ApiOnly silos, an operator) assigns for them: this could be username, email, a random id, etc.
SCIM throws one of many wrenches at us because the specification defines an external id attribute for users and groups that is distinct from other related attributes that could be sent, including username and email. Importantly the SCIM client that is creating a user or group could send a NULL external ID, which is not supported by the current field's database type.
The first step to supporting SCIM's extra attributes is to add the silo's user provision field to our user and group models, and use this field to change the semantics of the external id field depending on what provision type is used. For the existing ApiOnly and JIT silos this will stay the same.
The second step that will help with type safety is to use this new user provision type field to separate the silo user and group database models into distinct silo user and group types at the datastore level, and have separate enum variants for each type that wrap an inner typed object:
Call sites now have to use either the ApiOnly or Jit variant of this new higher level silo user type. With the addition of user provision type to the silo user model, Nexus can also now check that the provision type on the user matches the silo, or (like with some of the local idp functions) accept only one specific variant.
The third step that will also help with type safety is to have typed lookups for silo users and groups, as different user provision types will have different fields that they enforce uniqueness for: for example, only search for ApiOnly and JIT users using external_id, but for SCIM use username.
One possible bug that this commit addresses is that the creation of the recovery silo did not have a check to ensure that the created silo was a "LocalOnly" identity type. The code prior to this commit could create a SamlJit silo, which is not ideal for a recovery silo!
Another possible bug potential addressed is a mismatch in the silo's user provision type, and the users within that silo. This can be enforced via functions that only accept one variant of the new higher level enum, and is checked at the lower layers.
Another bug addressed: in
silo_group_ensure
there was a small window betweensilo_group_ensure_query
and unwrapping the result ofsilo_group_optional_lookup
where, if it were possible to delete groups, a racing delete would cause the unwrap to panic.Also! Fill in
test_ensure_same_silo_group
, why that was left truncated is beyond me, but don't look at the git blame please.