Support regex processing of DB role names in collector #194

joehorsnell · 2021-08-13T11:19:13Z

As per our call earlier this week (cc @robharrop), I'm opening an issue for your comment/approval on our proposed collector change, to allow processing/transformation of DB role names in the collector, before sending to the pganalyze server.

Our specific use case is to handle Vault Dynamic Secrets, as the dynamically generated suffixes that Vault adds to the DB roles it creates cause pganalyze to treat statements that we would like to be grouped together as separate, so there is effectively no useful history for each statement. The proposed solution (using regex) supports our requirement, but I believe would also be generic enough to be useful for other contexts, even just literal string replacements.

Detail

The DB role names generated by Vault Dynamic Secrets have two dynamically generated suffixes that we would like to trim and ignore.

For example, for role v-aws-some-service-3tce39Yn1P2lV7RSQnMN-1628677987, we want to ignore the two suffix groups, and just send v-aws-some-service to pganalyze.

The proposed solution is to add some collector config options that accept a Sed-like search/replace pattern, eg.

echo "v-aws-some-service-3tce39Yn1P2lV7RSQnMN-1628677987" | gsed --regexp-extended 's/-[[:alnum:]]+-[[:digit:]]+$//'
v-aws-some-service

It seems the built-in Go regex package supports the required regex syntax.

If you are OK in principle with this idea, we can look at fleshing out a bit more detail on this, eg a simple design (exactly what config options to provide and how), so please let us know. Also could you please confirm whether there's any server-side change that would be required to support this, or if it could be handled completely within the collector?

Thanks,

Joe.

The text was updated successfully, but these errors were encountered:

msakrejda · 2021-09-02T17:54:33Z

Hi Joe,

As Lukas mentioned offline, this makes sense in general. In some ways this is similar to #62 . And it would not require any server-side changes assuming that the "folding" happens consistently everywhere we reference roles/users in the collector.

In terms of changes this would require, we (mostly) do not track object ownership, so the main places that would have to change are:

collecting pg_stat_statements output: https://github.com/pganalyze/collector/blob/main/input/postgres/statements.go#L24
log parsing: https://github.com/pganalyze/collector/blob/main/logs/parse.go#L373-L375
database ownership, tablespace ownership, backend role, standby info role, vacuum info role

Does the "canonical" role (without the suffixes) actually exist in your system? If so (or if that's something you can create without too much trouble to accommodate this functionality), I think that makes things easier, since pg_stat_statements deals with actual user_ids. If we can fold these other roles to a real role, that avoids having to manage synthetic ids (and avoid collisions with real role ids).

In terms of code changes, the pg_stat_statements case will either require the match to be pushed down into the query and a join with pg_roles (to map the pg_stat_statements userid to the canonical role), or to add a separate id mapping step after the main query if the role processing regexp is set. I think either approach can work. If we go for the join, we could use a left join and coalesce to the existing pg_stat_statements.userid to short-circuit the join if the regexp is not set.

The log parsing case is probably simpler: the log parsing code in general is a little gnarly due to different prefix handling, but the user name could be massaged in the one place I linked for all of them, I think.

I have not looked at the other role associations (the third bullet point), but I think this should be similar to what we would need to do for statements handling.

Does this make sense?

/cc @seanlinsley if you have any thoughts

joehorsnell · 2021-09-07T10:39:42Z

Hi @uhoh-itsmaciek, thanks for your response and additional info on where the changes would be needed.

Does the "canonical" role (without the suffixes) actually exist in your system?

That's a good question. As things stand, no - the default username template used by Vault for dynamic roles truncates the role to 8 chars in the generated role, although that should be pretty simple for us to change - we'll investigate.

Do you think it makes sense to do an initial refactoring of the collector to introduce a type to represent the role, rather than just a string, so that operations on it (eg to sanitise it to the canonical role) can be centralised? I'm not too familiar with the collector code to know whether that would be too big a task and/or worth it.

Thanks.

msakrejda · 2021-09-07T15:50:19Z

Regarding the refactoring, my gut feeling is that if we go with the separate mapping step, that's probably easier as a shared helper function rather than a first-class type for roles. It could take a database connection and a list of role OIDs, and return the normalized OID. I'm certainly open to other options if you see a better approach.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support regex processing of DB role names in collector #194

Support regex processing of DB role names in collector #194

joehorsnell commented Aug 13, 2021

msakrejda commented Sep 2, 2021

joehorsnell commented Sep 7, 2021

msakrejda commented Sep 7, 2021

Support regex processing of DB role names in collector #194

Support regex processing of DB role names in collector #194

Comments

joehorsnell commented Aug 13, 2021

Detail

msakrejda commented Sep 2, 2021

joehorsnell commented Sep 7, 2021

msakrejda commented Sep 7, 2021