Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semi/automatic escaping for reserved words as tables or columns #53

Closed
NPCRUS opened this issue Dec 26, 2024 · 7 comments · Fixed by #73
Closed

Semi/automatic escaping for reserved words as tables or columns #53

NPCRUS opened this issue Dec 26, 2024 · 7 comments · Fixed by #73

Comments

@NPCRUS
Copy link
Contributor

NPCRUS commented Dec 26, 2024

I have a table called order, which is a reserved word in sql. So the query compiles into something like:

SELECT * FROM order

the desired query(Postgres) would be:

SELECT * from "order"

Current workaround is to set explicit schema name on table like this:
override def schemaName: String = "schemaName"

Obviously the same would apply for columns, so we need mechanism to escape column and table names

@NPCRUS
Copy link
Contributor Author

NPCRUS commented Jan 7, 2025

So my thoughts on this issue:

  1. seems like identifier delimit char is ultimately a " at least in currently supported databases + Microsoft SQL server, if another char for a different db appears, we could do it as a part of Dialect I think, so for now we can hardcode it
  2. what needs to be delimited is reference to tables, columns and schemas(when applied explicitly in query)
  3. it's seems that it's recommended pretty much to 100% escape or not escape at all table/column identifier
  4. there are certain quirks when escaping table/column identifiers we need to keep in mind, eg: in sqlite table names are normally capitalized: USERS, and when you not escape them, sqlite will translate unescaped users -> USERS, however when you escape the table, sqlite will not perform this trick anymore, so it will be "users" -> "users" and therefore table will not be found, so in this case we will need to do capitalizing ourselves or maybe leave it to Config.tableNameMapper. There can be more stuff like this

In the end, I plan to use existing Table.resolve and make the same for Column, lots of resulting sql will be changed. Probably quite tedious job, but I need it in my project, so there's no other way

@lihaoyi please let me know if I forgot anything or maybe if you see it differently, otherwise I would like to start with implementation

lihaoyi pushed a commit that referenced this issue Jan 8, 2025
Hello folks, here I tried to add support for `schemaName` into `INSERT`,
`UPDATE`, `DELETE` and some sub variants of those. I feel 90% confused
about most of the stuff I see in the repo so far, but I wanted to start
somewhere, so I can gradually get a better understanding of what's going
on.

Here I implemented a `Table.resolve` function that should make a fully
qualified table name + schema if present + apply mapping from config,
maybe it can also be used further for #53
I also thought it's a good idea whenever `Table.Base` becomes `String`
to be used as a fully qualified name to not do any further processing
and mapping of this string
 Fixes #54

---------

Co-authored-by: nikitaglushchenko <[email protected]>
@lihaoyi
Copy link
Member

lihaoyi commented Jan 28, 2025

@NPCRUS is this fixed by #57?

@NPCRUS
Copy link
Contributor Author

NPCRUS commented Jan 28, 2025

@lihaoyi nope, in that one custom schema were applied to different query parts, this one is about escaping

@NPCRUS
Copy link
Contributor Author

NPCRUS commented Jan 28, 2025

@lihaoyi if the way i describe it should be fixed doesn't contradict any of your ideas for this lib - let me know and i will give it a go

@lihaoyi
Copy link
Member

lihaoyi commented Jan 30, 2025

@NPCRUS I think what you say sounds reasonable. Feel free to open a PR and we can see what others think

@lihaoyi
Copy link
Member

lihaoyi commented Jan 30, 2025

Maybe we could use a blacklist of reserved words, to avoid escaping in the common case? That would minimize the amount of changed SQL

@ckipp01
Copy link

ckipp01 commented Feb 6, 2025

Current workaround is to set explicit schema name on table like this:

override def schemaName: String = "schemaName"

One thing to note with this workaround because I just hit on it, this works fine for SELECTs, but if you attempt to do an update this seems to not work. For example it will create sql that looks like this:

UPDATE public.user SET something = ? WHERE (user.someid = ?)

where the first use of user is created with public.user so there is no issue, but it will fail on the WHERE clause since it generates user.whatever instead of public.user.whatever.

lihaoyi pushed a commit that referenced this issue Mar 27, 2025
This PR here is somewhat a beginning of a conversation about how do we
want to make proper sql identifiers when it comes to schemas, tables and
columns, particularly for my use case I want this in order to use sql
reserved words for table names. I'll try to split this into 2 logical
parts:

**How do we know how to escape depending on db variant?**
A SQL 1999 rfc says that double quotes should be used in order to escape
an identifier, nevertheless:
- in mysql it is backticks
- in h2 when you escape you suddenly loose the ability to refer to a
table ignoring it's definition case: defining a table `T`, `SELECT *
from t` will work, `SELECT * from "t"` will not

Therefore it felt logical to make an escaping mechanism based on
`Dialect`. I didn't want to propagate `DialectConfig` together with
`Context` down the call chain, it seemed to me kinda redundant, so I
made `DialectConfig` part of `Context`. That's how we can access
escaping mechanism in various places as long as you have `Context`

**How do we decide which table to escape?**
With this PR I want to stick with tables, otherwise it can become too
big. I see three options:
1. escape all tables
2. escape based on list of reserved words per database variant
3. let user decide to escape

First options we already discarded as one that will change too much
tests. As for a second option - I looked into lists of reserved words,
and they are very different based on database, also it will not be fun
to maintain them, that's why most of the libs resolve to escape all
tables. Third option is most simple for now, since it will allow users
to opt-in when they want/bump into this issue, also it will allow us to
test mechanism first without changing all 1k+ tests, and perhaps we can
start with option 3 and then eventually move to option 1. Therefore in
this draft PR I made a flag in `Table` to opt in for escaping table
name.

_Optional_:
At the moment `DialectConfig` is kinda part of the `Dialect`, and I
don't feel entirely good about including whole `Dialect` in `Context`. I
kinda want to move `DialectConfig` into case class and make a separate
implicit for it as a part of `Dialect` so that when we need to construct
`Context` only `DialectConfig` case class will be picked up

It is a draft PR yet, so obviously I haven't written tests for all
cases, only for a simple `FROM` expression. This will follow when we
come to consensus about the implementation.
What do you guys think?
fixes #53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants