Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No fast-track for bulk inserts in Cursor.executemany with INSERT/REPLACE syntax introduced in MySQL 8.0.19 #968

Open
1 task done
hunyadi opened this issue Sep 24, 2023 · 3 comments
Labels

Comments

@hunyadi
Copy link

hunyadi commented Sep 24, 2023

Describe the bug

When calling Cursor.executemany with an INSERT or REPLACE SQL statement, aiomysql compares the statement against a regular expression RE_INSERT_VALUES and if there is a match, a different, fast-track execution path is taken as opposed to expanding executemany into a series of execute statements.

Unfortunately, the regular expression tests against the older INSERT or REPLACE syntax in MySQL versions prior to 8.0.19 only. For example, the following is a match:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

This causes a warning message to be emitted with MySQL 8.0.20 and later:

/usr/local/lib/python3.11/site-packages/aiomysql/cursors.py:239: Warning: 'VALUES function' is deprecated and will be removed in a future release. Please use an alias (INSERT INTO ... VALUES (...) AS alias) and replace VALUES(col) in the ON DUPLICATE KEY UPDATE clause with alias.col instead

However, if the new recommended syntax is adopted, the fast-track course is not longer chosen, and execution significantly slows down. This is because the new syntax is no longer a match for RE_INSERT_VALUES:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6) AS new(m,n,p)
  ON DUPLICATE KEY UPDATE c = m+n;

On the contrary, execution speed is restored if we slightly modify the regular expression that SQL statements are tested against:

RE_INSERT_VALUES = re.compile(
    r"\s*((?:INSERT|REPLACE)\s.+\sVALUES?\s+)"
    + r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    + r"(\s*(?:(?:AS|ON DUPLICATE).*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)

This will make both old-style and new-style syntax pass.

To Reproduce

Try a bulk INSERT statement with the old syntax:

INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s)
ON DUPLICATE KEY UPDATE
"data" = VALUES("data")

A warning message is emitted in MySQL 8.0.20 and later.

Try a bulk INSERT statement with the new syntax (MySQL 8.0.19 and later):

INSERT INTO "DataTable"
("id", "data") VALUES (%s, %s) AS EXCLUDED
ON DUPLICATE KEY UPDATE
"data" = EXCLUDED."data"

Execution significantly slows down.

Expected behavior

Execution speed does not diminish when using the new MySQL 8.0.20 syntax for INSERT.

Logs/tracebacks

n/a

Python Version

Python 3.11.5

aiomysql Version

Name: aiomysql
Version: 0.2.0
Summary: MySQL driver for asyncio.
Home-page: https://github.com/aio-libs/aiomysql
Author: Nikolay Novik
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.11/site-packages
Requires: PyMySQL
Required-by:

PyMySQL Version

Name: PyMySQL
Version: 1.1.0
Summary: Pure Python MySQL Driver
Home-page: 
Author: 
Author-email: Inada Naoki <[email protected]>, Yutaka Matsubara <[email protected]>
License: MIT License
Location: /usr/local/lib/python3.11/site-packages
Requires: 
Required-by: aiomysql

SQLAlchemy Version

No response

OS

macOS 13.5.2

Database type and version

MySQL 8.1.0

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@hunyadi hunyadi added the bug label Sep 24, 2023
@kozchris
Copy link

kozchris commented Jan 24, 2025

I am seeing this issue in aiomysql = "0.2.0".

From my test app:

DEBUG:root:On deprecated named col duplicate insert of 8000 rows took 0.2305469512939453 seconds
DEBUG:root:On named col duplicate insert of 8000 rows took 24.389080286026 seconds
DEBUG:root:On placeholder duplicate insert of 8000 rows took 1.0601818561553955 seconds

I'll try and aattach a sample program demonstrating the issue. For ease, add a .env file with the following information for your db and then run the dbtest2.py file:

.env

DATABASE_USER=xxx
DATABASE_PASSWORD=yyy
DATABASE_HOST=aaa
DATABASE_NAME=zzzz

db_test2.py.zip

@kozchris
Copy link

Additional note, in my example, changing the syntax from something like:

INSERT INTO my_table (c1, c2)
VALUES (VALUES (%(c1)s,VALUES (%(c2)s)
AS new 
ON DUPLICATE KEY UPDATE
c2=new.c2

and using named values as the input to something like:

INSERT INTO my_table (c1, c2)
VALUES (%s, %s )
AS new 
ON DUPLICATE KEY UPDATE
c2=new.c2

and using rows as input, the query performed almost as fast as the deprecated syntax. However, when I tried to create a generic table to demonstrate the phenomenon, I couldn't. So something is also special with the example table structure. That could be related to the underlying issue, or could be another issue.

@kozchris
Copy link

Another note, the mysql-connector-python lib handles the syntax with no problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants