Skip to content

Some optimizations and fixes #102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Some optimizations and fixes #102

wants to merge 8 commits into from

Conversation

kolya7k
Copy link

@kolya7k kolya7k commented Feb 2, 2025

  1. Row Constructor Expression eventually not used even by MySQL 8.1+ and so (col_a, col_b) > (a, b) leads to INDEX scan instead of RANGE scan when col_a > a OR (col_a = a AND col_b > b) used.
    Speedup for queries about 20x in huge tables

  2. We have to convert date as string ('2025-02-02') to date type because python can't compare str with date

max_primary_key = max(max_primary_key, record_primary_key)

last_record = records[-1]
max_primary_key = [self.to_date_if_str(last_record[key_idx]) for key_idx in primary_key_ids]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only attempt to convert for datetime or date types, not for arbitrary strings. Please do the following:

  1. Get MySQL field types for primary keys (add it after line 300):

    field_types_mysql = [field.field_type for field in mysql_table_structure.fields]
    primary_key_types_mysql = [field_types[key_idx] for key_idx in primary_key_ids]
    
  2. Check if the type is datetime and only in this case try to convert it to dates.

@bakwc
Copy link
Owner

bakwc commented Feb 2, 2025

It would be great to add some unit tests if you have time. See running tests section in README.md (it's optional)

@bakwc
Copy link
Owner

bakwc commented Feb 3, 2025

Please note that some existing tests seem to be failing; could you please take a look?

@kolya7k
Copy link
Author

kolya7k commented Feb 3, 2025

I'm coding on python first time and on vacation atm :)
Test failure was fixed in last commits.

@bakwc
Copy link
Owner

bakwc commented Feb 3, 2025

They still failing.. here is a stack trace:

 Traceback (most recent call last):
  File "/app/mysql_ch_replicator/db_replicator.py", line 210, in run
    self.perform_initial_replication()
  File "/app/mysql_ch_replicator/db_replicator.py", line 254, in perform_initial_replication
    self.perform_initial_replication_table(table)
  File "/app/mysql_ch_replicator/db_replicator.py", line 358, in perform_initial_replication_table
    self.clickhouse_api.insert(table_name, records, table_structure=clickhouse_table_structure)
  File "/app/mysql_ch_replicator/clickhouse_api.py", line 244, in insert
    records=len(records_to_insert),
                ^^^^^^^^^^^^^^^^^
NameError: name 'records_to_insert' is not defined

@kolya7k
Copy link
Author

kolya7k commented Feb 6, 2025

Forgot the last commit, did it

@bakwc
Copy link
Owner

bakwc commented Feb 6, 2025

Tests still failing (this time another test), see the build run, test is called test_datetime_exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/clickhouse_connect/driver/transform.py", line 99, in chunk_gen
    col_type.write_column(data, output, context)
  File "/usr/local/lib/python3.12/site-packages/clickhouse_connect/datatypes/base.py", line 214, in write_column
    self.write_column_data(column, dest, ctx)
  File "/usr/local/lib/python3.12/site-packages/clickhouse_connect/datatypes/base.py", line 229, in write_column_data
    self._write_column_binary(column, dest, ctx)
  File "/usr/local/lib/python3.12/site-packages/clickhouse_connect/datatypes/temporal.py", line 223, in _write_column_binary
    column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 for x in column]

Anyway thanks a lot for the PR! Will merge it as soon as all test resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants