Description
Related dev. issue(s): tarantool/tarantool#10766, https://jira.vk.team/browse/TNTP-2331
Product: Tarantool
Since: This is a bug, all current Tarantool versions are affected, including 2.11, 3.x
Root document: https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_space/create_index/, https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_index/alter/, https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_space/format/, https://www.tarantool.io/en/doc/latest/platform/ddl_dml/migrations/space_upgrade/
SME: @sergepetrenko
Details
A serious design bug was discovered in Tarantool's non-blocking DDL procedures (index build or rebuild, and space format changes that require validating space data). While these actions are "non-blocking" from the master's point of view (they run in the background and do not block concurrent data modification and access), they actually block all replication, both synchronous and asynchronous.
This happens because replicas apply the master's operations in order, one by one, and hence do not apply any further operations until an index is fully built or a space format is fully validated (these actions are performed as part of processing the operation).
This means that every time a user issues such a non-blocking DDL operation, all replication is actually blocked. With asynchronous replication, this shows up as significant growth of upstream and downstream lag for the whole time the DDL operation is being applied. Essentially, if an index build lasts 10 minutes, by the time it finishes, replicas will be 10 minutes behind the master in terms of data and will have to catch up.
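The lag growth described above can be observed through `box.info.replication`. The following is only a minimal sketch: the helper name `max_upstream_lag` is hypothetical, and it assumes each entry exposes the lag as `upstream.lag` in seconds, as `box.info.replication` does on a replica.

```lua
-- Hypothetical helper: returns the largest upstream lag (in seconds) among
-- the entries of a table shaped like box.info.replication, or 0 if no
-- entry reports an upstream lag.
local function max_upstream_lag(replication)
    local max_lag = 0
    for _, peer in pairs(replication) do
        local upstream = peer.upstream
        if upstream ~= nil and upstream.lag ~= nil and upstream.lag > max_lag then
            max_lag = upstream.lag
        end
    end
    return max_lag
end

-- Intended use on a replica, inside Tarantool:
--   local lag = max_upstream_lag(box.info.replication)
```

Polling such a value on replicas while the master builds an index should show the lag rising until the replica finishes applying the DDL operation.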
When synchronous replication is used, the problem is exacerbated further: replicas do not ack synchronous transactions to the master in time (due to lag growth), so the master cannot commit any transaction during the whole index build. Building an index therefore results in write downtime for the same amount of time, which may easily be minutes or even tens of minutes on large enough spaces.
The fix will be partial and will only be implemented in Tarantool 4.x because of the severe breaking changes it requires. So now we have to warn all users of Tarantool 2.11 and 3.x about these issues:
- We should discourage building/rebuilding indexes on large spaces when the cluster is under load.
- We may add a warning that index rebuilding will be banned in Tarantool 4.x. Instead, users will have to create a new index with the desired properties and drop the old one.
- We should notify users that changing a space format (when it requires rechecking all space contents) should also be avoided, and that space.upgrade() is a safe alternative to plain space:format().
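
The "create a new index and drop the old one" pattern from the list above could be sketched as follows. This is an illustration under assumptions, not a recommended procedure: the helper name `replace_secondary_index` and the `_new` suffix are hypothetical, it applies to secondary indexes only, and the final `rename` is assumed not to trigger a rebuild. Creating the new index is still an index build, so it still stalls replication while it runs and should be done when the cluster is not under load.

```lua
-- Hypothetical helper: replaces a secondary index with one that has the
-- desired options by building a new index first and dropping the old one,
-- instead of rebuilding the existing index in place with index:alter().
local function replace_secondary_index(space, index_name, new_opts)
    local tmp_name = index_name .. '_new'  -- temporary name (assumption)

    -- Build the new index with the desired properties.
    local new_index = space:create_index(tmp_name, new_opts)

    -- Drop the old index only after the new one has been built.
    space.index[index_name]:drop()

    -- Give the new index the old name so existing code keeps working.
    new_index:rename(index_name)
end

-- Intended use on the master, inside Tarantool ('users' and 'by_email'
-- are made-up names):
--   replace_secondary_index(box.space.users, 'by_email',
--       { parts = { { field = 2, type = 'string' } }, unique = false })
```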