Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible unexpected behavior in bdk_chain::migrate_schema #1767

Closed
nymius opened this issue Dec 9, 2024 · 5 comments
Closed

Possible unexpected behavior in bdk_chain::migrate_schema #1767

nymius opened this issue Dec 9, 2024 · 5 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@nymius
Copy link
Contributor

nymius commented Dec 9, 2024

Describe the bug

migrate_schema skips the first n schemas to apply where nis the current schema version. That means, if I want to independently apply a new schema k+1 to a database already using schema k I should create a list of k+1 schemas (where any string, even empty ones, could be an schema) as migrate_schema will skip the k first anyway.
If I don't do that, skip will reach the end of iterator sooner and return another empty iterator, forcing migrate_schema to do nothing.

So:

migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[schema_v0])?;

applies schema_v0.

But

migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[schema_v1])?;

applied in a schema_v0 database does nothing.

And

migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[schema_v0, schema_v1])?;

applied in a schema_v0 database applies schema_v1.

As there is a workaround for this, I don't think it is a great deal. But I think we should at least document it, if not changed.

To Reproduce

#[test]
fn schemas_can_be_applied_independently() -> anyhow::Result<()> {
    type ChangeSet = tx_graph::ChangeSet<ConfirmationBlockTime>;
    use alloc::string::String;
    let schema_v0 = "CREATE TABLE test_table_0 (id TEXT PRIMARY KEY NOT NULL) STRICT";
    let schema_v1 = "CREATE TABLE test_table_1 (id TEXT PRIMARY KEY NOT NULL) STRICT";
    let mut conn = rusqlite::Connection::open_in_memory()?;

    // Apply schema_v0
    {
        let db_tx = conn.transaction()?;
        migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&schema_v0])?;
        db_tx.commit()?;
    }

    assert_eq!(
        conn.query_row(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_0';",
            [],
            |row| row.get::<_, String>(0),
        )?,
        "test_table_0"
    );

    // Apply schema_v1 only
    {
        let db_tx = conn.transaction()?;
        migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&schema_v1])?;
        db_tx.commit()?;
    }

    assert_eq!(
        conn.query_row(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_1';",
            [],
            |row| row.get::<_, String>(0),
        ),
        Err(rusqlite::Error::QueryReturnedNoRows)
    );

    // Apply "dummy" and then schema_v1
    {
        let db_tx = conn.transaction()?;
        migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&"dummy", &schema_v1])?;
        db_tx.commit()?;
    }

    assert_eq!(
        conn.query_row(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_1';",
            [],
            |row| row.get::<_, String>(0),
        )?,
        "test_table_1"
    );
    Ok(())
}
git diff
diff --git a/crates/chain/src/rusqlite_impl.rs b/crates/chain/src/rusqlite_impl.rs
index 7b39f53c..d70f1d7b 100644
--- a/crates/chain/src/rusqlite_impl.rs
+++ b/crates/chain/src/rusqlite_impl.rs
@@ -715,7 +715,64 @@ mod test {
             db_tx.commit()?;
             assert!(changeset.anchors.contains(&(anchor, txid)));
         }
+        Ok(())
+    }
+
+    #[test]
+    fn schemas_can_be_applied_independently() -> anyhow::Result<()> {
+        type ChangeSet = tx_graph::ChangeSet<ConfirmationBlockTime>;
+        use alloc::string::String;
+        let schema_v0 = "CREATE TABLE test_table_0 (id TEXT PRIMARY KEY NOT NULL) STRICT";
+        let schema_v1 = "CREATE TABLE test_table_1 (id TEXT PRIMARY KEY NOT NULL) STRICT";
+        let mut conn = rusqlite::Connection::open_in_memory()?;
+
+        // Apply schema_v0
+        {
+            let db_tx = conn.transaction()?;
+            migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&schema_v0])?;
+            db_tx.commit()?;
+        }
+
+        assert_eq!(
+            conn.query_row(
+                "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_0';",
+                [],
+                |row| row.get::<_, String>(0),
+            )?,
+            "test_table_0"
+        );
 
+        // Apply schema_v1 only
+        {
+            let db_tx = conn.transaction()?;
+            migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&schema_v1])?;
+            db_tx.commit()?;
+        }
+
+        assert_eq!(
+            conn.query_row(
+                "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_1';",
+                [],
+                |row| row.get::<_, String>(0),
+            ),
+            Err(rusqlite::Error::QueryReturnedNoRows)
+        );
+
+        // Apply "dummy" and then schema_v1
+        {
+            let db_tx = conn.transaction()?;
+            migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[&"dummy", &schema_v1])?;
+            db_tx.commit()?;
+        }
+
+        assert_eq!(
+            conn.query_row(
+                "SELECT name FROM sqlite_master WHERE type='table' AND name='test_table_1';",
+                [],
+                |row| row.get::<_, String>(0),
+            )?,
+            "test_table_1"
+        );
         Ok(())
     }
 }

Expected behavior

  1. To apply schema_v5 when calling migrate_schema(..., &[schema_v5]) on a database with schema v4.
  2. To raise an error asking to apply schema_v4 first if calling migrate_schema(..., &[schema_v5]) on a database with schema v3.
  3. To do nothing when calling migrate_schema(..., &[schema_v3]) on a database with schema v4.
  4. To apply schema_v4, schema_v5 in order and ignore schema_v3 when calling migrate_schema(..., &[schema_v4, schema_v3, schema_v5]) on a database with schema v3.
@nymius nymius added the bug Something isn't working label Dec 9, 2024
@notmandatory notmandatory added this to BDK Dec 9, 2024
@notmandatory
Copy link
Member

Thanks for the great description of this issue. As long as BDK is implementing the DB migration for users I think this is a low risk issue. But still good to have it documented in case anyone wants to implement a better way to do it.

@notmandatory notmandatory moved this to Discussion in BDK Dec 9, 2024
@evanlinjin
Copy link
Member

evanlinjin commented Dec 17, 2024

migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[schema_v1])?;

But this is just wrong code right?

I don't think this ticket is a bug, but a feature request for performance reasons?

However, looking at this, I do have a new thought. If the current (persisted) schema version is greater than the defined schemas, we should probably error out. --> This is the bug.

@nymius
Copy link
Contributor Author

nymius commented Dec 18, 2024

migrate_schema(&db_tx, ChangeSet::SCHEMA_NAME, &[schema_v1])?;

But this is just wrong code right?

Understood. Maybe I'm the only one committing this error, but then, shouldn't we add to migrate_schemas's docs a note mentioning the correct way to call this function is by passing a list including the schema we want to apply and all previous ones, in order? Probably the above example of what you should not do would help too.

However, looking at this, I do have a new thought. If the current (persisted) schema version is greater than the defined schemas, we should probably error out. --> This is the bug.

But the notion of "version" for the individual schemas is given by their order in the versioned_scripts dir. It can be faked by adding empty strings until the current persisted version number is reached, and then add the schema we want to apply.
What I mean with this is incorrect behavior is still possible by relying in an array to apply the schemas, if we want to enforce just one possible behavior, I think we should add the notion of version to the scripts themselves. I don't know which kind of structure is better for this.

@notmandatory notmandatory added the documentation Improvements or additions to documentation label Dec 31, 2024
@notmandatory
Copy link
Member

Since sqlite schema migration is an internal concern and not meant to be user facing I think we only need to make the migrate_schema function non-public so it can only be called by init_sqlite_tables(). Any dev adding a new schema migration is responsible for updating init_sqlite_tables() correctly by adding their new schema to the list.

@nymius
Copy link
Contributor Author

nymius commented Jan 8, 2025

You convinced me this is not an issue. Thank you both for the feedback.
I couldn't make the function non-public because the foreign crate bdk_wallet uses it to implement their own migrations for its own ChangeSet.
I wrote draft code in relation to how to fix

If the current (persisted) schema version is greater than the defined schemas, we should probably error out.

ensuring other invariants too, like all schemas to apply are greater or equal to the current version and there are not missing schemas in between, but requires more work I think doesn't worth the issue, if at the end we are in control of the schemas.

Code diff
+#[derive(Debug)]
+pub enum MigrationError {
+    SchemaError(String),
+    SqliteError(rusqlite::Error),
+}
+
+impl core::error::Error for MigrationError {}
+
+impl From<rusqlite::Error> for MigrationError {
+    fn from(e: rusqlite::Error) -> Self {
+        MigrationError::SqliteError(e)
+    }
+}
+
+impl fmt::Display for MigrationError {
+    fn fmt(&self, f: &mut core::fmt::Formatter) -> fmt::Result {
+        use MigrationError::*;
+
+        match self {
+            SchemaError(s) => write!(f, "error while applying schema migration, {}", s),
+            SqliteError(e) => e.fmt(f),
+        }
+    }
+}
+
 /// Runs logic that initializes/migrates the table schemas.
 pub fn migrate_schema(
     db_tx: &Transaction,
     schema_name: &str,
-    versioned_scripts: &[&str],
-) -> rusqlite::Result<()> {
+    versioned_scripts: &mut [&SqliteSchema],
+) -> Result<(), MigrationError> {
     init_schemas_table(db_tx)?;
-    let current_version = schema_version(db_tx, schema_name)?;
-    let exec_from = current_version.map_or(0_usize, |v| v as usize + 1);
-    let scripts_to_exec = versioned_scripts.iter().enumerate().skip(exec_from);
-    for (version, script) in scripts_to_exec {
-        set_schema_version(db_tx, schema_name, version as u32)?;
-        db_tx.execute_batch(script)?;
+    let current_version = schema_version(db_tx, schema_name)?.map_or(0_u32, |v| v);
+    versioned_scripts.sort();
+    let max_schema = versioned_scripts.last().unwrap();
+    if max_schema.version < current_version {
+        return Err(MigrationError::SchemaError(format!(
+            "missing schema versions: {:?}",
+            ((max_schema.version + 1)..current_version)
+        )));
+    }
+    let mut agg_schema = SqliteSchema::new(current_version, String::new());
+    for script in versioned_scripts {
+        if script.version <= current_version {
+            continue;
+        }
+
+        let diff = script.version - agg_schema.version;
+
+        if diff > 1 {
+            return Err(MigrationError::SchemaError(format!(
+                "missing schema versions: {:?}",
+                ((current_version + 1)..script.version)
+            )));
+        }
+
+        agg_schema.version = script.version;
+        agg_schema.script.push_str(&script.script);
     }
+    set_schema_version(db_tx, schema_name, agg_schema.version as u32)?;
+    db_tx.execute_batch(&agg_schema.script)?;
     Ok(())
 }
 
@@ -199,6 +247,18 @@ fn to_sql_error<E: std::error::Error + Send + Sync + 'static>(err: E) -> rusqlit
     rusqlite::Error::ToSqlConversionFailure(Box::new(err))
 }
 
+#[derive(PartialEq, PartialOrd, Eq, Ord)]
+pub struct SqliteSchema {
+    version: u32,
+    script: String,
+}
+
+impl SqliteSchema {
+    fn new(version: u32, script: String) -> Self {
+        SqliteSchema { version, script }
+    }
+}
+

Closing.

@nymius nymius closed this as completed Jan 8, 2025
@github-project-automation github-project-automation bot moved this from Discussion to Done in BDK Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
Status: Done
Development

No branches or pull requests

3 participants