Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excel extension: read_xlsx() gives runtime error "table index is out of bounds" and sometimes "null function or function signature mismatch" #1956

Open
rpbouman opened this issue Feb 12, 2025 · 5 comments

Comments

@rpbouman
Copy link

rpbouman commented Feb 12, 2025

What happens?

I'm running duckdb/wasm using https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm
Attempts to read xlsx file using the core EXCEL extension fails with a runtime error. The error is typically "table index is out of bounds" but in some cases I get "null function or function signature mismatch"

To Reproduce

(I am attaching a html file that contains the repro script below, this should make it easy to reproduce. Note that the file was renamed to .txt because github wouldn't let me attach it otherwise.)

index.html.txt

In general terms I instantiated duckdb from the https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm bundle.
After instantiating duckdb, I open a connection and install the EXCEL extension:

<script type="module">
  import * as duckdb from 'https://cdn.jsdelivr.net/npm/@duckdb/[email protected]/+esm';

  const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
  const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);

  const worker_url = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker}");`], {type: 'text/javascript'})
  );
  const worker = new Worker(worker_url);
  const logger = new duckdb.ConsoleLogger();
  const db = new duckdb.AsyncDuckDB(logger, worker);
  URL.revokeObjectURL(worker_url);

  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  const connection = await db.connect();
        
  // expose duckdb as global
  window.duckdb = {
    api: duckdb,
    instance: db,
    connection: connection
  };
  await connection.query('INSTALL EXCEL');
  await connection.query('LOAD EXCEL');
</script>

Then, get a File object for a .xlsx file picked by the user with a HTML input=file control, and register it with duckdb.
After that I run a query with an explicit call to read_xlsx, and pass in name of the registered file:

        document.getElementById('fileInput').addEventListener('change', async function(event){
          var files = event.currentTarget.files;
          if (files.length === 0){ 
            alert('No file selected!');
            return;
          }
          
          var file = files[0];
          var duckdb = window.duckdb;
          try {
            await duckdb.instance.registerFileHandle(
              file.name, 
              file, 
              duckdb.api.DuckDBDataProtocol.BROWSER_FILEREADER, 
              true
            );
            alert(`File registered!`);
          }
          catch (e){
            alertError('Error registering file', e);
            return;
          }

          try {
            var sql = `SELECT * FROM read_xlsx('${file.name}') LIMIT 1`;
            var result = await duckdb.connection.query(sql);
            debugger;
          }
          catch(e){
            alertError('Error executing query on file', e);
            return;
          }
        });

This results in the "table index is out of bounds" runtime error:

RuntimeError: table index is out of bounds
    at 07c5fdf6:0x12206b9
    at 07c5fdf6:0x121f27e
    at 07c5fdf6:0x1221648
    at 07c5fdf6:0x1221fd5
    at 07c5fdf6:0x122c5fd
    at 07c5fdf6:0x936b4d
    at 07c5fdf6:0x941012
    at Object.ccall (duckdb-eh.js:13167:27)
    at fe (runtime.ts:113:9)
    at ha.runQuery (bindings_base.ts:175:27

On another occasion I got "null function or function signature mismatch", which has one extra line in the stack trace:

RuntimeError: null function or function signature mismatch
    at 07c5fdf6:0x8e7b5c
    at 07c5fdf6:0x12206b9
    at 07c5fdf6:0x121f27e
    at 07c5fdf6:0x1221648
    at 07c5fdf6:0x1221fd5
    at 07c5fdf6:0x122c5fd
    at 07c5fdf6:0x936b4d
    at 07c5fdf6:0x941012
    at Object.ccall (duckdb-eh.js:13167:27)
    at fe (runtime.ts:113:9)

Browser/Environment:

Chrome 133.0.6943.59

Device:

Laptop

DuckDB-Wasm Version:

@duckdb/[email protected]

DuckDB-Wasm Deployment:

own application, see attachment.

Full Name:

Roland Bouman

Affiliation:

EPAM Systems BV

@carlopi
Copy link
Collaborator

carlopi commented Feb 14, 2025

Thanks for reporting, I know what's up, there are two slightly different path in extension building, I will have this fixed.

It's already working via:

INSTALL excel FROM core_nightly;
LOAD excel;
FROM read_xlsx('');

I will cleanup so the right file it's available from the default endpoint.

@rpbouman
Copy link
Author

Just tried it by passing the core_nightly repo and can confirm that it works that way. Awesome!
Looking forward to the new release where the fixed version will load from core.
Cheers!

Roland

@booandrew
Copy link

@carlopi Hi!
Is there a reason why I get the following error when trying to load this extension?

Error:
IO Error: Extension https://nightly-extensions.duckdb.org/c9f1c5bde4/wasm_eh/excel.duckdb_extension.wasm is not available

await conn.query('INSTALL excel FROM core_nightly;');
await conn.query('LOAD excel;');
await conn.query(
  `CREATE or REPLACE VIEW test AS SELECT * FROM read_xlsx('${file.name}');`,
);

"@duckdb/duckdb-wasm": "1.29.1-dev114.0"

@rpbouman
Copy link
Author

rpbouman commented Mar 4, 2025

@booandrew Yeah I got that too. But it is available from core and a couple of the bugs that would prevent that one from running have now been fixed. Perhaps this works too for your use case.

@booandrew
Copy link

@rpbouman Yes! I found a way to run it from the core!
Thanks to the DuckDB team for this repo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants