Converted version of the Spider 1.0 dataset for Oracle. This repository contains only data and usage notes.
The original Spider dataset is here: https://yale-lily.github.io/spider
No software installation is required for this repository. Use the files directly with your existing Oracle tools and environment
ddl.jsonl
-
contains DDL statements for creating database tables. Each line is a JSON object with:
table
: Table name
sql
: CREATE TABLE statement in Oracle syntax
dml.json
-
contains DML statements - specifically INSERT statements to populate tables. Structure is:
metadata
: Batch information and execution statistics
tables
: Object containing all table names with their INSERT statements
dev_pairs.jsonl
-
Dev dataset containing question-query pairs. Each line has:
question
: Natural language question
query
: Corresponding SQL query in Oracle syntax
db_name
: Database name the query applies to
train_others_pairs.jsonl
-
Training dataset with question-query pairs. Format is similar to dev_pairs.jsonl
:
question
: Natural language question
query
: Oracle SQL query
db_name
: Database name
train_spider_pairs.jsonl
-
Main training dataset with question-query pairs. Similar format as dev_pairs.jsonl
question
: Natural language question
query
: Oracle SQL query
db_name
: Database name
-
Run DDLs first to create and recreate objects; then run DML to populate data
-
If you change schema or naming, apply a consistent prefix to table names in both
CREATE TABLE
andINSERT INTO
statements -
Use explicit date/time conversions (e.g.,
TO_DATE
/TO_TIMESTAMP
) rather than relying on session NLS settings -
Use
executemany
to run speed up bulk inserts and updates
Please consult the security guide for our responsible security vulnerability disclosure process
Copyright (c) 2025 Oracle and/or its affiliates.
Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/.