[importer] Add API endpoint and serializer for creating tables from files #4164
Open
Harshg999 wants to merge 2 commits into master from importer-table-create
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -173,3 +173,133 @@ def validate(self, data): | |
| raise serializers.ValidationError({"sheet_name": "Sheet name is required for Excel files."}) | ||
| return data | ||
| class CreateTableSerializer(serializers.Serializer): | ||
| """Serializer for table creation request validation. | ||
| This serializer validates the parameters required for creating a SQL table from a file. | ||
| Attributes: | ||
| file_path: Path to the source file to import data from | ||
| file_type: Type of file format (csv, tsv, excel, delimiter_format) | ||
| import_type: Type of import (local or remote) | ||
| sql_dialect: Target SQL dialect for table creation | ||
| database_name: The database name in which to create the table | ||
| table_name: The name of the table to create | ||
| has_header: Whether the file has a header row | ||
| columns: List of column definitions (each with name, type, etc.) | ||
| partition_columns: List of partition column definitions (optional) | ||
| external: Whether to create an external table (optional) | ||
| external_path: External table location path (optional, required if external=True) | ||
| table_format: Table storage format (text, parquet, orc, avro, kudu, iceberg) (optional) | ||
| field_separator: Field separator character (required for delimited files) | ||
| quote_char: Quote character (required for delimited files) | ||
| record_separator: Record separator character (required for delimited files) | ||
| sheet_name: Sheet name for Excel files (required when file_type is excel) | ||
| comment: Table comment (optional) | ||
| is_transactional: Whether to create a transactional table (optional, Hive and Impala only) | ||
| is_iceberg: Whether to create an Iceberg table (optional) | ||
| is_insert_only: Whether the transactional table is insert-only (optional) | ||
| primary_keys: List of primary key column names (optional, required for Kudu tables) | ||
| """ | ||
| # Source file information | ||
| file_path = serializers.CharField(required=True, help_text="Path to the file to import data from") | ||
| file_type = serializers.ChoiceField( | ||
| choices=["csv", "tsv", "excel", "delimiter_format"], required=True, help_text="Type of file (csv, tsv, excel, delimiter_format)" | ||
| ) | ||
| import_type = serializers.ChoiceField( | ||
| choices=["local", "remote"], required=True, help_text="Whether the file is local or on a remote filesystem" | ||
| ) | ||
| # Target table information | ||
| sql_dialect = serializers.ChoiceField( | ||
| choices=["hive", "impala", "trino", "phoenix", "sparksql"], required=True, help_text="SQL dialect for creating the table" | ||
| ) | ||
| database_name = serializers.CharField(required=True, help_text="Database name where the table will be created") | ||
| table_name = serializers.CharField(required=True, help_text="Name of the table to create") | ||
| # Data format information | ||
| has_header = serializers.BooleanField(default=False, help_text="Whether the file has a header row") | ||
| # Column definitions | ||
| columns = serializers.ListField( | ||
| child=serializers.DictField(), required=True, help_text="List of column definitions with name, type, etc." | ||
| ) | ||
| # Optional parameters | ||
| partition_columns = serializers.ListField( | ||
| child=serializers.DictField(), required=False, default=list, help_text="List of partition column definitions" | ||
| ) | ||
| comment = serializers.CharField(required=False, allow_blank=True, default="", help_text="Table comment") | ||
| # Table storage options | ||
| external = serializers.BooleanField(default=False, help_text="Whether to create an external table") | ||
| external_path = serializers.CharField(required=False, allow_blank=True, help_text="Location path for external tables") | ||
| table_format = serializers.ChoiceField( | ||
| choices=["text", "parquet", "orc", "avro", "kudu", "iceberg"], default="text", help_text="Storage format for the table" | ||
| ) | ||
| # Hive/Impala specific options | ||
| is_transactional = serializers.BooleanField(default=False, help_text="Whether to create a transactional table (Hive)") | ||
| is_insert_only = serializers.BooleanField(default=False, help_text="Whether the transactional table is insert-only") | ||
| is_iceberg = serializers.BooleanField(default=False, help_text="Whether to create an Iceberg table") | ||
| # Kudu specific options | ||
| primary_keys = serializers.ListField( | ||
| child=serializers.CharField(), required=False, default=list, help_text="List of primary key column names (required for Kudu tables)" | ||
| ) | ||
| # Excel-specific fields | ||
| sheet_name = serializers.CharField(required=False, help_text="Sheet name for Excel files") | ||
| # Delimited file-specific fields | ||
| field_separator = serializers.CharField(required=False, help_text="Field separator character") | ||
| quote_char = serializers.CharField(required=False, help_text="Quote character") | ||
| record_separator = serializers.CharField(required=False, help_text="Record separator character") | ||
| # Additional options | ||
| load_data = serializers.BooleanField(default=True, help_text="Whether to load data from the file into the table") | ||
| def validate(self, data): | ||
| """Validate the complete data set with interdependent field validation.""" | ||
| # Validate Excel-specific parameters | ||
| if data.get("file_type") == "excel" and not data.get("sheet_name"): | ||
| raise serializers.ValidationError({"sheet_name": "Sheet name is required for Excel files."}) | ||
| # Validate delimited file-specific parameters | ||
| if data.get("file_type") in ["csv", "tsv", "delimiter_format"]: | ||
| if not data.get("field_separator"): | ||
| # If not provided, set default value based on file type | ||
| if data.get("file_type") == "csv": | ||
| data["field_separator"] = "," | ||
| elif data.get("file_type") == "tsv": | ||
| data["field_separator"] = "\t" | ||
| else: | ||
| raise serializers.ValidationError({"field_separator": "Field separator is required for delimited files."}) | ||
| if not data.get("quote_char"): | ||
| data["quote_char"] = '"' # Default quote character | ||
| if not data.get("record_separator"): | ||
| data["record_separator"] = "\n" # Default record separator | ||
| # Validate external table parameters | ||
| if data.get("external") and not data.get("external_path"): | ||
| raise serializers.ValidationError({"external_path": "External path is required for external tables."}) | ||
| # Validate Kudu table parameters | ||
| if data.get("table_format") == "kudu" and not data.get("primary_keys"): | ||
| raise serializers.ValidationError({"primary_keys": "Primary keys are required for Kudu tables."}) | ||
| # Validate transaction table parameters | ||
| if data.get("is_transactional") and data.get("sql_dialect") not in ["hive", "impala"]: | ||
| raise serializers.ValidationError({"is_transactional": "Transactional tables are only supported in Hive and Impala."}) | ||
| # Validate Iceberg table parameters | ||
| if data.get("is_iceberg") and data.get("sql_dialect") not in ["hive", "impala", "sparksql"]: | ||
| raise serializers.ValidationError({"is_iceberg": "Iceberg tables are only supported in Hive, Impala, and SparkSQL."}) | ||
| return data | ||
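
The cross-field rules in `validate()` above can be exercised without Django or Django REST framework. The following is a minimal sketch, not the PR's code: the function name `validate_create_table` and the two module-level constants are hypothetical, `ValueError` stands in for `serializers.ValidationError`, and it assumes the `quote_char` and `record_separator` defaults apply only to delimited files (the indentation is not recoverable from the diff view). Unlike the serializer, it copies the input rather than mutating it.

```python
# Hypothetical, dependency-free mirror of CreateTableSerializer.validate().
# ValueError stands in for serializers.ValidationError.

DELIMITED_TYPES = {"csv", "tsv", "delimiter_format"}
DEFAULT_SEPARATORS = {"csv": ",", "tsv": "\t"}


def validate_create_table(data):
    """Return a copy of `data` with defaults applied, or raise ValueError."""
    data = dict(data)  # work on a copy so the caller's payload is untouched
    file_type = data.get("file_type")

    # Excel files need an explicit sheet name.
    if file_type == "excel" and not data.get("sheet_name"):
        raise ValueError("sheet_name: Sheet name is required for Excel files.")

    # Delimited files: fill in separators, or fail when no default exists.
    if file_type in DELIMITED_TYPES:
        if not data.get("field_separator"):
            if file_type not in DEFAULT_SEPARATORS:
                raise ValueError(
                    "field_separator: Field separator is required for delimited files."
                )
            data["field_separator"] = DEFAULT_SEPARATORS[file_type]
        data["quote_char"] = data.get("quote_char") or '"'
        data["record_separator"] = data.get("record_separator") or "\n"

    # External tables need a location.
    if data.get("external") and not data.get("external_path"):
        raise ValueError("external_path: External path is required for external tables.")

    # Kudu tables need at least one primary key.
    if data.get("table_format") == "kudu" and not data.get("primary_keys"):
        raise ValueError("primary_keys: Primary keys are required for Kudu tables.")

    # Dialect restrictions for transactional and Iceberg tables.
    dialect = data.get("sql_dialect")
    if data.get("is_transactional") and dialect not in ("hive", "impala"):
        raise ValueError(
            "is_transactional: Transactional tables are only supported in Hive and Impala."
        )
    if data.get("is_iceberg") and dialect not in ("hive", "impala", "sparksql"):
        raise ValueError(
            "is_iceberg: Iceberg tables are only supported in Hive, Impala, and SparkSQL."
        )

    return data


if __name__ == "__main__":
    cleaned = validate_create_table({"file_type": "tsv", "sql_dialect": "hive"})
    print(repr(cleaned["field_separator"]))
```

A payload with `file_type` set to `delimiter_format` and no `field_separator` is the one delimited case that is rejected, since no default can be inferred for it.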
The function body is empty (`pass`), so calls to `create_table` will silently return `None`. Implement the table creation logic or raise `NotImplementedError` until ready.
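
A minimal sketch of the second option in the comment above. The `create_table` body and signature are not shown on this page, so the single `validated_data` parameter here is an assumption made only for illustration.

```python
def create_table(validated_data):
    """Hypothetical placeholder; the real signature is not shown in this diff."""
    # Fail loudly instead of falling through and returning None.
    raise NotImplementedError("create_table is not implemented yet")


if __name__ == "__main__":
    try:
        create_table({"table_name": "example"})
    except NotImplementedError as exc:
        print(f"caught: {exc}")
```

With the stub in place, a caller that reaches the unfinished code path gets an explicit exception rather than a `None` it might mistake for success.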