[EWT-1250] Sqlalchemy/Superset expectations from python-DBAPI#17
| schema = reader.schema | ||
| columns = schema.names | ||
| column_types = [field.type for field in schema] | ||
| rows = reader.read_all().to_pandas().values.tolist() |
Do you need both read_all() and to_pandas(), or can you call read_pandas() directly? (To be fair, according to the docs it does the same thing.)
Good catch! Using read_pandas() now for better readability.
| }, | ||
| headers=headers, | ||
| if ws_url: | ||
| session_uri = ws_url |
Couple of things here:
- This can be done earlier, so if we feel confident in using the ws_url we can do that right after checking the token/api-keys.
- This needs slightly different logging, since we're not requesting a new runtime (see line 70).
- This should early return so that we don't have to indent everything afterwards. Makes it more readable.
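The early-return shape suggested above might look like the following minimal sketch. The function name resolve_session_uri and the request_runtime callable are hypothetical stand-ins, not the driver's actual API; only the ws_url parameter name comes from the diff.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_session_uri(
    ws_url: Optional[str],
    request_runtime: Callable[[], str],
) -> str:
    """Return the WebSocket URI to connect to.

    Hypothetical helper: `request_runtime` stands in for whatever code
    asks the service for a new runtime and returns its session URI.
    """
    if ws_url:
        # No runtime is requested on this path, so log it differently
        # and return right away instead of nesting the rest in an else.
        logger.info("Using provided WebSocket URL %s", ws_url)
        return ws_url

    logger.info("Requesting a new runtime")
    return request_runtime()
```

With the guard at the top, the runtime-request path stays at the function's base indentation level.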
| results_format: Union[ResultsFormat, None] = None, | ||
| data_compression: Union[DataCompression, None] = None, | ||
| geometry_representation: Union[GeometryRepresentation, None] = None, | ||
| ws_url: str = None, |
Why is this required, instead of calling connect_direct() directly?
The reason this parameter wasn't included in connect() is that it creates ambiguity between the rest of the parameters (like runtime/region) and the runtime you'd actually connect to when providing a ws_url, which may not match those choices.
| query.handler(json.loads(result_bytes.decode("utf-8"))) | ||
| data = json.loads(result_bytes.decode("utf-8")) | ||
| columns = data["columns"] | ||
| column_types = data.get("column_types") |
Is column_types optional? If so, then it's good that you're using data.get() here, but then in Cursor.__get_results you expect column_types to be non-None. You either need to ensure column_types is always provided, or change __get_results to be more defensive.
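One way to make the consumer tolerant of a missing column_types is to normalize the payload once, right after decoding. This is a sketch under assumptions: the helper name parse_result_payload is hypothetical, and padding with None (so that a later type lookup can fall back to its default) is one possible policy, not necessarily what the PR settled on.

```python
import json
from typing import Any, List, Optional, Tuple


def parse_result_payload(
    result_bytes: bytes,
) -> Tuple[List[str], List[Optional[str]]]:
    """Decode a JSON result and return (columns, column_types).

    Hypothetical helper. If "column_types" is absent or null, every
    column gets a None type so downstream code never sees a bare None
    in place of the list.
    """
    data: Any = json.loads(result_bytes.decode("utf-8"))
    columns = data["columns"]
    column_types = data.get("column_types")
    if column_types is None:
        column_types = [None] * len(columns)
    elif len(column_types) != len(columns):
        raise ValueError("column_types length does not match columns")
    return columns, column_types
```

The alternative is to treat column_types as mandatory and index it with data["column_types"], which fails loudly at the decoding step instead.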
| True, # null_ok; Assuming all columns can accept NULL values | ||
| ) | ||
| for col_name in result.columns | ||
| for i, col_name in enumerate(columns) |
Use zip (https://docs.python.org/3/library/functions.html#zip) to avoid jumping through hoops with an index (it's much nicer to read, and also more efficient):
self.__description = [
    (
        col_name,
        _TYPE_MAP.get(col_type, 'STRING'),
        ...
    )
    for (col_name, col_type) in zip(columns, column_types)
]
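For reference, here is a self-contained variant of the suggestion that can be run on its own. The contents of _TYPE_MAP are made up for illustration, and the function name build_description is hypothetical. The four None entries are placeholders for the remaining slots of PEP 249's seven-item description tuple; they are an assumption, not a reconstruction of what the snippet elides, and the real driver may populate them differently.

```python
from typing import List, Optional, Tuple

# Illustrative mapping only; the real _TYPE_MAP lives in the driver.
_TYPE_MAP = {"int64": "NUMBER", "string": "STRING", "double": "NUMBER"}

DescriptionRow = Tuple[str, str, None, None, None, None, bool]


def build_description(
    columns: List[str], column_types: List[Optional[str]]
) -> List[DescriptionRow]:
    """Pair each column name with its type via zip, without an index."""
    return [
        (
            col_name,
            _TYPE_MAP.get(col_type, "STRING"),  # unknown types fall back
            None,  # display_size (placeholder)
            None,  # internal_size (placeholder)
            None,  # precision (placeholder)
            None,  # scale (placeholder)
            True,  # null_ok; assuming all columns can accept NULL values
        )
        for col_name, col_type in zip(columns, column_types)
    ]
```

Note that zip stops at the shorter input, so a length mismatch between the two lists silently drops columns. On Python 3.10 and later, zip(columns, column_types, strict=True) raises ValueError instead.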
This PR introduces the following requirements that have arisen from the Wherobots x Superset integration:
- Row object.
- rollback() and commit() to be implemented. Other OLAP drivers such as pyhive simply "pass" in their not-implemented rollback() and commit() methods. For context: Superset's background processes often bypass the SQLAlchemy dialect and interact directly with the DBAPI. This is why overriding the rollback and commit methods in the Dialect doesn't suffice.
- ws_url, to connection. This helps maintain static connection pool configuration in Superset.
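The no-op approach described for rollback() and commit() can be sketched as below. The class name SketchConnection is hypothetical and stands in for the driver's connection class. The point is only that both methods exist and do nothing, so callers that invoke them directly on the DBAPI connection don't fail.

```python
class SketchConnection:
    """Hypothetical stand-in for a DBAPI connection without transactions."""

    def __init__(self) -> None:
        self.closed = False

    def commit(self) -> None:
        # Nothing to commit: the backend has no transaction support.
        pass

    def rollback(self) -> None:
        # Nothing to roll back, but the method must exist so callers
        # that invoke it unconditionally do not hit AttributeError.
        pass

    def close(self) -> None:
        self.closed = True
```

This matches PEP 249's guidance that modules without transaction support should implement commit() with void functionality.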