Skip to content

feat: add temporary result storage and smoke test options #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peterfoldes
Copy link
Contributor

$ time poetry run python tests/smoke.py \
  --api-key-file ~/staging.key --region aws-us-west-2 --wide --debug --api-endpoint api.staging.wherobots.com -s --storage-format=geoparquet --single -p \
  "SELECT id,names['primary'], geometry,population FROM wherobots_open_data.overture_2024_02_15.admins_locality WHERE localityType = 'country' ORDER BY population DESC LIMIT 1000"
2025-06-03 14:06:44.782 INFO                 root: Query 3e2d1090-bfce-4806-8833-714fd399bb58 succeeded; full message is {'kind': 'state_updated', 'execution_id': '3e2d1090-bfce-4806-8833-714fd399bb58', 'state': 'succeeded', 'result_uri': 'https://wbts-temp-user-data-aws-us-west-2-staging.s3.amazonaws.com/djrm9bs9uf/lkgoq3t9tbaczx/sql-session-queries/3e2d1090-bfce-4806-8833-714fd399bb58/part-00000-3cbf8b40-6182-4fda-bc06-7f8eba21cad0-c000.snappy.parquet?AWSAccessKeyId=ASIAWG2SXS3KRUEHJ5AT&Signature=ygy%2BtEknfZW9evFHcY6YPKS7ayc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEEUaCXVzLXdlc3QtMiJHMEUCIQCJK8LUH0Oo8cEBI0%2BrLnt5LVY0cSYfIAx0PizBHZmo1AIgEpNwlzrOvF2h15K4pIlYIV34Aqtqktiioqh1xXJbg54qoQUIHhAAGgw0MjY5ODU4ODc0NDUiDJAOE%2BlIqAPa2oRjtir%2BBGxPYLBhdHcrlI2Ig2MuGWJ4RxIjhR4GdKrx5Gl5qNh7kdyyZ1NIMVRRrbEm9yFCJTmZfyyMc7FrKkA8Rd83Op09AojNN0Jf4vkpV6Qut5DrAh9%2BbjKr8VG0jywjfyvaA8UUW0Ip%2BS8v1S3xF1ltJf%2BPP2EdY1m8JBBK3A35u4oeIkj3KZ%2FeNRuyZapxpcVGtS6WBFMoZmzNxZ3Dh7%2BXB9V9MFLf1t4LCXqnpdt52sTj%2FrcOCG6%2BZiO5VdFIQ2aomcNQ8xOZh5J9bmgEmM%2Fp8ROM%2FKMxH1JtNT4GVjWj7%2BXA8I3ifHPC6vHcVAtFKcvt9ZSOWUQc3JXcJTkMlNrARFeiGDlsWGgD4ZeRhUhcjmvEDg0GBKRXcYX8HWvmMl2ctjXjKYnHccFI%2FqCRTtue9HUtiYBDbz%2FAy%2FtGynrybsUelEBDT4Own1yDf1zyoy1Cff05LveOnSQVRuwOkEBtvEEoYdfu9b%2Fu3nw1IDe7uyfseZQB4tisq1vydvMA1zjGc9tN3px%2FzleZ4SBIBbGK9bids6IQe2fDxw71BilIPcvFjq7hFtoIlyAWTApjNwXZdTodCwACMvrrleGlc0z06YtgtL6XmSgvc4CsLUiBqQy4wy6kLisIdEbS6PUVGhzIUDmjYGI5QZf%2BXetHbIXz7PDpENu4RW4f5MwU4rAaca%2FQzw6g0btQyJCQ8rVn20AGiNMIsajayoURwfW%2F0%2Bq%2BzSa3b4v8QPeWi%2FbvYgfWBNou9VZM9ajgLjh1tDC4g6IK%2F31MJac3A5pOW1dpQ6hsI2RneWx2Enty%2FKR1EmDC6JnsQnTSFQ2otpVR4Y%2FX8LU3zWvQKBF6WikvLuyWqrA1MNXC%2FcEGOpsBL2aoTme8VjQZDwRk87WZPFanGsz6bsz9OSA5cGxs1DhCrITlcSPKj49%2F3g42GkfMwoJzNtoMROpIntAAFgk8sBlu4hMHblrEqkZOu4Ayt0dLy9NSJe5qjgWnOQLH%2F1c80Kpr2UuQzhx4xaQ7CrYg3WDA4n6K5tL4DFhXEEAiR9GyzrIr3lhHwgOXwTwCApJ0kmepg7MPCTojY7Q%3D&Expires=1749589604', 'size': 17540}

@sfishel18
Copy link
Contributor

i'm getting some errors running this sql statement: SHOW SCHEMAS IN wherobots_open_data

it fails if i try to use either geoparquet or geojson as the storage format (ex. poetry run python tests/smoke.py --api-endpoint api.staging.wherobots.com --api-key-file ./api_key.txt "SHOW SCHEMAS IN wherobots_open_data" --store --storage-format geoparquet)

@sfishel18
Copy link
Contributor

so iiuc this change doesn't provide a way to get access to the results URI from the sdk, correct? but you can opt into behind-the-scenes temp storage. which you would do because it increases the limit on the number of results you can get back? and/or you get more efficient cursoring through the results because sql-session can pull them from its s3 cache based on the execution id?

@sfishel18
Copy link
Contributor

another interesting behavior i discovered, when passing boolean values in the execute sql web socket message, they have to be stringified. i.e. this is valid:

{
    kind: "execute_sql"
    execution_id: "bf1ffba3-bc64-485d-8bbe-475264884676"
    statement: "..."
    store: {
      "format": "geoparquet",
      "single": "true",
      "generate_presigned_url": "true"
    }
}

but this is not (note no quotes around true in store.single and store.generate_presigned_url:

{
    kind: "execute_sql"
    execution_id: "bf1ffba3-bc64-485d-8bbe-475264884676"
    statement: "..."
    store: {
      "format": "geoparquet",
      "single": true,
      "generate_presigned_url": true
    }
}

@@ -209,6 +215,17 @@ def __execute_sql(self, sql: str, handler: Callable[[Any], None]) -> str:
"statement": sql,
}

if self.__store:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having thought about this a bit, imo the sdk should ONLY support presigned urls. because without them the user would have to use the rest api to generate temporary aws credentials, and that endpoint is non-public

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants