support for asyncio (SNOW-1544406) #38

Open
jpassaro opened this issue Oct 24, 2017 · 56 comments
Assignees
Labels
enhancement The issue is a request for improvement or a new feature status-in_progress Issue is worked on by the driver team status-triage_done Initial triage done, will be further handled by the driver team triaged

Comments

@jpassaro

The Snowflake connector for python seems to be implemented essentially as API calls over HTTP. Using aiohttp, companion subclasses to SnowflakeConnector, SnowflakeCursor, SnowflakeRestful etc, could be created that implement the key methods as asynchronous coroutines. Then asyncio tools could be used to run Snowflake connection routines alongside other I/O-centric or API-driven tasks.

Has this been considered? Is it a viable addition to the Snowflake Connector? If so I'm happy to contribute, I'd love to hear any requirements you folks might have in mind. Or on the other hand, is it more appropriate as a fork, or as a separate project altogether in the style of aiobotocore?

@smtakeda
Contributor

Thanks for the suggestion. Yes, we considered aiohttp as well as any async feature in 3.4+. There are two concerns: 1) since the driver needs to support both 2.7+ and 3.4+, it may need a separate branch for 3.4+; 2) it is not clear how to handle the OCSP check in aiohttp.

The OCSP check is a requirement from our security team, though Python's standard SSL library doesn't support it (and most other clients don't even care about the certificate revocation status!). We use a monkey patch on top of https://github.com/requests/requests to intercept the SSL handshake and add the OCSP checks.

So contributions are definitely welcome, provided the above concerns are addressed.

smtakeda added the enhancement label on Oct 25, 2017

@szelenka

Checking if there has been any progress with asyncio support for this package, the last update seems to be over 2 years ago..

@smtakeda
Contributor

smtakeda commented Jun 3, 2019

Sorry, we didn't have enough bandwidth to work on this. The plan is to add async support after dropping Python 2.

@krrg

krrg commented Jul 16, 2019

Python 2 will stop being supported by the Python Foundation on Jan 1, 2020. Will Snowflake also drop support at that time? Just curious if any timeline has been established on when Snowflake will EOL Python 2 support.

If it helps anyone interested, as a temporary workaround, I've been running Snowflake queries inside a concurrent.futures.ThreadPoolExecutor using what basically amounts to await asyncio.get_event_loop().run_in_executor(...).

This has been just fine for my purposes (internal reporting) and allows me to still use the other neat asyncio features while not blocking the main thread.
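
A minimal sketch of that workaround is shown below. The `run_query_blocking` function is a hypothetical stand-in for the blocking connector call (for example a `cursor.execute(...)` followed by a fetch); it only sleeps so that the example runs without a Snowflake account.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def run_query_blocking(sql: str) -> str:
    """Hypothetical stand-in for a blocking connector call."""
    time.sleep(0.1)  # simulates waiting on the network
    return f"result of {sql!r}"


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    # A dedicated pool keeps the blocking calls off the event loop thread.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            loop.run_in_executor(pool, run_query_blocking, sql)
            for sql in ("SELECT 1", "SELECT 2")
        ]
        # Both calls run in worker threads; the loop stays free for other tasks.
        return await asyncio.gather(*futures)


results = asyncio.run(main())
```

The pool size caps how many queries can be in flight at once, so it is worth sizing it to the expected number of concurrent queries.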

@smtakeda
Contributor

For Python 2 support, feel free to discuss in #107.
We have not decided when we are going to drop Python 2 support yet.
The good news is that the trend of Python 2 downloads is down:
https://pypistats.org/packages/snowflake-connector-python

We'll keep an eye on the usage metrics on our end to determine the timing of the Python 2 drop.

@smtakeda
Contributor

We want this for Snowflake:
https://github.com/MagicStack/asyncpg
https://github.com/aio-libs/aiomysql

@farvour

farvour commented Oct 16, 2019

What's the progress on aio support? Anything that uses it, such as sqlalchemy-aio cannot take advantage of this dialect+driver without it...

@smtakeda
Contributor

Not much progress. In the planning meeting.

@mvoitko

mvoitko commented Dec 9, 2019

Thanks for the suggestion. Yes, we considered aiohttp as well as any async feature in 3.4+. There are two concerns: 1) since the driver needs to support both 2.7+ and 3.4+, it may need a separate branch for 3.4+; 2) it is not clear how to handle the OCSP check in aiohttp.

Python 2.7 will be deprecated as of Jan 1, 2020. Any progress on an async Snowflake connector?

@islobodch

Commenting here to re-emphasise the issue. We are going to use asyncio in ongoing projects involving Python, and not having support for async is a bit frustrating. Is there any news regarding this? Thanks.

@jimfang

jimfang commented Apr 5, 2020

It is very important for Snowflake to support an async connector. Otherwise, we have to work around it. It is a big bottleneck for performance optimization. Thanks.

@samstiyer

Hey everyone, just wanted to bump this feature request. With FastAPI becoming more and more widely used in the Python ecosystem, support for asyncio continues to become more important. Thanks!

@madhukar01

Bump!! We need asyncio support :)

@whardier

Please implement pgsql wire protocol for Snowflake. Hard must.

@krrg

krrg commented Feb 15, 2021

I think what Snowflake would really benefit from is some sort of publicly documented HTTP/REST API. It would be a lot easier to build an asyncio community library on top of something that is documented than trying to reverse engineer it or come up with sub-optimal solutions.

@whardier

Snowflake is the primary database for a project I am working on. It has been an absolute struggle dealing with all of the gotchas. Having some visibility would be a good idea; however, I am going to assume the security team would struggle with this. It was definitely sold to our CTO as very much compatible with existing workflows, tech stacks, etc., and it showed a lot of promise by offering an sqlalchemy driver.

I have been mostly using usql (https://github.com/xo/usql) to fill in a lot of gaps with the web ui, the almost unusable cli offered by snowflake, and to act as a system call when I need something done async without having to deal with busted multiprocessing solutions.

@dennis-weyland-by

Next bump. Building a FastAPI application that needs to perform sync and async queries would be much easier if the Snowflake connector supported asyncio. Yes, the sqlalchemy connector supports this, but it lacks Snowflake-exclusive features like execute_async.

@keller00 are there any updates?

@sfc-gh-mkeller
Collaborator

I'm sorry, but this is not planned for anytime soon.
Our codebase is built upon using urllib3 and other dependencies that use it under the hood (boto3 comes to my mind immediately). We also monkey patch our own OCSP verification into urllib3 for extra security.
Last time I checked urllib3 said that they will not support asyncio ever, so to support it we'd need a complete rewrite of the library, which we have tried, but the benchmarks didn't live up to our standards unfortunately.

However, we have recently added support for our own async execution feature; see the documentation here: https://docs.snowflake.com/en/user-guide/python-connector-example.html#label-python-connector-asynchronous-query-examples
I hope that this could be useful for some of you!

@whardier

whardier commented Mar 19, 2021 via email

@allenhumphreys

Why support python at all if you can't invest the time and effort to support modern Python?

@Lexicality

Async Python is Future Python, not Modern Python. A lot of widely used libraries are not (yet) async compatible and most Python running in the real world is not async.
Don't be silly.

@allenhumphreys

Hi, I'm from the future! 🛸

@whardier

Aww sweet! I am not the only modern time traveler.

@dennis-wey
Contributor

Since to me this is easily the most frustrating topic about Snowflake, I thought I'd describe some use cases:

In general, asyncio is a solution for concurrent programming in Python. This is different from parallel computing (see here: https://stackoverflow.com/questions/1897993/what-is-the-difference-between-concurrent-programming-and-parallel-programming)
That makes asyncio great when waiting for "external" processes like IO, REST APIs and other things where your CPU is basically just waiting. And that's exactly what our CPU is doing when interacting with Snowflake. We send a SQL string, and then ask the service at intervals whether it's done computing. So asyncio is not just some fancy new way to program but actually the perfect feature for Snowflake, since it's made for such use cases.
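
As a small illustration of why this matters, the sketch below overlaps two waits in a single thread. The `wait_for_query` coroutine is a hypothetical stand-in that simply sleeps; it does not talk to Snowflake.

```python
import asyncio
import time


async def wait_for_query(name: str, seconds: float) -> str:
    """Hypothetical stand-in: pretend to wait on a remote service."""
    await asyncio.sleep(seconds)  # the CPU is idle here, so other tasks can run
    return name


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    names = await asyncio.gather(
        wait_for_query("q1", 0.2),
        wait_for_query("q2", 0.2),
    )
    return names, time.perf_counter() - start


names, elapsed = asyncio.run(main())
# The two waits overlap, so the total is roughly one wait rather than two.
print(names, f"{elapsed:.2f}s")
```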

With that in mind here are some use cases I can think of when dealing with snowflake:

Working with fastapi
It's one of the fastest and most popular REST frameworks in Python, and it achieves that by using asyncio to the core. Yes, you can also incorporate synchronous workflows in your application, but you won't have the same performance or flexibility. So when building a REST API with FastAPI, sync-only libraries like this one always feel cumbersome to deal with, since you have to work around them with threading.

Airflow 2.X Deferrable Operators
In the Python community, Airflow is maybe the most popular tool for managing workflows. It handles task execution scheduling and comes with a nice UI. With the new version they introduced Deferrable Operators. Basically, your operator gets a new state, "deferred", which is used to wait for an external dependency, and it's implemented using asyncio. This makes Airflow much more resilient and efficient, since you don't need an executor during this state. Instead, all your tasks have an associated trigger running in a single event loop, waiting for your task to continue. It would be the perfect feature for executing Snowflake queries from Airflow, if Snowflake supported it.

Running 2 independent Snowflake queries at the same time
Not kidding, just try it. The code is either way slower than it has to be, or ugly to read, or both.
But let's go through our options:

  • using execute_async: This gives you some concurrency, but only while waiting for queries. Query creation and pulling the results still have to be done synchronously. Code-wise, you have to build a loop in which you ask Snowflake, for each query ID, whether the query has finished. Sounds like boilerplate code to me.
  • using threading: Probably what most frustrated asyncio users currently do with Snowflake. Basically you use run_in_executor. This opens a thread which runs your synchronous code (your Snowflake query) and is awaited inside the main loop. The problem with threads is that it's really difficult to say what your program is actually doing. Is my Snowflake query currently blocking another thread because my CPU decided to give it some attention? Also, again, some beginner-unfriendly code is required. But in general this is probably the best solution, since threads are easy and fast to create.
  • using process pooling: Same as before but with a different executor. Here we now have true parallelism: 2 CPUs each starting a query and then doing nothing besides waiting for it. Efficiency-wise this feels completely over the top, and you're losing time creating the new processes. Code-wise it is nearly identical to the threading solution, so also not great.

and how could it look using asyncio?
Looking at python3.11 we're getting TaskGroup:

async with asyncio.TaskGroup() as tg:
    task1 = tg.create_task(sf.execute(query1))
    task2 = tg.create_task(sf.execute(query2))

While it's only 3 lines of code, it also offers a much better and easier way to handle exceptions than any solution currently available.

Summary
Asyncio is not a new overhyped way of programming in python. By now, you can't even call it new anymore. But it's actually the perfect feature for snowflake and databases in general, since we don't need local parallelism when the warehouses in the cloud are doing the job for us.

I have seen quite a bit of media about Snowpark and how you want to appeal more to the Python community. Personally, I don't think you can achieve that with your standard Python client missing such a crucial feature.

@ian-whitestone

Can any of the maintainers comment whether this is on the roadmap? I know you can't commit to dates, but at least getting some acknowledgement so we know we're not screaming into a void would be hugely appreciated.

@sfc-gh-anugupta

Hi all, thanks for your input and suggestions. Adding support for asyncio is on our roadmap, and we will keep the thread posted on updates.

@layandreas

Should definitely be a high priority!

@DustinMoriarty

DustinMoriarty commented Mar 31, 2023

We would highly value this as well. Snowflake is heavily used by data engineers who, like it or not, are most familiar with Python. A lot of the time I write code in Python simply so that it can be supported by data engineers, even though Python is far from the most performant language. However, a lot of code is mostly IO, waiting for long-running tasks. Concurrency for IO is far more important than in-memory execution speed for processes that are mostly performing orchestration in a data warehouse. Therefore, Python can be nearly as fast as compiled languages for these types of tasks, and we have a much easier time hiring engineers who know Python than other languages.

Asyncio in python is now mature and it is well supported by foundational libraries for tasks such as HTTP, TCP and file IO.

Yes, it is possible to run "async" queries where you get a query ID and then check back later to see if the query is done. However, the underlying IO is still blocking. Therefore, it takes a lot of fairly careful design to perform parallel queries efficiently with this pseudo-async approach. You have to run a blocking synchronous loop to get all the query IDs, and then another blocking synchronous loop to check all the queries, skipping the completed ones on each pass and returning the results once they are all done. I am not even going to try to explain this to most data engineers.
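
The two-pass pattern described above could look roughly like the sketch below. The method names (`execute_async`, `sfqid`, `get_query_status`, `is_still_running`, `get_results_from_sfqid`) are the ones used in the connector's documented asynchronous-query examples. The `run_queries` helper, `FakeConnection` and `FakeCursor` are hypothetical and exist only so the sketch runs without a Snowflake account; error handling for failed queries is left out.

```python
import time


def run_queries(conn, statements, poll_interval=1.0):
    """Submit every statement, then poll until all have finished (blocking)."""
    cur = conn.cursor()
    pending = []
    for sql in statements:  # pass 1: submit and collect the query IDs
        cur.execute_async(sql)
        pending.append(cur.sfqid)

    results = {}
    while pending:  # pass 2: poll, skipping queries that already finished
        still_running = []
        for query_id in pending:
            if conn.is_still_running(conn.get_query_status(query_id)):
                still_running.append(query_id)
            else:
                cur.get_results_from_sfqid(query_id)
                results[query_id] = cur.fetchall()
        pending = still_running
        if pending:
            time.sleep(poll_interval)  # blocks the calling thread
    return results


# --- Hypothetical fakes, for illustration only ---
class FakeCursor:
    def __init__(self, conn):
        self.conn, self.sfqid, self._rows = conn, None, []

    def execute_async(self, sql):
        self.sfqid = f"id-{len(self.conn.polls_left)}"
        self.conn.polls_left[self.sfqid] = 2  # reports "running" on the first poll
        self.conn.sql[self.sfqid] = sql

    def get_results_from_sfqid(self, query_id):
        self._rows = [(self.conn.sql[query_id],)]

    def fetchall(self):
        return self._rows


class FakeConnection:
    def __init__(self):
        self.polls_left, self.sql = {}, {}

    def cursor(self):
        return FakeCursor(self)

    def get_query_status(self, query_id):
        self.polls_left[query_id] -= 1
        return "RUNNING" if self.polls_left[query_id] > 0 else "SUCCESS"

    def is_still_running(self, status):
        return status == "RUNNING"


demo = run_queries(FakeConnection(), ["SELECT 1", "SELECT 2"], poll_interval=0.01)
```

Every call in `run_queries`, including the `time.sleep`, holds the calling thread, which is the limitation this comment is pointing at.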

@DustinMoriarty

@awm33 unless I'm missing something, this will have no impact on performance compared to the official Snowflake version. That's because the underlying Snowflake driver is still using Python native sockets, which block the thread they are executed on.

I am trying to see if I can use your code. However, I am confused by what you mean about using the threaded executor. It looks to me like you are using self._loop.run_in_executor and self.loop is from asyncio. I think I am missing something. Does partial somehow help you with this? Are you passing in a different pool which uses multithreading instead of asyncio during instantiation of SnowflakeConnection? I am looking for the place where multithreading gets involved. I was going to try to make a similar asyncio wrapper around multithreaded sync code.
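
For readers with the same question, a general note on asyncio (not a statement about the wrapper being discussed, whose code is not shown in this thread): `loop.run_in_executor(None, ...)` submits the callable to the loop's default executor, which is a `ThreadPoolExecutor`, and `functools.partial` is commonly used because `run_in_executor` forwards only positional arguments. The sketch below uses a hypothetical `blocking_query` function to show that the call really runs on a different thread.

```python
import asyncio
import threading
from functools import partial


def blocking_query(sql: str, *, timeout: int = 60) -> str:
    """Hypothetical stand-in for synchronous connector code."""
    return threading.current_thread().name  # report which thread ran us


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    # run_in_executor takes no keyword arguments, so partial binds them up front.
    call = partial(blocking_query, "SELECT 1", timeout=30)
    # Passing None selects the loop's default executor, a ThreadPoolExecutor.
    worker = await loop.run_in_executor(None, call)
    return threading.current_thread().name, worker


loop_thread, worker_thread = asyncio.run(main())
print(loop_thread, worker_thread)
```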

timostrunk pushed a commit to timostrunk/snowflake-connector-python that referenced this issue Dec 11, 2023
@ian-whitestone

Hi all, thanks for your input and suggestions. Adding support for asyncio is on our roadmap, and we will keep the thread posted on updates.

Any updates? 🙏

@Ousret

This comment was marked as outdated.

@awestm

awestm commented Apr 18, 2024

@sfc-gh-anugupta @sfc-gh-dszmolka any updates?

@sfc-gh-dszmolka
Contributor

Thank you folks for all the feedback and interest! It is too early to give out any estimated timeline, but the team is busy with the planning and design; so there is some progress.

Speaking about which, we'll update this thread when there's any significant new information on the progress. Thank you very much for bearing with us !

@sfc-gh-dszmolka
Contributor

Short update to confirm this is still on the roadmap and in progress with the team. No timeline available at this moment - thank you everyone for your patience here.

@copdips

copdips commented Jul 26, 2024

As Snowflake already provides standard REST APIs, at least for basic CRUD operations, it's technically no longer difficult to use them with asyncio ourselves, instead of waiting for the official aio SDK from Snowflake.
But yes, a real aio SDK is always better.

@sfc-gh-dszmolka
Contributor

short update: internal POC in progress

sfc-gh-dszmolka added the status-in_progress label on Aug 23, 2024

@sfc-gh-dszmolka
Contributor

sfc-gh-dszmolka commented Aug 23, 2024

quick update: as you might have seen from the PRs :) the team is actively working on the project and towards getting out the initial alpha version of the connector which supports async.

edit: we do understand there's a huge interest in this feature, but at this moment (3 October 2024) there is no official ETA yet. Please keep tuned, because this thread will be updated once there's important information to post. Until then, I'm afraid you'll need to bear with us for a bit and thank you for your patience!

@AronsonDan

Any progress on that?
I guess a lot of people would really love to see that feature out

@sfc-gh-dszmolka
Contributor

update: initial support for asyncio is imminent with alpha release 3.13.0a1, and will be available as the snowflake.connector.aio submodule. Please stay tuned for a little longer and again, thank you for your patience here!

sfc-gh-dszmolka changed the title from "support for asyncio" to "support for asyncio (SNOW-1544406)" on Nov 25, 2024

@sfc-gh-dszmolka
Contributor

update: apparently plans changed and with the 'imminent'-ness of the release I was overly optimistic, apologies. Current status is that the code planned for the private preview scope is complete, and Snowflake Product Management is working on the next steps. Probably this wasn't the update everyone was hoping for :(

For those who are already Snowflake customers, your account team can track the internal status based on the ticket number mentioned in the issue title (SNOW-1544406). In any case, I'm trying to provide updates from time to time for everyone to see. Thank you for bearing with us, folks.

sfc-gh-dszmolka added the status-triage_done label on Dec 16, 2024

@dhendry

dhendry commented Feb 12, 2025

It would be really useful to see this!

@AronsonDan

And the world keeps on spinning round and round 🦖

@sfc-gh-dszmolka
Contributor

A lot has changed internally, which of course should absolutely not influence how projects are delivered, but unfortunately it does. From what I know, the project is still on the table at least. I understand how the uncertainty is super frustrating, too.
