-
Couldn't load subscription status.
- Fork 13.7k
[FLINK-38559][python] Support unordered mode of async function in Python DataStream API #27145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…hon DataStream API
f4259c1 to
3625e30
Compare
| def _validate(data_stream: DataStream, async_function: AsyncFunction): | ||
| if not inspect.iscoroutinefunction(async_function.async_invoke): | ||
| raise Exception("Method 'async_invoke' of class '%s' should be declared as 'async def'." | ||
| % type(async_function)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| % type(async_function)) | |
| % type(async_function).__name__) |
i think type(function) will return a verbose result, maybe this change gives actual name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type(function) returns the full path of the class name and type(async_function).__name__ returns just the class name. My intention here is to print the full path which I think will be more helpful.
| j_python_data_stream_function_operator)) | ||
|
|
||
| @staticmethod | ||
| def _validate(data_stream: DataStream, async_function: AsyncFunction): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _validate(data_stream: DataStream, async_function: AsyncFunction): | |
| def _validate(data_stream: DataStream, async_function: AsyncFunction) -> None: |
| return False | ||
|
|
||
|
|
||
| class ResultHandler(ResultFuture, Generic[OUT]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand correctly that retry is not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. It's planned to be supported in a separate ticket https://issues.apache.org/jira/browse/FLINK-38561
|
|
||
| class ResultHandler(ResultFuture, Generic[OUT]): | ||
|
|
||
| def __init__(self, classname, timeout_func, exception_handler, record, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add type hints here too?
| self._async_function_runner.stop() | ||
| self._async_function_runner = None | ||
|
|
||
| def process_element(self, windowed_value, element): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder what will happen if the element is a watermark record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the watermark and element are sent together, the structure of element is as following:
CURRENT_TIMESTAMP, CURRENT_WATERMARK, NORMAL_DATA
| watermark = element[1] | ||
| record = element[2] | ||
|
|
||
| self._queue.advance_watermark(watermark) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for my own curiosity, what is the reason to advance watermark before the entry (the line below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the data is behind the watermark. You can refer to ExternalPythonProcessOperator for more details.
| return | ||
|
|
||
| self._exception_handler( | ||
| Exception("Could not complete the element:" + str(self._record), error)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if it is possible to have a test case to trigger this exception?
|
@charlesdong1991 Thanks for the review! Have updated the PR accordingly ~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! leave some comments
|
|
||
| def __init__(self, capacity): | ||
| self._incomplete_elements = set() | ||
| self._complete_elements = collections.deque(maxlen=capacity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe what we mean here is "completed" (i.e. something is finished) instead of "complete"(i.e. entire)
| if not isinstance(result, Iterable): | ||
| raise RuntimeError("The 'result_future' of AsyncFunction should be completed with " | ||
| "data of list type, please check the methods 'async_invoke' and " | ||
| "'timeout' of class '%s'." % self._classname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we can move 'RuntimeError' int the previous "else" instead of doing another if not isinstance(result, Iterable) again, Like
if not isinstance(result, Iterable):
# complete with empty result, so that we remove timer and move ahead processing
self._process_results([])
raise RuntimeError("The 'result_future' of AsyncFunction should be completed with "
"data of list type, please check the methods 'async_invoke' and "
"'timeout' of class '%s'." % self._classname)
else:
self._process_results(result)
|
|
||
| def stop(self): | ||
| if self._loop is not None: | ||
| self._loop.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do self._loop.close() to free underlying resources?
| self._loop.stop() | |
| self._loop.stop() | |
| self._loop.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added self._loop.close() in method run.
|
|
||
| def run_async(self, async_function, *arg): | ||
| wrapped_function = self.exception_handler_wrapper(async_function, *arg) | ||
| asyncio.run_coroutine_threadsafe(wrapped_function, self._loop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure I understand the error handling here correctly so I try to summarize the error handling process with a simple case:
- there are 3 records that should be processed by
AsyncOperation'sprocess_element. self._async_function_runner.run_asyncis called for element1 in main thread.self._async_function_runner.run_asyncis called for element2 in main thread but the task has not been submitted to theloopthread(i.e. line 172 has not been executed).- the
loopthread executes the processing of element1, then exception happens,AsyncOperationmarks exception. AsyncOperationin main threads submits the element2 toloopthread.AsyncOperationin main thread finds exception when processing element3 and raises. After some population, the main thread callsAsyncOperation's close() and stops theloopthread.
According to the doc of async io:
if the `loop` thread has not process element2's task, then the thread just stops and no extra processing for element2.
If the `loop` thread is processing element2, the `loop` thread will finish the processing. But as AsyncOperation has been closed, is it possible that the `loop` thread runs into some bad cases(e.g. if the async function uses some freed resources setup in advance)?
| if output_mode == AsyncFunctionDescriptor.OutputMode.UNORDERED: | ||
| self._queue = UnorderedStreamElementQueue(capacity, self._raise_exception_if_exists) | ||
| else: | ||
| raise NotImplementedError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| raise NotImplementedError() | |
| raise NotImplementedError("ORDERED mode is supported.") |
|
|
||
| if self._async_function_runner is not None: | ||
| self._async_function_runner.stop() | ||
| self._async_function_runner = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need self._exception = None here?
In other words, when failover happens, will we reuse AsyncOperation object or just create a new one?
What is the purpose of the change
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)Documentation