Question: how does the "micro-pipeline parallelism mechanism" execute? #70

Open
sikey647 opened this issue Oct 29, 2024 · 3 comments

Comments

@sikey647

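I'd like to ask whether the "micro-pipeline parallelism mechanism" mentioned in the introduction refers to parallelism between GraphProcessors or parallelism inside a GraphProcessor. A common scenario: when the service receives a large dataset, the data is usually split into batches that are processed in parallel. One approach is to spread the batches across multiple GraphProcessors, but judging from the examples, the number of GraphProcessors is fixed at graph-construction time. Can the number of GraphProcessors be adjusted at run time? The other approach is to parallelize inside a GraphProcessor, but from reading the code there appears to be no support for this.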

@oathdruid
Collaborator

oathdruid commented Oct 29, 2024

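The typical usage here is to connect the upstream and downstream processors with a channel; the downstream processor then starts a task/coroutine each time it has consumed one batch size worth of data. The degree of concurrency is tuned through this batch size. The net effect is pipeline parallelism between processors and minibatch parallelism inside a processor.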
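
The pattern described above can be sketched roughly as follows. This is only an illustration in Python with `asyncio`, not babylon's API: `asyncio.Queue` stands in for the channel, an asyncio task stands in for a bthread coroutine, and the names `upstream`, `downstream`, `handle_batch`, `run_pipeline` and the `END` sentinel are all hypothetical.

```python
import asyncio

END = object()  # hypothetical end-of-stream marker (an assumption of this sketch)


async def upstream(channel: asyncio.Queue, items):
    """Upstream processor: publishes items one by one into the channel."""
    for item in items:
        await channel.put(item)
    await channel.put(END)


async def handle_batch(batch):
    """Work for one minibatch; runs concurrently with other minibatches."""
    await asyncio.sleep(0.01)  # stand-in for real processing
    return [x * 2 for x in batch]


async def downstream(channel: asyncio.Queue, batch_size: int):
    """Downstream processor: starts one task per batch_size items consumed."""
    tasks, batch = [], []
    while True:
        item = await channel.get()
        if item is END:
            break
        batch.append(item)
        if len(batch) == batch_size:
            # Spawn the task and keep consuming; do not wait for it here.
            tasks.append(asyncio.create_task(handle_batch(batch)))
            batch = []
    if batch:  # flush the final partial batch
        tasks.append(asyncio.create_task(handle_batch(batch)))
    results = await asyncio.gather(*tasks)  # gather keeps batch order
    return [y for part in results for y in part]


async def run_pipeline(items, batch_size):
    """Wire the two processors together through a bounded channel."""
    channel = asyncio.Queue(maxsize=batch_size * 2)
    producer = asyncio.create_task(upstream(channel, items))
    output = await downstream(channel, batch_size)
    await producer
    return output


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(list(range(10)), batch_size=4)))
```

In this sketch the upstream keeps producing while earlier minibatches are still being handled (pipeline parallelism between the two stages), and several `handle_batch` tasks can be in flight at once (minibatch parallelism inside the downstream stage). With 10 items and `batch_size=4`, three tasks are started; a smaller `batch_size` yields more, smaller tasks.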

@oathdruid
Collaborator

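Internally, because starting a bthread coroutine is fairly simple, we did not wrap this at the process layer. Business code sometimes mixes plain data with channels, and some cases also implement multi-stream join mechanisms. Building a generic base processor detached from business use would need some thought on how to avoid hurting this kind of flexibility; it should be possible to abstract out some general-purpose builtins.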

@sikey647
Author

Understood, thank you.
