[3/n tensor engine] hello tensor engine #187

Open. Wants to merge 1 commit into base: gh/zdevito/6/base

zdevito (Contributor) commented Jun 6, 2025

Stack from ghstack (oldest at bottom):

Add the initial controller class `mesh_controller`, which can implement the tensor engine on top of the ProcMesh/ProcActor API.

The only example is currently a "hello world" that allocates and fetches a tensor.

Follow-up PRs will integrate creating meshes this way more deeply into the testing code and fix the issues that come up.

This design assumes that supervision is handled by the actor system, and that tensor compute can rely on it for monitoring and stuckness detection.

This design has no ClientActor, and the ControllerActor exists only as an Instance handle for reading messages from the workers (which send some controller messages).
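As a rough illustration of that shape, here is a toy sketch using only the Python standard library. It is not the Monarch API: `ControllerHandle`, `worker`, and `hello_tensor` are hypothetical names invented for this example, and plain lists stand in for tensors. It only shows the idea of a controller that is nothing more than an inbox that workers post to, driven through an allocate-then-fetch "hello world".

```python
import queue
import threading

# Hypothetical toy model, NOT the Monarch API: all names are made up here.


class ControllerHandle:
    """Stand-in for the controller: just an inbox that workers post to."""

    def __init__(self):
        self.inbox = queue.Queue()

    def read(self, timeout=5.0):
        # Block until some worker sends a message (or time out).
        return self.inbox.get(timeout=timeout)


def worker(rank, commands, controller):
    """Toy worker: stores named 'tensors' (plain lists) and answers fetches."""
    tensors = {}
    while True:
        op, *args = commands.get()
        if op == "alloc":
            name, size, value = args
            tensors[name] = [value] * size
        elif op == "fetch":
            (name,) = args
            # Workers reply directly to the controller handle.
            controller.inbox.put((rank, name, tensors[name]))
        elif op == "stop":
            break


def hello_tensor(num_workers=2):
    """Allocate a tensor on every worker, then fetch it back."""
    controller = ControllerHandle()
    mailboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, mailboxes[rank], controller))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    for box in mailboxes:
        box.put(("alloc", "x", 3, 1.0))
        box.put(("fetch", "x"))
        box.put(("stop",))
    # Replies may arrive in any order, so sort them by rank.
    results = sorted(controller.read() for _ in range(num_workers))
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(hello_tensor())
```

In this sketch the caller sends commands straight to the workers and there is no separate client object; the controller's only job is to receive what the workers send back.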

This does not attempt to clean up the existing RustController system yet, since the new controller isn't yet feature-equivalent to it or tested against it.

Differential Revision: [D75909313](https://our.internmc.facebook.com/intern/diff/D75909313/)

[ghstack-poisoned]
zdevito added a commit that referenced this pull request Jun 6, 2025
ghstack-source-id: 288027712
Pull Request resolved: #187
facebook-github-bot added the CLA Signed label Jun 6, 2025
Labels: CLA Signed
2 participants