

noemotiovon (Collaborator)

What does this PR do?

Record the `ne` (number of elements per dimension) and `nb` (stride in bytes per dimension) of each src tensor and include them in the graph matching check. This makes ACL graph matching more robust by preventing false matches when src tensors share the same data address but differ in shape or stride.
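A minimal C sketch of the matching idea, under stated assumptions: only the field names `data`, `ne`, `nb` and `src` mirror `ggml_tensor`; the types `tensor` and `node_props`, the helpers `record_node` and `node_matches`, and the `MAX_SRC` limit are hypothetical names for illustration, not the CANN backend's actual code. It records each src tensor's address, shape and stride when a graph is captured, and later treats a node as matching only if all three are unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DIMS 4 /* ggml tensors have up to 4 dimensions */
#define MAX_SRC  3 /* hypothetical, illustrative limit (not ggml's value) */

/* Hypothetical, stripped-down stand-in for a ggml tensor. */
typedef struct tensor {
    void          *data;
    int64_t        ne[MAX_DIMS]; /* number of elements per dimension */
    size_t         nb[MAX_DIMS]; /* stride in bytes per dimension */
    struct tensor *src[MAX_SRC];
} tensor;

/* Hypothetical per-node record stored alongside a captured graph. */
typedef struct node_props {
    void   *data;
    bool    has_src[MAX_SRC];
    void   *src_data[MAX_SRC];
    int64_t src_ne[MAX_SRC][MAX_DIMS];
    size_t  src_nb[MAX_SRC][MAX_DIMS];
} node_props;

/* Snapshot the node's address plus the address, shape and stride of each src. */
static void record_node(node_props *p, const tensor *node) {
    memset(p, 0, sizeof(*p));
    p->data = node->data;
    for (int i = 0; i < MAX_SRC; i++) {
        const tensor *s = node->src[i];
        if (s == NULL) continue;
        p->has_src[i]  = true;
        p->src_data[i] = s->data;
        memcpy(p->src_ne[i], s->ne, sizeof(s->ne));
        memcpy(p->src_nb[i], s->nb, sizeof(s->nb));
    }
}

/* Returns true only if addresses, shapes and strides all match the record. */
static bool node_matches(const node_props *p, const tensor *node) {
    if (p->data != node->data) return false;
    for (int i = 0; i < MAX_SRC; i++) {
        const tensor *s = node->src[i];
        if (p->has_src[i] != (s != NULL)) return false;
        if (s == NULL) continue;
        if (p->src_data[i] != s->data) return false;
        /* Same address alone is not enough: a view can reuse it with a new layout. */
        if (memcmp(p->src_ne[i], s->ne, sizeof(s->ne)) != 0) return false;
        if (memcmp(p->src_nb[i], s->nb, sizeof(s->nb)) != 0) return false;
    }
    return true;
}
```

For example, a 4x2 and a 2x4 contiguous f32 view of the same buffer share `data` but have `ne` {4, 2, 1, 1} vs {2, 4, 1, 1} and `nb` {4, 16, 32, 32} vs {4, 8, 32, 32}; an address-only check would accept the swap, while `node_matches` rejects it.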

github-actions bot added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Ascend NPU" (issues specific to Ascend NPUs) on Sep 22, 2025
noemotiovon (Collaborator, Author) commented Sep 22, 2025

Model Parallel Inference Test

Qwen2.5-0.5B:

......
main: n_parallel = 8, n_sequences = 128, cont_batching = 1, system tokens = 273
External prompt file: used built-in defaults
Model and path used:  /home/lichenguang25/.ollama/models/blobs/sha256-6f96e01a3f550ca08aea1e5725bb8d5a7eccc6f281c30417e9d380b8c46467bd

Total prompt tokens:  17075, speed: 574.36 t/s
Total gen tokens:     13278, speed: 446.64 t/s
Total speed (AVG):           speed: 1020.99 t/s
Cache misses:             0

llama_perf_context_print:        load time =    1730.22 ms
llama_perf_context_print: prompt eval time =   11966.81 ms / 30601 tokens (    0.39 ms per token,  2557.16 tokens per second)
llama_perf_context_print:        eval time =     122.12 ms /    25 runs   (    4.88 ms per token,   204.71 tokens per second)
llama_perf_context_print:       total time =   29732.53 ms / 30626 tokens
llama_perf_context_print:    graphs reused =       1440
