
Conversation

@BoyuanFeng (Contributor) commented on Aug 20, 2025

This PR updates the input shapes for the rope benchmark. In LLMs, the head dimension D is usually small and fixed (e.g., 128), while the batch size B and sequence length S vary across workloads.

Shape: [B, H, S, D]

q: [B, 32, S, 128]
kv: [B, 32, S, 128]

Command: python3 run.py --op rope --metrics cuda_time
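
For illustration, here is a minimal sketch of how inputs with these shapes could be generated for such a sweep. The `make_qkv` helper, dtype, and device choices are assumptions for the example, not the benchmark's actual code:

```python
import torch

# Fixed dimensions matching the shapes above: H (num heads) and D (head dim)
# stay constant, while batch size B and sequence length S vary per workload.
H, D = 32, 128

def make_qkv(B: int, S: int, dtype=torch.bfloat16, device="cuda"):
    """Hypothetical input generator: returns q and kv with shape [B, H, S, D]."""
    q = torch.randn(B, H, S, D, dtype=dtype, device=device)
    kv = torch.randn(B, H, S, D, dtype=dtype, device=device)
    return q, kv

# Example: sweep over varying (B, S) pairs while H and D stay fixed.
for B, S in [(1, 2048), (4, 4096), (8, 8192)]:
    q, kv = make_qkv(B, S)
```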
[screenshot: cuda_time benchmark results]

@BoyuanFeng marked this pull request as draft on August 20, 2025, 22:57
@meta-cla bot added the cla signed label on Aug 20, 2025
@xuzhao9 (Contributor) commented on Aug 21, 2025

The fp8_gemm CI failure should be unrelated; it looks like an Inductor issue.
