[Feature][AutoDeploy] Update sequence info to also carry other metadata about the request such as number of prefill and decode requests

### 🚀 The feature, motivation and pitch

In some cases (e.g., CUDA graph capture for attention), it is important to know the number of prefill and decode requests in the batch. This information is also useful when launching kernels specialized for prefill or decode. Extracting this information from a batch requires operations that are usually not compatible with graph capture, e.g.:
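```python
import torch

seq_len = torch.tensor([7, 12, 1, 1, 1])  # example per-request sequence lengths
prefill_mask = seq_len > 1
# On a CUDA tensor, .item() forces a device-to-host sync, which is not permitted during CUDA graph capture
num_prefill = int(prefill_mask.sum().item())
```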
Instead, the number of prefill and decode requests can be computed ahead of graph launch and stored in the sequence info object, so that code inside the captured graph does not need to derive these counts from device tensors.
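As a rough illustration, here is a minimal sketch of a host-side container that computes these counts once per batch from plain Python integers, before any graph launch. `SequenceInfoSketch`, its field names, and `split_tokens` are hypothetical and are not the AutoDeploy sequence info API. The sketch also assumes that prefill requests are ordered before decode requests in the batch, and it reuses the `seq_len > 1` heuristic from the snippet above to classify a request as prefill.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SequenceInfoSketch:
    """Hypothetical host-side batch metadata (not the real AutoDeploy class).

    Assumes all prefill requests precede all decode requests in the batch.
    """

    seq_len: List[int]
    num_prefill: int = 0
    num_decode: int = 0
    num_prefill_tokens: int = 0

    @classmethod
    def from_seq_lens(cls, seq_len: List[int]) -> "SequenceInfoSketch":
        # Computed from Python ints on the host, before graph launch,
        # so no device-to-host synchronization (.item()) is involved.
        num_prefill = sum(1 for s in seq_len if s > 1)
        num_decode = len(seq_len) - num_prefill
        # Enforce the assumed layout: no prefill request after a decode request.
        if any(s > 1 for s in seq_len[num_prefill:]):
            raise ValueError("expected prefill requests before decode requests")
        num_prefill_tokens = sum(seq_len[:num_prefill])
        return cls(list(seq_len), num_prefill, num_decode, num_prefill_tokens)

    def split_tokens(self, tokens: List[int]) -> Tuple[List[int], List[int]]:
        """Split a flattened token list into its prefill and decode parts."""
        return tokens[: self.num_prefill_tokens], tokens[self.num_prefill_tokens :]


info = SequenceInfoSketch.from_seq_lens([7, 12, 1, 1, 1])
prefill_tokens, decode_tokens = info.split_tokens(list(range(22)))
```

Because the counts are ordinary integers, a caller could, for instance, use `num_prefill == 0` to pick a decode-only path, or use `num_prefill_tokens` to slice inputs for separate prefill and decode kernels, without reading anything back from the GPU.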

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Feature][AutoDeploy] Update sequence info to also carry other metadata about the request such as number of prefill and decode requests #8032
