Make `_get_perspective_coeffs` device agnostic

### 🐛 Describe the bug

Currently, `_get_perspective_coeffs` creates its internal tensors on the CPU, as seen [here](https://github.com/pytorch/vision/blob/d02b1845a2fabea1eb8f9d09310369a5cbb5514f/torchvision/transforms/functional.py#L674-L704), and fails in the `torch.linalg.lstsq` call if `startpoints`/`endpoints` is a tensor containing data on the GPU (or another device).

The docs state that a `list of list of python:ints` is expected, but tensors are still accepted and are not rejected up front.

A fix would be to create `a_matrix` using the `device` attribute of the input. Alternatives would be to move the points to the host, which would force a device synchronization and prevent graph capture, or to raise an error if tensors are passed.

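If we want to accept tensor inputs, the tensor copy in `b_matrix = torch.tensor(startpoints, dtype=torch.float64).view(8)` should also be fixed, since calling `torch.tensor` on an existing tensor emits a warning that recommends `clone().detach()` instead.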
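To make the proposed fix concrete, here is a minimal sketch of a device-agnostic variant. It is not the torchvision implementation: the name `perspective_coeffs_sketch` is hypothetical, the vectorized construction of `a_matrix` is one possible choice, and returning a tensor instead of a Python list is an assumption made to avoid a host sync. The `driver` argument is left at its default here because, per the `torch.linalg.lstsq` docs, `"gels"` is the only valid driver for CUDA inputs.

```python
import torch


def perspective_coeffs_sketch(startpoints, endpoints):
    """Hypothetical device-agnostic variant (illustration only)."""
    # Reuse the device of a tensor input; fall back to the CPU for Python lists.
    if isinstance(startpoints, torch.Tensor):
        device = startpoints.device
    else:
        device = torch.device("cpu")

    start = torch.as_tensor(startpoints, dtype=torch.float64, device=device).reshape(4, 2)
    end = torch.as_tensor(endpoints, dtype=torch.float64, device=device).reshape(4, 2)

    x, y = end[:, 0], end[:, 1]      # corners in the output image
    u, v = start[:, 0], start[:, 1]  # corners in the input image
    ones, zeros = torch.ones_like(x), torch.zeros_like(x)

    # Row 2*i holds the x-equation of point i, row 2*i + 1 its y-equation.
    rows_x = torch.stack([x, y, ones, zeros, zeros, zeros, -u * x, -u * y], dim=1)
    rows_y = torch.stack([zeros, zeros, zeros, x, y, ones, -v * x, -v * y], dim=1)
    a_matrix = torch.stack([rows_x, rows_y], dim=1).reshape(8, 8)
    b_matrix = start.reshape(8, 1)

    # Solve in double precision; no explicit driver so each device uses its default.
    solution = torch.linalg.lstsq(a_matrix, b_matrix).solution
    return solution.squeeze(1).to(torch.float32)


# Usage: the same call works for Python lists and for tensors on any device.
points = [[0, 0], [199, 0], [199, 199], [0, 199]]
coeffs = perspective_coeffs_sketch(points, points)
```

Note that converting the result with `.tolist()`, as the current function's return type requires, would still synchronize with the device, so a complete fix would also need to decide what the caller receives.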

The original error was reported on the [discussion board](https://discuss.pytorch.org/t/cannot-run-torchvision-transforms-functional-perspective-on-gpu/220140) and can be reproduced with:
```python
import torch
import torchvision.transforms.functional as TF


device = "cuda"
reference_image = torch.randn(1, 3, 224, 224, device=device)
B, C, H, W = reference_image.shape

W = 200
H = 200
# Define source points (original corners of the target image)
src_points = torch.tensor([
    [0, 0],  # Top-left
    [W - 1, 0],  # Top-right
    [W - 1, H - 1],  # Bottom-right
    [0, H - 1]  # Bottom-left
], dtype=torch.float32, device=reference_image.device)
src_points = src_points.unsqueeze(0).repeat(B, 1, 1)  # (B, 4, 2)

predicted_points = torch.tensor([
    [0, 0],  # Top-left
    [W - 10, 0],  # Top-right
    [W - 10, H - 10],  # Bottom-right
    [0, H - 10]  # Bottom-left
], dtype=torch.float32, device=reference_image.device)
predicted_points = predicted_points.unsqueeze(0).repeat(B, 1, 1)  # (B, 4, 2)

warped_images = []
for i in range(B):
    warped = TF.perspective(
        reference_image[i],
        src_points[i],
        predicted_points[i],
        interpolation=TF.InterpolationMode.BILINEAR,
        fill=0,
    )
    warped_images.append(warped)

# RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument b in method wrapper_CUDA_out_linalg_lstsq_out)
```

### Versions


`0.22.0.dev20250404+cu128`
