json: C decoder reports "Invalid \uXXXX escape" instead of "Unterminated string starting at" for a complete \uXXXX escape at end of input

The C accelerator of `json` reports `Invalid \uXXXX escape` where the pure-Python decoder reports `Unterminated string starting at`, for a `\uXXXX` escape whose final hex digit is the last character of the input.

### Minimal repro

The input is an opening quote followed by a complete `A` escape and **no** closing quote (7 characters: `"`, `\`, `u`, `0`, `0`, `4`, `1`):

```python
>>> import json
>>> json.loads(r'"\u0041')           # C accelerator
Traceback (most recent call last):
  ...
json.decoder.JSONDecodeError: Invalid \uXXXX escape: line 1 column 3 (char 2)

>>> from json import decoder
>>> decoder.py_scanstring(r'"\u0041', 1)   # pure-Python
Traceback (most recent call last):
  ...
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 1 (char 0)
```

The four hex digits `0041` form a complete, valid `\uXXXX` escape, so the string is merely *unterminated*. The pure-Python decoder diagnoses this correctly; the C accelerator misreports it as an *invalid escape*.

### Root cause

`Modules/_json.c`, in `scanstring_unicode()`:

```c
end = next + 4;
if (end >= len) {
    raise_errmsg("Invalid \\uXXXX escape", pystr, next - 1);
    goto bail;
}
```

The four hex digits are read at indices `next .. next+3`, i.e. `end-4 .. end-1`. When `end == len` those indices are `len-4 .. len-1` — all in bounds — so the escape is complete and valid; the input is simply unterminated. The check should be `>` rather than `>=`: only `end > len` means the escape itself runs past the end of the input. At `end == len` control should fall through, the forward scan finds no closing quote, and the existing `raise_errmsg("Unterminated string starting at", ...)` fires, matching the pure-Python decoder exactly.

### Why this matters

This is a C-accelerator vs pure-Python parity / wrong-diagnostic bug: the two decoders disagree on the error class for the same input, and the C one points the user at the wrong problem (a non-existent invalid escape rather than the missing closing quote). The `json` C and pure-Python error messages were deliberately synchronized in bpo-5067, so divergence here regresses that contract.

This is distinct from gh-125660 (which is about the pure-Python decoder *accepting* invalid unicode escapes); here both decoders correctly reject genuinely truncated escapes, and the only divergence is the misclassification of a *complete* escape at end-of-input.

I have a one-character fix (`>=` → `>`) plus regression tests covering both the C and pure-Python decoders ready to submit.



### Linked PRs
* gh-152053

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

json: C decoder reports "Invalid \uXXXX escape" instead of "Unterminated string starting at" for a complete \uXXXX escape at end of input #152052

Minimal repro

Root cause

Why this matters

Linked PRs

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Uh oh!

json: C decoder reports "Invalid \uXXXX escape" instead of "Unterminated string starting at" for a complete \uXXXX escape at end of input #152052

Description

Minimal repro

Root cause

Why this matters

Linked PRs

Metadata

Metadata

Assignees

Labels

Fields

Projects

Milestone

Relationships

Development

Issue actions