[C API] PEP 782: Add PyBytesWriter API #129813

Closed
vstinner opened this issue Feb 7, 2025 · 11 comments
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@vstinner
Member

vstinner commented Feb 7, 2025

Feature or enhancement

Proposal:

I propose adding a PyBytesWriter API to create bytes objects.

API:

typedef struct PyBytesWriter PyBytesWriter;

PyAPI_FUNC(void*) PyBytesWriter_Create(
    PyBytesWriter **writer,
    Py_ssize_t alloc);
PyAPI_FUNC(void) PyBytesWriter_Discard(
    PyBytesWriter *writer);
PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer,
    void *buf);

PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetRemaining(
    PyBytesWriter *writer,
    void *buf);
PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void *buf,
    Py_ssize_t extend);
PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    void *buf,
    const void *bytes,
    Py_ssize_t size);
PyAPI_FUNC(void*) PyBytesWriter_Format(
    PyBytesWriter *writer,
    void *buf,
    const char *format,
    ...);

Simple example creating the string b"abc":

PyObject* create_abc(void)
{
    PyBytesWriter *writer;
    char *str = PyBytesWriter_Create(&writer, 3);
    if (writer == NULL) return NULL;

    memcpy(str, "abc", 3);
    str += 3;

    return PyBytesWriter_Finish(writer, str);
}

Example formatting an integer in decimal, where the size is not known in advance:

PyObject* format_int(int value)
{
    PyBytesWriter *writer;
    char *str = PyBytesWriter_Create(&writer, 20);
    if (writer == NULL) return NULL;

    str += PyOS_snprintf(str, 20, "%i", value);

    return PyBytesWriter_Finish(writer, str);
}

Note: using PyBytesWriter_Format() would make this code simpler.

Example using PyBytesWriter_Extend(), similar to bytes.center() but with a different API: spaces is the number of space characters added on the left and on the right:

static PyObject *
byteswriter_center_example(Py_ssize_t spaces, char *str, Py_ssize_t str_size)
{
    PyBytesWriter *writer;
    char *buf = PyBytesWriter_Create(&writer, spaces * 2);
    if (buf == NULL) {
        goto error;
    }
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces * 2);

    // Add left spaces
    memset(buf, ' ', spaces);
    buf += spaces;
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces);

    // Copy string
    buf = PyBytesWriter_Extend(writer, buf, str_size);
    if (buf == NULL) {
        goto error;
    }
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces + str_size);

    memcpy(buf, str, str_size);
    buf += str_size;
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces);

    // Add right spaces
    memset(buf, ' ', spaces);
    buf += spaces;
    assert(PyBytesWriter_GetRemaining(writer, buf) == 0);

    return PyBytesWriter_Finish(writer, buf);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

My previous attempt in July/August 2024:


@vstinner vstinner added the type-feature A feature request or enhancement label Feb 7, 2025
vstinner added a commit to vstinner/cpython that referenced this issue Feb 7, 2025
* Replace usage of the old private _PyBytesWriter with the new public
  PyBytesWriter C API.
* Remove the old private _PyBytesWriter C API.
* Add a freelist for PyBytesWriter_Create().
* TODO: write doc
* TODO: document new functions in What's New and Changelog
@vstinner
Member Author

vstinner commented Feb 7, 2025

My previous attempt in July/August 2024: #121726

The new API is similar to the old one, but I renamed functions, changed the argument order, added the PyBytesWriter_Format() and PyBytesWriter_GetAllocated() functions, etc.

@encukou
Member

encukou commented Feb 7, 2025

Thank you! This looks great :)

I have a few suggestions for your consideration, purely from the API-surface point of view:

PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    void *buf,
    const char *bytes,
    Py_ssize_t size);

Consider void *bytes (or uint8_t *bytes), per the WIP guidelines.

PyAPI_FUNC(void*) PyBytesWriter_Create(
    PyBytesWriter **writer,
    Py_ssize_t alloc);

Consider returning the writer, and having void **buf as an output argument, for consistency with other *_Create/*_Finish/*_Discard families.

PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    void *buf,
    const char *bytes,
    Py_ssize_t size);

This is meant to be called as buf = PyBytesWriter_WriteBytes(..., buf, ...), right?
Could it take a void **buf and update it?
(Same for PyBytesWriter_Format, PyBytesWriter_Extend)

PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetAllocated(
    PyBytesWriter *writer);

It seems something like PyBytesWriter_HowMuchMoreCanIWrite(PyBytesWriter *writer, void *buf) would be more useful -- i.e. get the space that's available after buf, rather than the total.

PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void *buf,
    Py_ssize_t extend);

It might be more useful to add PyBytesWriter_Reserve, with the same signature but ensuring extend bytes after buf are writable.
(If you use the PyBytesWriter_Create & str += PyOS_snprintf pattern you show, you don't really care about any previous overallocation.)
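
To make the difference between the two semantics concrete, here is a minimal bookkeeping sketch. It is not CPython code: the function names are hypothetical, `allocated` stands for the total buffer size and `used` for the offset of buf from the start of the buffer. It assumes that Extend always adds to the total, while Reserve only grows the buffer when fewer than n bytes remain after buf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping model, not CPython code: `allocated` is the total
   buffer size and `used` is the offset of buf from the buffer start. */

/* Space left after buf (the "how much more can I write" query). */
static size_t remaining(size_t allocated, size_t used)
{
    assert(used <= allocated);
    return allocated - used;
}

/* Extend semantics: always add n bytes to the total. */
static size_t size_after_extend(size_t allocated, size_t used, size_t n)
{
    (void)used;  /* the write position does not affect the result */
    return allocated + n;
}

/* Reserve semantics: grow only if fewer than n bytes remain after buf. */
static size_t size_after_reserve(size_t allocated, size_t used, size_t n)
{
    size_t needed = used + n;
    return needed > allocated ? needed : allocated;
}
```

With 100 bytes allocated and 3 written, extending by 5 gives 105 bytes in total, while reserving 5 leaves the total at 100 because 97 bytes already remain after buf.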

@vstinner
Member Author

For PyBytesWriter_Extend(), you usually don't want to forget previous PyBytesWriter_Create() and PyBytesWriter_Extend() calls.

// allocate 100 bytes
buf = PyBytesWriter_Create(&writer, 100);

(... write a few bytes ...)

// make room to write 5 bytes at the current position
// allocate 100+5 bytes (+ overallocation)
buf = PyBytesWriter_Extend(writer, buf, 5);

(... write more bytes ...)

For example, PyBytes_FromFormatV(format) uses this pattern: it allocates strlen(format) bytes, then calls PyBytesWriter_Extend() if that is not enough. Copying each format byte is safe since you have at least strlen(format) bytes, but some formats need extra space, and that is where PyBytesWriter_Extend() comes in.
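
The allocate-then-extend pattern can be modeled without CPython. The sketch below is a toy built on malloc()/realloc(), not the PyBytesWriter implementation: ToyWriter, toy_create(), toy_extend() and escape_newlines() are hypothetical names. It shows why the write position has to be passed in and returned: the buffer may move when it grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, NOT CPython's implementation. */
typedef struct {
    char *start;       /* beginning of the heap buffer */
    size_t allocated;  /* total bytes the caller may write */
} ToyWriter;

/* Allocate `alloc` bytes; return the write position, or NULL on failure. */
static char *toy_create(ToyWriter *w, size_t alloc)
{
    w->start = malloc(alloc ? alloc : 1);
    w->allocated = (w->start != NULL) ? alloc : 0;
    return w->start;
}

/* Add `extend` bytes to the total. realloc() may move the block, so the
   write position is recomputed from its offset and returned. */
static char *toy_extend(ToyWriter *w, char *buf, size_t extend)
{
    size_t pos = (size_t)(buf - w->start);
    char *moved = realloc(w->start, w->allocated + extend);
    if (moved == NULL) {
        return NULL;   /* the old block is still owned by w */
    }
    w->start = moved;
    w->allocated += extend;
    return moved + pos;
}

/* Escape '\n' as the two characters '\' and 'n'. Returns a malloc'ed,
   NUL-terminated string, or NULL on allocation failure. */
static char *escape_newlines(const char *src)
{
    size_t len = strlen(src);
    ToyWriter w;
    char *p = toy_create(&w, len + 1);   /* one byte per input byte + NUL */
    if (p == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n') {
            /* needs 2 bytes, 1 is preallocated: extend by 2-1 */
            p = toy_extend(&w, p, 2 - 1);
            if (p == NULL) {
                free(w.start);
                return NULL;
            }
            *p++ = '\\';
            *p++ = 'n';
        }
        else {
            *p++ = src[i];
        }
    }
    *p++ = '\0';
    /* every allocated byte has been written exactly once */
    assert((size_t)(p - w.start) == w.allocated);
    return w.start;
}
```

Each input byte has one preallocated output byte, so only the extra bytes of an escape sequence are requested, which mirrors the "subtract 1 preallocated byte" arithmetic.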

Another full example from _pickle.c:

static PyObject *
raw_unicode_escape(PyObject *obj)
{
    Py_ssize_t size = PyUnicode_GET_LENGTH(obj);
    const void *data = PyUnicode_DATA(obj);
    int kind = PyUnicode_KIND(obj);

    PyBytesWriter *writer;
    char *p = PyBytesWriter_Create(&writer, size);
    if (p == NULL) {
        return NULL;
    }

    for (Py_ssize_t i=0; i < size; i++) {
        Py_UCS4 ch = PyUnicode_READ(kind, data, i);
        /* Map 32-bit characters to '\Uxxxxxxxx' */
        if (ch >= 0x10000) {
            /* -1: subtract 1 preallocated byte */
            p = PyBytesWriter_Extend(writer, p, 10-1);
            if (p == NULL)
                goto error;

            *p++ = '\\';
            *p++ = 'U';
            *p++ = Py_hexdigits[(ch >> 28) & 0xf];
            *p++ = Py_hexdigits[(ch >> 24) & 0xf];
            *p++ = Py_hexdigits[(ch >> 20) & 0xf];
            *p++ = Py_hexdigits[(ch >> 16) & 0xf];
            *p++ = Py_hexdigits[(ch >> 12) & 0xf];
            *p++ = Py_hexdigits[(ch >> 8) & 0xf];
            *p++ = Py_hexdigits[(ch >> 4) & 0xf];
            *p++ = Py_hexdigits[ch & 15];
        }
        /* Map 16-bit characters, '\\' and '\n' to '\uxxxx' */
        else if (ch >= 256 ||
                 ch == '\\' || ch == 0 || ch == '\n' || ch == '\r' ||
                 ch == 0x1a)
        {
            /* -1: subtract 1 preallocated byte */
            p = PyBytesWriter_Extend(writer, p, 6-1);
            if (p == NULL)
                goto error;

            *p++ = '\\';
            *p++ = 'u';
            *p++ = Py_hexdigits[(ch >> 12) & 0xf];
            *p++ = Py_hexdigits[(ch >> 8) & 0xf];
            *p++ = Py_hexdigits[(ch >> 4) & 0xf];
            *p++ = Py_hexdigits[ch & 15];
        }
        /* Copy everything else as-is */
        else
            *p++ = (char) ch;
    }

    return PyBytesWriter_Finish(writer, p);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

Sometimes, you don't know how many bytes were allocated and you don't care. Just call PyBytesWriter_Extend() with the number of bytes you are about to write. Example: pylong_int_to_decimal_string() in longobject.c gets a writer and just needs to extend it to write a few bytes.

I'm not sure in which case a PyBytesWriter_Reserve() function would be useful. At least, I don't think that it's currently needed in the Python code base (with my PR). At the same time, I'm not opposed to adding it if it's useful :-)

@vstinner
Member Author

Consider returning the writer, and having void **buf as an output argument

I'm afraid of strict aliasing issues.

For example, if Extend() signature is changed to:

PyAPI_FUNC(int) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void **buf,
    Py_ssize_t extend);

I'm afraid of compiler issues with the cast:

char *buf;
(...)
if (PyBytesWriter_Extend(writer, &buf, 100) < 0) { /* handle error */ }

Here char** is cast to void**, so maybe the compiler will consider that buf cannot be modified by PyBytesWriter_Extend().

I had similar issues in the past; sadly, I forgot the details.

@encukou
Member

encukou commented Feb 12, 2025

For PyBytesWriter_Extend(), you usually don't want to forget previous PyBytesWriter_Create() and PyBytesWriter_Extend() calls.

Ah, I see, PyBytesWriter_Extend is useful and PyBytesWriter_Reserve can be added later.
As long as the documentation clearly says what the function does, that sounds good.

I'm afraid of strict aliasing issues.

I don't see strict aliasing issues here. Were the issues you saw related to casting between char* and void*?

@vstinner
Member Author

PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetAllocated(
    PyBytesWriter *writer);

It seems something like PyBytesWriter_HowMuchMoreCanIWrite(PyBytesWriter *writer, void *buf) would be more useful -- i.e. get the space that's available after buf, rather than the total.

I removed PyBytesWriter_GetAllocated() and I added PyBytesWriter_GetRemaining(PyBytesWriter *writer, void *buf) instead.

@vstinner
Member Author

I don't see strict aliasing issues here. Were the issues you saw related to casting between char* and void*?

A "recent" example of a type punning / strict aliasing issue with Py_CLEAR()/Py_SETREF(): #99701. These macros take a pointer to a pointer to any object (e.g. PyLongObject**) and cast it to a pointer to a pointer to a PyObject (PyObject**).
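
To illustrate the kind of type punning at stake, here is a small standalone sketch. It is not the code from #99701: Base, Derived and take_and_clear() are hypothetical stand-ins. One common way to avoid dereferencing a cast pointer-to-pointer is to copy the pointer bytes with memcpy(), which relies on all struct pointers sharing one representation in C.

```c
#include <assert.h>
#include <string.h>

typedef struct { int refcnt; } Base;               /* stand-in for PyObject */
typedef struct { Base base; int value; } Derived;  /* stand-in for PyLongObject */

/* Read the pointer stored in *slot and replace it with NULL, copying the
   pointer bytes with memcpy() rather than dereferencing a cast Base**.
   Assumes all struct pointers share one representation, which C guarantees. */
static Base *take_and_clear(void *slot)
{
    Base *old;
    Base *null_ptr = NULL;
    memcpy(&old, slot, sizeof(old));
    memcpy(slot, &null_ptr, sizeof(null_ptr));
    return old;
}
```

The caller passes the address of its own typed pointer, for example take_and_clear(&p) with Derived *p, and no Derived** is ever accessed through a Base** lvalue.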

@vstinner
Member Author

Consider returning the writer, and having void **buf as an output argument, for consistency with other *_Create/*_Finish/*_Discard families.

It would be inconvenient. If buf is declared as char *buf and &buf is passed where a void** is expected, it fails with a compiler error:

error: passing argument 1 of 'PyBytesWriter_Create' from incompatible pointer type [-Wincompatible-pointer-types]

You cannot pass char** as void** without an explicit cast to void**.

Also, as written before, I'm afraid of type punning / strict aliasing issues if void** is used.
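
For reference, a caller can use a void** in/out parameter without a cast and without type punning by going through a genuine void* temporary, at the cost of two extra assignments per call. The sketch below is standalone C, not a CPython API: toy_resize() and append_suffix() are hypothetical names built on realloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper with a void** in/out parameter (not a CPython API):
   resize *buf to new_size bytes. Returns 0 on success, -1 on failure. */
static int toy_resize(void **buf, size_t new_size)
{
    void *moved = realloc(*buf, new_size);
    if (moved == NULL) {
        return -1;      /* *buf is left untouched and still valid */
    }
    *buf = moved;
    return 0;
}

/* Caller holding a char*: go through a real void* object instead of
   casting &p to void**. `p` must be a malloc'ed buffer of `len` used bytes. */
static char *append_suffix(char *p, size_t len, const char *suffix)
{
    size_t extra = strlen(suffix);
    void *tmp = p;                       /* implicit char* -> void* */
    if (toy_resize(&tmp, len + extra + 1) < 0) {
        return NULL;                     /* caller still owns p */
    }
    p = tmp;                             /* implicit void* -> char* */
    memcpy(p + len, suffix, extra + 1);  /* copies the trailing NUL too */
    return p;
}
```

The temporary is what makes this pattern more verbose than the `buf = f(writer, buf, ...)` style, which is the inconvenience described above.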

@vstinner
Member Author

PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(..., const char *bytes, ...)
Consider void *bytes (or uint8_t *bytes), per the WIP guidelines.

Ok, I changed bytes parameter type to void*.

@vstinner
Member Author

I created a discussion: https://discuss.python.org/t/add-pybyteswriter-public-c-api/81182

@vstinner
Member Author

I created a discussion: https://discuss.python.org/t/add-pybyteswriter-public-c-api/81182

It seems like most developers are confused by an API that requires passing both writer and buf to most functions. I am abandoning this API.

vstinner added a commit to vstinner/cpython that referenced this issue Mar 20, 2025
Add functions:

* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_Alloc()
* PyBytesWriter_Extend()
* PyBytesWriter_Truncate()
* PyBytesWriter_WriteBytes()
vstinner added a commit to vstinner/cpython that referenced this issue Mar 24, 2025
Add functions:

* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_FinishWithSize()
* PyBytesWriter_FinishWithEndPointer()
* PyBytesWriter_Data()
* PyBytesWriter_Allocated()
* PyBytesWriter_SetSize()
* PyBytesWriter_Resize()
@vstinner vstinner changed the title [C API] Add PyBytesWriter API [C API] PEP 782: Add PyBytesWriter API Apr 22, 2025