[C API] PEP 782: Add PyBytesWriter API #129813
Comments
* Replace usage of the old private _PyBytesWriter with the new public PyBytesWriter C API.
* Remove the old private _PyBytesWriter C API.
* Add a freelist for PyBytesWriter_Create().
* TODO: write doc
* TODO: document new functions in What's New and Changelog
The new API is similar to the old one, but I renamed functions, changed the argument order, added |
Thank you! This looks great :) I have a few suggestions for your consideration, purely from the API-surface point of view:
Consider
Consider returning the writer, and having
This is meant to be called as
It seems something like
It might be more useful to add |
For
For example, Another full example from

static PyObject *
raw_unicode_escape(PyObject *obj)
{
    Py_ssize_t size = PyUnicode_GET_LENGTH(obj);
    const void *data = PyUnicode_DATA(obj);
    int kind = PyUnicode_KIND(obj);

    PyBytesWriter *writer;
    char *p = PyBytesWriter_Create(&writer, size);
    if (p == NULL) {
        return NULL;
    }

    for (Py_ssize_t i=0; i < size; i++) {
        Py_UCS4 ch = PyUnicode_READ(kind, data, i);

        /* Map 32-bit characters to '\Uxxxxxxxx' */
        if (ch >= 0x10000) {
            /* -1: subtract 1 preallocated byte */
            p = PyBytesWriter_Extend(writer, p, 10-1);
            if (p == NULL)
                goto error;

            *p++ = '\\';
            *p++ = 'U';
            *p++ = Py_hexdigits[(ch >> 28) & 0xf];
            *p++ = Py_hexdigits[(ch >> 24) & 0xf];
            *p++ = Py_hexdigits[(ch >> 20) & 0xf];
            *p++ = Py_hexdigits[(ch >> 16) & 0xf];
            *p++ = Py_hexdigits[(ch >> 12) & 0xf];
            *p++ = Py_hexdigits[(ch >> 8) & 0xf];
            *p++ = Py_hexdigits[(ch >> 4) & 0xf];
            *p++ = Py_hexdigits[ch & 15];
        }
        /* Map 16-bit characters, '\\' and '\n' to '\uxxxx' */
        else if (ch >= 256 ||
                 ch == '\\' || ch == 0 || ch == '\n' || ch == '\r' ||
                 ch == 0x1a)
        {
            /* -1: subtract 1 preallocated byte */
            p = PyBytesWriter_Extend(writer, p, 6-1);
            if (p == NULL)
                goto error;

            *p++ = '\\';
            *p++ = 'u';
            *p++ = Py_hexdigits[(ch >> 12) & 0xf];
            *p++ = Py_hexdigits[(ch >> 8) & 0xf];
            *p++ = Py_hexdigits[(ch >> 4) & 0xf];
            *p++ = Py_hexdigits[ch & 15];
        }
        /* Copy everything else as-is */
        else
            *p++ = (char) ch;
    }

    return PyBytesWriter_Finish(writer, p);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}

Sometimes, you don't know how many bytes were allocated and you don't care. Just call I'm not sure in which case a |
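The raw_unicode_escape() example above relies on Extend() handing back a fresh write pointer, since growing the buffer may move it in memory. A minimal sketch of that idea in plain C follows. It is only a model built on malloc()/realloc(): the names `toy_writer`, `toy_create`, `toy_extend`, `toy_discard` and `toy_escape` are hypothetical and are not part of CPython or of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer: a model of the pattern, NOT CPython's code. */
typedef struct {
    char *buf;         /* start of the allocation */
    size_t allocated;  /* bytes currently allocated */
} toy_writer;

/* Allocate `size` bytes (size > 0) and return the initial write pointer. */
static char *
toy_create(toy_writer *w, size_t size)
{
    assert(w != NULL && size > 0);
    w->allocated = size;
    w->buf = malloc(size);
    return w->buf;  /* NULL on allocation failure */
}

/* Grow the allocation by `extra` bytes. realloc() may move the block, so
   the caller's write pointer `p` is translated into the new block. */
static char *
toy_extend(toy_writer *w, char *p, size_t extra)
{
    size_t used = (size_t)(p - w->buf);
    char *nbuf = realloc(w->buf, w->allocated + extra);
    if (nbuf == NULL) {
        return NULL;  /* the old block is still owned by the writer */
    }
    w->buf = nbuf;
    w->allocated += extra;
    return nbuf + used;
}

static void
toy_discard(toy_writer *w)
{
    free(w->buf);
    w->buf = NULL;
    w->allocated = 0;
}

/* Double every backslash in `src`. Returns a malloc()ed, NUL-terminated
   string that the caller must free(), or NULL on memory error. */
static char *
toy_escape(const char *src)
{
    size_t size = strlen(src);
    toy_writer w;
    char *p = toy_create(&w, size + 1);  /* +1 for the trailing NUL */
    if (p == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < size; i++) {
        if (src[i] == '\\') {
            /* 2 output bytes, 1 of them already preallocated */
            p = toy_extend(&w, p, 2 - 1);
            if (p == NULL) {
                toy_discard(&w);
                return NULL;
            }
            *p++ = '\\';
            *p++ = '\\';
        }
        else {
            *p++ = src[i];
        }
    }
    *p = '\0';
    return w.buf;
}
```

As in the example above, the caller must always continue from the pointer returned by the extend call; the pointer it held before the call may be dangling.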
I'm afraid of strict aliasing issues. For example, if Extend() signature is changed to:

PyAPI_FUNC(int) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void **buf,
    Py_ssize_t extend);

I'm afraid of compiler issues with the cast:

char *buf;
(...)
if (PyBytesWriter_Extend(writer, &buf, 100) < 0) { /* handle error */ }

I had issues similar to that in the past, sadly I forgot details. |
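To make the concern concrete: `char *` and `void *` are distinct types in C, so passing `&buf` where a `void **` is expected needs a cast, and storing through that cast pointer is, strictly speaking, outside what the aliasing rules allow, even though many compilers tolerate it. One way to sidestep this is to go through a genuine `void *` temporary. The sketch below assumes a hypothetical `toy_extend_inplace()` with the `void **` shape discussed above; it is built on realloc() and is not the proposed PyBytesWriter function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an Extend() that updates the buffer pointer
   in place through a void ** parameter. */
static int
toy_extend_inplace(void **buf, size_t new_size)
{
    assert(buf != NULL && new_size > 0);
    void *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;  /* *buf is left untouched and still valid */
    }
    *buf = tmp;
    return 0;
}

/* Allocate `old_size` bytes, grow to `new_size`, fill everything with `fill`.
   Returns a malloc()ed block the caller must free(), or NULL on error. */
static char *
toy_grow_and_fill(size_t old_size, size_t new_size, char fill)
{
    assert(old_size > 0 && new_size >= old_size);
    char *buf = malloc(old_size);
    if (buf == NULL) {
        return NULL;
    }
    memset(buf, fill, old_size);

    /* Questionable: toy_extend_inplace((void **)&buf, new_size) would
       write a `void *` into an object declared as `char *`.
       Safer: use a real `void *` object and copy the result back. */
    void *tmp = buf;
    if (toy_extend_inplace(&tmp, new_size) < 0) {
        free(buf);
        return NULL;
    }
    buf = tmp;

    memset(buf + old_size, fill, new_size - old_size);
    return buf;
}
```

The temporary costs one extra assignment per call, which is part of why an API returning the new pointer directly can be more convenient for callers holding a `char *`.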
Ah, I see,
I don't see strict aliasing issues here. Were the issues you saw related to casting between |
I removed |
A "recent" example of type punning / strict aliasing issue with Py_CLEAR()/Py_SETREF(): #99701 These macros take a pointer to a pointer to any object (ex: |
It would be inconvenient. If
You cannot pass Also, as written before, I'm afraid of type punning / strict aliasing issues if |
Ok, I changed bytes parameter type to |
I created a discussion: https://discuss.python.org/t/add-pybyteswriter-public-c-api/81182 |
It seems like most developers are confused by an API that requires passing both writer and buf to most functions. I'm abandoning this API.
Add functions:
* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_Alloc()
* PyBytesWriter_Extend()
* PyBytesWriter_Truncate()
* PyBytesWriter_WriteBytes()
Add functions:
* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_FinishWithSize()
* PyBytesWriter_FinishWithEndPointer()
* PyBytesWriter_Data()
* PyBytesWriter_Allocated()
* PyBytesWriter_SetSize()
* PyBytesWriter_Resize()
Feature or enhancement
Proposal:
I propose adding a PyBytesWriter API to create bytes objects.
PyBytesWriter_Extend() and usage of a "small buffer" of (around) 256 bytes
bytes objects.
bytes
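The two mechanisms named in the proposal, overallocation when extending and a "small buffer" of around 256 bytes, can be pictured with a toy model in plain C. This is a minimal sketch under stated assumptions: the `sb_` names are hypothetical, the 256-byte figure comes from the text above, and the doubling growth factor is an assumption, not something the proposal specifies. Overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SB_SMALL 256  /* "around 256 bytes", per the proposal text */

/* Hypothetical toy writer, NOT CPython's implementation. Because `data`
   may point into `small`, the struct must not be copied by value. */
typedef struct {
    char small[SB_SMALL];  /* inline storage: short outputs need no heap */
    char *data;            /* points at small[] or at a heap block */
    size_t size;           /* bytes written so far */
    size_t allocated;      /* capacity of data */
} sb_writer;

static void
sb_init(sb_writer *w)
{
    assert(w != NULL);
    w->data = w->small;
    w->size = 0;
    w->allocated = SB_SMALL;
}

static int
sb_is_heap(const sb_writer *w)
{
    return w->data != w->small;
}

/* Append n bytes. Returns 0 on success, -1 on memory error. */
static int
sb_write(sb_writer *w, const void *bytes, size_t n)
{
    if (n > w->allocated - w->size) {
        size_t need = w->size + n;
        size_t cap = w->allocated;
        while (cap < need) {
            cap *= 2;  /* overallocate so later writes rarely reallocate */
        }
        char *heap;
        if (sb_is_heap(w)) {
            heap = realloc(w->data, cap);
            if (heap == NULL) {
                return -1;
            }
        }
        else {
            /* First spill: move the inline bytes to a heap block. */
            heap = malloc(cap);
            if (heap == NULL) {
                return -1;
            }
            memcpy(heap, w->small, w->size);
        }
        w->data = heap;
        w->allocated = cap;
    }
    memcpy(w->data + w->size, bytes, n);
    w->size += n;
    return 0;
}

/* Release any heap block and reset the writer to its inline buffer. */
static void
sb_clear(sb_writer *w)
{
    if (sb_is_heap(w)) {
        free(w->data);
    }
    sb_init(w);
}
```

In this model, outputs that fit in the inline buffer never touch the allocator, and larger outputs pay for a number of reallocations that grows only logarithmically with the final size.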
API:
Simple example creating the string b"abc":
Example formatting an integer in decimal, the size is not known in advance:
Note: using PyBytesWriter_Format() would make this code simpler.
Example using PyBytesWriter_Extend(), similar to bytes.center() with a different API: spaces is the number of whitespace characters added to the left and to the right:
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
My previous attempt in July/August 2024:
Linked PRs