Closed
Description
Feature or enhancement
Proposal:
I propose adding a PyBytesWriter API to create bytes objects.
- Efficient API thanks to overallocation in PyBytesWriter_Extend() and usage of a "small buffer" of (around) 256 bytes.
- Avoid creating incomplete/inconsistent bytes objects.
- Avoid mutating immutable bytes.
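To make the first bullet concrete, here is a minimal standalone sketch of the "small buffer plus overallocation" idea. It is a toy model, not the proposed implementation: `ToyWriter`, the `toy_*` functions, and the 25% growth factor are all hypothetical, and size overflow checks are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, not CPython code. */
#define TOY_SMALL_BUFFER 256

typedef struct {
    char small[TOY_SMALL_BUFFER]; /* used while the data still fits */
    char *heap;                   /* NULL until the small buffer overflows */
    size_t allocated;             /* capacity of the active buffer */
    size_t used;                  /* bytes written so far */
} ToyWriter;

static void toy_init(ToyWriter *w)
{
    w->heap = NULL;
    w->allocated = TOY_SMALL_BUFFER;
    w->used = 0;
}

/* Return the buffer that currently holds the data. */
static char *toy_data(ToyWriter *w)
{
    return w->heap ? w->heap : w->small;
}

/* Make room for `extra` more bytes. Returns 0 on success, -1 on failure. */
static int toy_reserve(ToyWriter *w, size_t extra)
{
    size_t needed = w->used + extra;
    if (needed <= w->allocated) {
        return 0;                 /* no allocation needed */
    }
    /* Overallocate by 25% (assumed factor) so that a series of small
       extensions does not reallocate every time. */
    size_t new_alloc = needed + needed / 4;
    char *p = realloc(w->heap, new_alloc); /* realloc(NULL, n) acts like malloc */
    if (p == NULL) {
        return -1;
    }
    if (w->heap == NULL) {
        /* First move to the heap: copy what was in the small buffer. */
        memcpy(p, w->small, w->used);
    }
    w->heap = p;
    w->allocated = new_alloc;
    return 0;
}

/* Append `size` bytes. Returns 0 on success, -1 on failure. */
static int toy_write(ToyWriter *w, const void *bytes, size_t size)
{
    if (toy_reserve(w, size) < 0) {
        return -1;
    }
    memcpy(toy_data(w) + w->used, bytes, size);
    w->used += size;
    return 0;
}

static void toy_free(ToyWriter *w)
{
    free(w->heap);
    w->heap = NULL;
}
```

In this model, short outputs never touch the heap, and longer ones pay for a reallocation only when the spare capacity from the previous growth is used up.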
API:
typedef struct PyBytesWriter PyBytesWriter;

PyAPI_FUNC(void*) PyBytesWriter_Create(
    PyBytesWriter **writer,
    Py_ssize_t alloc);
PyAPI_FUNC(void) PyBytesWriter_Discard(
    PyBytesWriter *writer);
PyAPI_FUNC(PyObject*) PyBytesWriter_Finish(
    PyBytesWriter *writer,
    void *buf);

PyAPI_FUNC(Py_ssize_t) PyBytesWriter_GetRemaining(
    PyBytesWriter *writer,
    void *buf);
PyAPI_FUNC(void*) PyBytesWriter_Extend(
    PyBytesWriter *writer,
    void *buf,
    Py_ssize_t extend);
PyAPI_FUNC(void*) PyBytesWriter_WriteBytes(
    PyBytesWriter *writer,
    void *buf,
    const void *bytes,
    Py_ssize_t size);
PyAPI_FUNC(void*) PyBytesWriter_Format(
    PyBytesWriter *writer,
    void *buf,
    const char *format,
    ...);
Simple example creating the string b"abc":
PyObject* create_abc(void)
{
    PyBytesWriter *writer;
    char *str = PyBytesWriter_Create(&writer, 3);
    if (writer == NULL) return NULL;

    memcpy(str, "abc", 3);
    str += 3;

    return PyBytesWriter_Finish(writer, str);
}
Example formatting an integer in decimal, where the size is not known in advance:
PyObject* format_int(int value)
{
    PyBytesWriter *writer;
    char *str = PyBytesWriter_Create(&writer, 20);
    if (writer == NULL) return NULL;

    str += PyOS_snprintf(str, 20, "%i", value);

    return PyBytesWriter_Finish(writer, str);
}
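The example above reserves 20 bytes. As a sanity check on that choice, the following standalone sketch computes an upper bound on the space "%i" can need for an int and detects truncation. It uses standard snprintf() as a stand-in for PyOS_snprintf(); `INT_DECIMAL_BUFSIZE` and `format_int_checked()` are hypothetical names that are not part of the proposal.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Upper bound on the bytes needed to format an int with "%i":
   decimal digits + an optional '-' + the trailing NUL.
   302/1000 slightly over-approximates log10(2) ~= 0.30103. */
#define INT_DECIMAL_BUFSIZE (sizeof(int) * CHAR_BIT * 302 / 1000 + 3)

/* Hypothetical helper: format `value` into `buf` and return the number of
   characters written, excluding the trailing NUL. */
static int format_int_checked(char *buf, size_t size, int value)
{
    int n = snprintf(buf, size, "%i", value);
    /* snprintf() returns the length it wanted to write, so a result
       greater than or equal to `size` means the output was truncated. */
    assert(n >= 0 && (size_t)n < size);
    return n;
}
```

With a 32-bit int the bound is 12 bytes (the longest output is "-2147483648", 11 characters plus the NUL), so a 20-byte allocation leaves room to spare on such platforms.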
Note: using PyBytesWriter_Format() would make this code simpler.
Example using PyBytesWriter_Extend(), similar to bytes.center() but with a different API: spaces is the number of space characters added to the left and to the right:
static PyObject *
byteswriter_center_example(Py_ssize_t spaces, char *str, Py_ssize_t str_size)
{
    PyBytesWriter *writer;
    char *buf = PyBytesWriter_Create(&writer, spaces * 2);
    if (buf == NULL) {
        goto error;
    }
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces * 2);

    // Add left spaces
    memset(buf, ' ', spaces);
    buf += spaces;
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces);

    // Copy string
    buf = PyBytesWriter_Extend(writer, buf, str_size);
    if (buf == NULL) {
        goto error;
    }
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces + str_size);

    memcpy(buf, str, str_size);
    buf += str_size;
    assert(PyBytesWriter_GetRemaining(writer, buf) == spaces);

    // Add right spaces
    memset(buf, ' ', spaces);
    buf += spaces;
    assert(PyBytesWriter_GetRemaining(writer, buf) == 0);

    return PyBytesWriter_Finish(writer, buf);

error:
    PyBytesWriter_Discard(writer);
    return NULL;
}
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
My previous attempt in July/August 2024:
Linked PRs
- [WIP] gh-129813: Add PyBytesWriter C API #129814
- [WIP] gh-129813: Add PyBytesWriter C API (version 2) #131520
- [WIP] gh-129813, PEP 782: Add PyBytesWriter C API #131681
- gh-129813, PEP 782: Add PyBytesWriter C API #138822
- gh-129813, PEP 782: Add PyBytesWriter_Format() #138824
- gh-129813, PEP 782: Use PyBytesWriter in binascii #138825
- gh-129813, PEP 782: Use PyBytesWriter in posix extension #138829
- gh-129813, PEP 782: Use Py_GetConstant(Py_CONSTANT_EMPTY_BYTES) #138830
- gh-129813, PEP 782: Use PyBytesWriter in socket and mmap #138831
- gh-129813, PEP 782: Use PyBytesWriter in lzma and zlib #138832
- gh-129813, PEP 782: Use PyBytesWriter in pickle and struct #138833
- gh-129813, PEP 782: Use PyBytesWriter in _hashopenssl #138835
- gh-129813, PEP 782: Use PyBytesWriter in memoryview #138836
- gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_FromList() #138837
- gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_DecodeEscape2() #138838
- gh-129813, PEP 782: Use PyBytesWriter in _PyBytes_FormatEx() #138839
- gh-129813, PEP 782: Use PyBytesWriter in bytes_concat() #138840
- gh-129813, PEP 782: Use PyBytesWriter in utf8_encoder() #138874
- gh-129813, PEP 782: Use PyBytesWriter in _io.FileIO.readall #138901
- gh-129813, PEP 782: Use PyBytesWriter in _codecs.escape_decode() #138919
- gh-129813, PEP 782: Use PyBytesWriter in _curses #138920
- gh-129813, PEP 782: Use PyBytesWriter in fcntl #138921
- gh-129813, PEP 782: Use PyBytesWriter in _hashopenssl #138922
- gh-129813, PEP 782: Use PyBytesWriter in _sha3 #138923
- gh-129813, PEP 782: Init small_buffer in PyBytesWriter_Create() #138924
- gh-129813, PEP 782: Use PyBytesWriter in _ssl #138929
- gh-129813, PEP 782: Use PyBytesWriter in _winapi.PeekNamedPipe() #138930
- gh-129813, PEP 782: Add PyBytesWriter.overallocate #138941
- gh-129813, PEP 782: Use PyBytesWriter in bufferedio.c #138954
- gh-129813, PEP 782: Use PyBytesWriter in FileIO.read() #138955
- gh-129813: Use PyBytesWriter in _json:_match_number_unicode #138957
- gh-129813, PEP 782: Soft deprecate _PyBytes_Resize() #138964
- gh-129813, PEP 782: Add doc reference link #138986
- gh-129813, PEP 782: Use PyBytesWriter in _Py_bytes_maketrans() #139044
- gh-129813, PEP 782: Use PyBytesWriter in _multiprocessing #139047
- gh-129813, PEP 782: Use PyBytesWriter in _testclinic #139048
- gh-129813, PEP 782: Set invalid bytes in PyBytesWriter #139054
- gh-129813, PEP 782: Use PyBytesWriter in _socket #139097
- gh-129813, PEP 782: Optimize byteswriter_resize() #139101
- gh-129813, PEP 782: Use PyBytesWriter in ssl.MemoryBIO #139113
- gh-129813: PEP 782: Use PyBytesWriter in _sqlite #138956
- gh-129813, PEP 782: Use PyBytesWriter in bufferedio.c #139121
- gh-129813, PEP 782: Use PyBytesWriter in socket recvmsg() #139131