The serialization layer is unexpectedly processed before the producer's partitioning logic #1663

Open
jeffwidman opened this issue Nov 29, 2018 · 1 comment

@jeffwidman
Contributor

jeffwidman commented Nov 29, 2018

Sample code pulled from one of our internal applications:

```python
# kafka_producer is configured with:
#    "key_serializer": json.dumps,
#    "value_serializer": json.dumps,

key = None  # None produces round-robin
if Const.FIELD_USER in message:
    key = message[Const.FIELD_USER]
kafka_producer.send(topic, key=key, value=message)
```

Unsurprisingly, using json.dumps will serialize key=None to 'null'.

Surprisingly, this makes key=None behave as if it were a keyed message: every such message is sent to the same single partition instead of being spread across partitions.

This happens because the serialization layer runs before the partitioning logic, so by the time https://github.com/dpkp/kafka-python/blob/1.4.4/kafka/partitioner/default.py#L24 is hit, the key is already the string 'null' rather than None.
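A toy model of this ordering may make the effect easier to see. It is a sketch under stated assumptions, not kafka-python's code: `toy_partition`, `toy_send` and `json_key_serializer` are hypothetical names, a None key is sent to a random partition, and crc32 stands in for the real partitioner's hash. Because the key is serialized first, the partitioner only ever sees b'null':

```python
import json
import random
import zlib

NUM_PARTITIONS = 8


def toy_partition(key_bytes, num_partitions=NUM_PARTITIONS):
    # Hypothetical stand-in for a default partitioner: a None key may go to
    # any partition; any other key is hashed (crc32 here, an assumption) so
    # the same key always lands on the same partition.
    if key_bytes is None:
        return random.randrange(num_partitions)
    return zlib.crc32(key_bytes) % num_partitions


def json_key_serializer(key):
    # Hypothetical JSON serializer; note that it turns None into b'null'.
    return json.dumps(key).encode("utf-8")


def toy_send(key, key_serializer):
    # Same ordering as described above: serialize first, then partition.
    key_bytes = key_serializer(key)
    return key_bytes, toy_partition(key_bytes)


key_bytes, _ = toy_send(None, json_key_serializer)
print(key_bytes)  # b'null': the partitioner never sees None

# Every "None" key now hashes to one fixed partition.
partitions = {toy_send(None, json_key_serializer)[1] for _ in range(100)}
print(len(partitions))  # 1
```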

I found this extremely surprising... at a minimum we need to call this out in the docs.

Alternatively, we could offer default helpers that handle null keys/values (for deleting messages in compacted topics) in a less surprising way.
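One possible shape for such a helper, as a minimal sketch: `none_passthrough` and `json_serializer` are hypothetical names, not part of kafka-python. The wrapper returns None untouched, so a None key keeps its "no key" partitioning behavior and a None value stays a real null (a tombstone in a compacted topic), while everything else goes through the wrapped serializer:

```python
import json


def none_passthrough(serializer):
    """Hypothetical helper: wrap a serializer so that None stays None."""
    def wrapper(obj):
        if obj is None:
            return None  # leave null keys/values alone
        return serializer(obj)
    return wrapper


json_serializer = none_passthrough(lambda obj: json.dumps(obj).encode("utf-8"))

# Usage (not executed here), assuming kafka-python's KafkaProducer:
# producer = KafkaProducer(key_serializer=json_serializer,
#                          value_serializer=json_serializer)

print(json_serializer(None))           # None
print(json_serializer({"user": "a"}))  # b'{"user": "a"}'
```

Wrapping an existing callable, rather than changing how the producer calls serializers, keeps current behavior intact for anyone who relies on None reaching their serializer.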

Related: #913.

jeffwidman changed the title from "Using the serializer recommended by the docs breaks producing with key=None" to "The serialization layer is processed before the producer's partitioning logic" on Nov 29, 2018.
jeffwidman changed the title from "The serialization layer is processed before the producer's partitioning logic" to "The serialization layer is unexpectedly processed before the producer's partitioning logic" on Nov 29, 2018.
@tvoinarovskyi
Collaborator

Hmm, yes, totally agree; at the very least a set of helpers would be good. I still find it strange that None is passed to the serializer at all instead of just being left as None, but we can't simply change that at this point.

A StringSerializer and a JsonSerializer would be a great start. There is still an open question about what to do with headers: Java has calls to serialize those too.
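A rough sketch of what those two serializers could look like, modeled loosely on Java's StringSerializer (which returns null for null input). The class names, the `serialize(topic, data, headers=None)` signature and the optional `headers` argument are assumptions for illustration only, not an existing kafka-python API; accepting headers but ignoring them is just one possible answer to the open question above:

```python
import json


class StringSerializer:
    """Hypothetical sketch: str -> bytes, with None passed through as None."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding

    def serialize(self, topic, data, headers=None):
        # topic and headers are accepted (as in Java's interface) but unused.
        if data is None:
            return None
        return data.encode(self.encoding)


class JsonSerializer(StringSerializer):
    """Hypothetical sketch: JSON-encode, then reuse the string encoding step."""

    def serialize(self, topic, data, headers=None):
        if data is None:
            return None
        return super().serialize(topic, json.dumps(data), headers)


key_ser = StringSerializer()
value_ser = JsonSerializer()
print(key_ser.serialize("users", None))         # None
print(value_ser.serialize("users", {"id": 1}))  # b'{"id": 1}'
```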

dpkp added the producer label on Mar 18, 2025.