I seem to be observing a memory leak when using the kafka-python (2.0.1) package with PyPy (Python 3.6.9 (1608da62bfc7, Dec 23 2019, 10:50:04) [PyPy 7.3.0 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux).
I ran the same script on CPython 3.6 (Python 3.6.4 (default, Mar 19 2019, 21:01:45) [GCC 4.9.2] on linux) and saw no memory problem: usage reached 20.1 MiB and appeared to stay there.
I attach a script that consumes data from an empty topic. Its memory consumption keeps increasing for a long time (I tested it for at least 3 hours, and memory usage grew the whole time).
To be precise, it starts at about 100 MiB and grows to ~300-400 MiB within 1-2 hours.
I tried to profile specific internal methods of KafkaConsumer, starting from KafkaConsumer.poll. I saw memory growth in KafkaConsumer._poll_once, and then in KafkaConsumer._client.poll and KafkaConsumer._fetcher.fetched_records, but I couldn't find anything useful beyond that.
I also tested it without the memory_profiler module, just watching htop, and the result was the same.
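A check that does not depend on memory_profiler could log the process RSS after each call, for example as in the minimal sketch below. It assumes Linux (it reads /proc/self/statm and falls back to peak RSS elsewhere), and the helper names current_rss_mib and sample_rss are hypothetical, not part of kafka-python. Forcing gc.collect() every few iterations may help tell delayed garbage collection apart from objects that are actually retained.

```python
import gc
import resource
import sys


def current_rss_mib():
    """Return the resident set size of this process in MiB."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * resource.getpagesize() / 2**20
    except OSError:
        # Fallback: peak (not current) RSS. ru_maxrss is reported in
        # KiB on Linux and in bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / 2**20 if sys.platform == "darwin" else peak / 1024


def sample_rss(step, iterations, collect_every=0):
    """Call step() `iterations` times and record RSS after each call.

    If collect_every > 0, a full garbage collection is forced every
    `collect_every` iterations before the measurement is taken.
    Returns a list of (iteration, rss_mib) tuples.
    """
    samples = []
    for i in range(1, iterations + 1):
        step()
        if collect_every and i % collect_every == 0:
            gc.collect()
        samples.append((i, current_rss_mib()))
    return samples


if __name__ == "__main__":
    # Stand-in workload; in the real test this would be the poll call.
    for i, rss in sample_rss(lambda: [0] * 1000, 5, collect_every=2):
        print(f"iteration {i}: {rss:.1f} MiB")
```

With the consumer from the script, the step would be something like `lambda: _origin_consumer.poll(timeout_ms=5000, max_records=10000)`. If RSS keeps climbing even with periodic forced collections, that points toward retained objects rather than lazy collection.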
I attached three screenshots: the first at script startup, the second after one minute of running, and the third after 12 minutes.
As you can see, some poll calls increase memory by 0.1-0.6 MiB (0.3 MiB is the most frequent value; it can be several MiB at startup, but I think that is OK).
import time

from kafka import KafkaConsumer
from kafka.structs import TopicPartition
from memory_profiler import profile

_bootstrap_servers = ["172.172.172.2:9092"]
_group_id = "memoryLeakGroup"
_auto_offset_reset = "earliest"
_enable_auto_commit = True
_timeout_ms_max = 5000
_batch_max_size = 10000
_assignment = [TopicPartition(topic="29_06_2020", partition=0)]

_origin_consumer = KafkaConsumer(
    bootstrap_servers=_bootstrap_servers,
    group_id=_group_id,
    auto_offset_reset=_auto_offset_reset,
    enable_auto_commit=_enable_auto_commit,
)
_origin_consumer.assign(_assignment)


@profile
def polling():
    data = _origin_consumer.poll(
        timeout_ms=_timeout_ms_max, max_records=_batch_max_size
    )
    if not data:
        print(f"There is no more data {_assignment}")
    else:
        print(f"There is some data {data}")


if __name__ == "__main__":
    while True:
        try:
            polling()
            time.sleep(0.5)
        except Exception:
            print('wow exception')
I run my scripts in Docker; the PyPy and Kafka images can be built with the files in the attachments (remove the .txt extensions from boot.sh and Dockerfile):
boot.sh.txt
kafka_Dockerfile.txt
pypy_requirements.txt
pypy_Dockerfile.txt
I've also created an issue on the PyPy repo.