feat(taskworker): Add compression in taskworker producer and worker #95153

enochtangg · 2025-07-09T18:46:59Z

The taskbroker system does not handle large payloads well. We first saw the effects of this with ingest-profiles when rolling out to smaller environments. We also observed slow throughput and SQLite issues in processing pools handling sentryapp and seer tasks (2-8MB payloads). This PR adds the ability to enable task parameter compression in taskworkers.

Flow:

The user sets a CompressionType attribute on the @instrumented_task decorator to choose the compression algorithm (only ZSTD and PLAINTEXT are supported). It defaults to PLAINTEXT.
In the taskworker producer layer, parameters are serialized and compressed. A task header is added indicating the compression type.
Parameters stay compressed in Kafka and taskbroker storage.
Using the task's header, the worker determines whether the task needs to be decompressed. If so, it base64-decodes and decompresses the message.
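
The flow above can be sketched as a producer/worker round trip. This is a minimal sketch under stated assumptions, not the PR's implementation: the standard library's `json` and `zlib` stand in for `orjson` and zstd so the example runs without third-party packages, and the names `encode_parameters`, `decode_parameters`, and the header value `"zlib-standin"` are hypothetical. Only the `compression-type` header name comes from the code under review.

```python
import base64
import json
import zlib

COMPRESSION_HEADER = "compression-type"  # header name used in the PR
STANDIN_VALUE = "zlib-standin"  # hypothetical value; the PR uses CompressionType.ZSTD.value


def encode_parameters(args, kwargs, compress):
    """Producer side: serialize, optionally compress, return (headers, body)."""
    headers = {}
    raw = json.dumps({"args": args, "kwargs": kwargs}).encode("utf8")
    if compress:
        # zlib stands in for zstd. Compressed bytes are binary, so they are
        # base64-encoded to travel as a string.
        headers[COMPRESSION_HEADER] = STANDIN_VALUE
        body = base64.b64encode(zlib.compress(raw)).decode("utf8")
    else:
        body = raw.decode("utf8")
    return headers, body


def decode_parameters(headers, body):
    """Worker side: the header decides whether the body must be decompressed."""
    if headers.get(COMPRESSION_HEADER) == STANDIN_VALUE:
        raw = zlib.decompress(base64.b64decode(body))
    else:
        raw = body.encode("utf8")
    return json.loads(raw)
```

Because the worker inspects the header on each message, compressed and plaintext activations of the same task can coexist in the queue.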

Rollout:
To roll out this change, we update a task's decorator with CompressionType, then update the Sentry option for the compression rollout rate. Tasks are sampled for compression according to that rate. Because the compression type travels in the task header, the same task can be rolled out incrementally.
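
The sampling step can be illustrated with a small helper. This is a sketch only: `should_compress` and its `rng` parameter are hypothetical names, and the comparison mirrors the `rate > random.random()` check in the code under review.

```python
import random


def should_compress(rollout_rate, rng=random):
    """Return True for roughly `rollout_rate` of calls (0.0 = never, 1.0 = always)."""
    # random() returns a float in [0.0, 1.0), so a rate of 1.0 always passes,
    # while a rate of 0.0 (or an unset option) never does.
    return bool(rollout_rate) and rollout_rate > rng.random()
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible in tests.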

Testing:

Local testing
Unit tests
Will start with 1% of a single task in S4S. Observe compression metrics (duration and size) to ensure similar performance.

cursor

Bug: Compression Rollout Bug Causes UnboundLocalError

An UnboundLocalError occurs for the parameters_str variable when compression_type is ZSTD but the compression rollout condition is not met. The code's else block only assigns parameters_str when compression_type is not ZSTD, leaving parameters_str undefined for tasks configured with ZSTD compression that do not pass the rollout check.

src/sentry/taskworker/task.py#L180-L207

sentry/src/sentry/taskworker/task.py

Lines 180 to 207 in fd30ab8

    parameters_json = orjson.dumps({"args": args, "kwargs": kwargs})
    if self.compression_type == CompressionType.ZSTD:
        option_flag = f"taskworker.{self._namespace.name}.compression.rollout"
        compression_rollout_rate = options.get(option_flag)
        # TODO(taskworker): This option is for added safety and requires a rollout
        # percentage to be set to actually use compression.
        # We can remove this later.
        if compression_rollout_rate and compression_rollout_rate > random.random():
            # Worker uses this header to determine if the parameters are decompressed
            headers["compression-type"] = CompressionType.ZSTD.value
            start_time = time.perf_counter()
            parameters_data = zstd.compress(parameters_json)
            # Compressed data is binary and needs base64 encoding for transport
            parameters_str = base64.b64encode(parameters_data).decode("utf8")
            end_time = time.perf_counter()
            metrics.distribution(
                "taskworker.producer.compressed_parameters_size",
                len(parameters_str),
                tags={"namespace": self._namespace.name, "taskname": self.name},
            )
            metrics.distribution(
                "taskworker.producer.compression_time",
                end_time - start_time,
                tags={"namespace": self._namespace.name, "taskname": self.name},
            )
    else:
        parameters_str = parameters_json.decode("utf8")
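
One possible way to avoid the unbound variable is to assign the plaintext form first and only override it when compression actually runs. The sketch below is an assumption-laden illustration, not the fix that landed: `build_parameters` and its arguments are hypothetical, and `json`/`zlib` stand in for `orjson`/zstd so it runs with the standard library alone.

```python
import base64
import json
import random
import zlib


def build_parameters(args, kwargs, wants_compression, rollout_rate, headers):
    """Return the parameters string; mutates `headers` when compression is used."""
    parameters_json = json.dumps({"args": args, "kwargs": kwargs}).encode("utf8")
    # Bind the plaintext form up front so every code path has a value.
    parameters_str = parameters_json.decode("utf8")
    if wants_compression:
        if rollout_rate and rollout_rate > random.random():
            headers["compression-type"] = "zlib-standin"  # hypothetical value
            compressed = zlib.compress(parameters_json)  # stand-in for zstd
            parameters_str = base64.b64encode(compressed).decode("utf8")
    return parameters_str
```

With this shape, a task configured for compression that misses the rollout sample falls back to plaintext instead of raising.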

Bug: Task Name Conflict Overwrites Existing Task

The new simple_task_compressed function is registered with the name "examples.simple_task", which conflicts with an existing simple_task function. This overwrites the original task in the registry, rendering it unreachable. The new task requires a unique name.

src/sentry/taskworker/tasks/examples.py#L92-L96

sentry/src/sentry/taskworker/tasks/examples.py

Lines 92 to 96 in fd30ab8

    @exampletasks.register(name="examples.simple_task", compression_type=CompressionType.ZSTD)
    def simple_task_compressed(*args: list[Any], **kwargs: dict[str, Any]) -> None:
        sleep(0.1)
        logger.debug("simple_task_compressed complete")
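
The name collision can be illustrated with a toy registry keyed by task name. This is a sketch under assumptions: `ToyRegistry` is hypothetical and is not Sentry's namespace class, the duplicate check is one possible guard rather than existing behavior, and `examples.simple_task_compressed` is only an example of a unique name.

```python
class ToyRegistry:
    """Hypothetical name-keyed registry used only for illustration."""

    def __init__(self):
        self._tasks = {}

    def register(self, name):
        def wrapped(func):
            # Refuse duplicates instead of silently replacing the earlier task.
            if name in self._tasks:
                raise ValueError(f"task name {name!r} is already registered")
            self._tasks[name] = func
            return func

        return wrapped

    def get(self, name):
        return self._tasks[name]


registry = ToyRegistry()


@registry.register(name="examples.simple_task")
def simple_task():
    return "plain"


# A distinct name keeps both tasks reachable.
@registry.register(name="examples.simple_task_compressed")
def simple_task_compressed():
    return "compressed"
```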

evanh

This looks good to me. I'm interested to see how this will impact things.

markstory · 2025-07-09T21:37:21Z

src/sentry/options/defaults.py

+register(
+    "taskworker.deletions.compression.rollout",
+    default=0.0,
+    flags=FLAG_AUTOMATOR_MODIFIABLE,
+)


Do we need to enable compression per namespace? Once we know that the general compression flow works, we could opt tasks into compression via deploys.

markstory · 2025-07-09T21:38:28Z

src/sentry/taskworker/registry.py

@@ -125,6 +126,7 @@ def wrapped(func: Callable[P, R]) -> Task[P, R]:
                ),
                at_most_once=at_most_once,
                wait_for_delivery=wait_for_delivery,
+                compression_type=compression_type,


Could we add this to the docstring so it shows up in LSP tooltips?

markstory · 2025-07-09T21:39:51Z

src/sentry/taskworker/task.py

@@ -169,12 +177,41 @@ def create_activation(
                    f"The `{key}` header value is of type {type(value)}"
                )

+        parameters_json = orjson.dumps({"args": args, "kwargs": kwargs})
+        if self.compression_type == CompressionType.ZSTD:
+            option_flag = f"taskworker.{self._namespace.name}.compression.rollout"


We could have a single option for enabling compression. We might also want the option check to be done with the self.compression_type comparison so that both the task has to have enabled compression and compression needs to be enabled.
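
A combined check along these lines might look like the sketch below. It assumes a single rollout rate passed in by the caller; `use_compression` is a hypothetical helper, and the `CompressionType` enum here is a stand-in whose values are assumptions rather than the PR's definitions.

```python
import enum
import random


class CompressionType(enum.Enum):
    # Stand-in for the enum referenced in the PR; the values are assumptions.
    PLAINTEXT = "plaintext"
    ZSTD = "zstd"


def use_compression(task_compression_type, rollout_rate):
    """True only if the task opted in AND the rollout sample passes."""
    return (
        task_compression_type == CompressionType.ZSTD
        and bool(rollout_rate)
        and rollout_rate > random.random()
    )
```

Folding both conditions into one predicate leaves a single `if`/`else`, so the `else` branch covers every case where the payload stays plaintext.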

markstory · 2025-07-09T21:54:48Z

tests/sentry/taskworker/test_worker.py

+
+@pytest.mark.django_db
+@mock.patch("sentry.taskworker.workerchild.capture_checkin")
+def test_child_process_decompression(mock_capture_checkin) -> None:


It would be good to have a test covering payload compression when an activation is created as well. There are some existing tests in tests/sentry/taskworker/test_task.py and test_registry.py covering activation creation.

enochtangg added 3 commits July 8, 2025 17:02

support zstd compression in taskworker

d4c7d34

wip

999fd80

add compression type

fd30ab8

enochtangg requested a review from a team as a code owner July 9, 2025 18:47

github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 9, 2025

cursor bot reviewed Jul 9, 2025

evanh approved these changes Jul 9, 2025

markstory reviewed Jul 9, 2025
