Commit 0701855

gaogaotiantian authored and dongjoon-hyun committed
[SPARK-54619][PYTHON] Add a sanity check for configuration numbers
### What changes were proposed in this pull request?

Add a sanity check for the number of configurations being passed.

### Why are the changes needed?

This is helpful to recognize a malformed message and avoid a potential deadlock when the message does not conform to the protocol.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This error should not happen, and it should not break CI either.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53359 from gaogaotiantian/runner-conf-sanity-check.

Authored-by: Tian Gao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent e934c43 commit 0701855

File tree

2 files changed: +18 −1 lines changed


python/pyspark/errors/error-conditions.json

Lines changed: 5 additions & 0 deletions

```diff
@@ -941,6 +941,11 @@
       "Argument <arg_name> must be a numerical column for plotting, got <arg_type>."
     ]
   },
+  "PROTOCOL_ERROR": {
+    "message": [
+      "<failure>. This usually indicates that the message does not conform to the protocol."
+    ]
+  },
   "PYTHON_HASH_SEED_NOT_SET": {
     "message": [
       "Randomness of hash of string should be disabled via PYTHONHASHSEED."
```

python/pyspark/worker.py

Lines changed: 13 additions & 1 deletion

```diff
@@ -118,7 +118,19 @@ def __init__(self, infile=None):

     def load(self, infile):
         num_conf = read_int(infile)
-        for i in range(num_conf):
+        # We do a sanity check here to reduce the possibility of getting stuck
+        # indefinitely due to an invalid message. If the number of
+        # configurations is obviously wrong, we just raise an error directly.
+        # We hand-pick the configurations to send to the worker, so the number
+        # should be very small (less than 100).
+        if num_conf < 0 or num_conf > 10000:
+            raise PySparkRuntimeError(
+                errorClass="PROTOCOL_ERROR",
+                messageParameters={
+                    "failure": f"Invalid number of configurations: {num_conf}",
+                },
+            )
+        for _ in range(num_conf):
             k = utf8_deserializer.loads(infile)
             v = utf8_deserializer.loads(infile)
             self._conf[k] = v
```
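The deadlock risk comes from the shape of the wire format: a count prefix followed by that many length-prefixed UTF-8 key/value pairs. If a corrupted count is huge, the reader blocks waiting for pairs that will never arrive; the patch fails fast instead. A self-contained sketch of the same pattern (standalone `read_int`/`read_utf8` helpers written for this example, not the actual `pyspark.serializers` functions):

```python
import io
import struct

MAX_CONF = 10000  # generous upper bound; the real count is well under 100


def read_int(stream) -> int:
    # 4-byte big-endian signed int, matching the "!i" struct format.
    return struct.unpack("!i", stream.read(4))[0]


def read_utf8(stream) -> str:
    # Each string is length-prefixed, then UTF-8 encoded bytes.
    length = read_int(stream)
    return stream.read(length).decode("utf-8")


def load_conf(stream) -> dict:
    num_conf = read_int(stream)
    # Sanity check: an implausible count means the stream is malformed,
    # so raise immediately instead of blocking on reads that never complete.
    if num_conf < 0 or num_conf > MAX_CONF:
        raise RuntimeError(f"Invalid number of configurations: {num_conf}")
    conf = {}
    for _ in range(num_conf):
        k = read_utf8(stream)
        v = read_utf8(stream)
        conf[k] = v
    return conf


def write_utf8(buf, s: str) -> None:
    data = s.encode("utf-8")
    buf.write(struct.pack("!i", len(data)) + data)


# Round-trip one config pair through an in-memory stream.
buf = io.BytesIO()
buf.write(struct.pack("!i", 1))  # count prefix: one pair follows
write_utf8(buf, "spark.python.worker.reuse")
write_utf8(buf, "true")
buf.seek(0)
conf = load_conf(buf)
```

The chosen bound of 10000 is deliberately loose: it is two orders of magnitude above the expected count, so it cannot reject a legitimate message, yet it still catches garbage counts produced by a desynchronized stream.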
