Commit 0e63b5d
Add EmptyMatchTreatment support for SMB reads (#5759)
This change adds comprehensive EmptyMatchTreatment support for Sort Merge Bucket (SMB) operations:
API-level changes:
- Add EmptyMatchTreatment parameter to all SMB IO read operations (JSON, TensorFlow, ParquetAvro, ParquetType)
- Update BucketedInput.of() methods to accept EmptyMatchTreatment parameter
- Maintain backward compatibility with existing code
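One common way to add a parameter while keeping existing callers working is to keep the old factory method and have it delegate to the new overload. The sketch below illustrates that pattern only. The real `BucketedInput.of()` signatures are not shown in this commit text, so `Input`, its `directory` field, and the choice of `DISALLOW` as the default are all assumptions made for illustration. The enum is a reduced local stand-in for Beam's `org.apache.beam.sdk.io.fs.EmptyMatchTreatment`.

```java
public class OverloadSketch {

  // Reduced local stand-in for Beam's EmptyMatchTreatment (assumption: only two values modeled).
  enum EmptyMatchTreatment { ALLOW, DISALLOW }

  // Hypothetical input descriptor; not the real BucketedInput class.
  static final class Input {
    final String directory;
    final EmptyMatchTreatment treatment;

    Input(String directory, EmptyMatchTreatment treatment) {
      this.directory = directory;
      this.treatment = treatment;
    }
  }

  // Pre-existing style of factory: callers that never pass a treatment keep compiling.
  // Assumption: the default is DISALLOW; the commit message does not state it.
  static Input of(String directory) {
    return of(directory, EmptyMatchTreatment.DISALLOW);
  }

  // New overload that accepts the treatment explicitly.
  static Input of(String directory, EmptyMatchTreatment treatment) {
    return new Input(directory, treatment);
  }

  public static void main(String[] args) {
    System.out.println(of("/data/users").treatment);
    System.out.println(of("/data/users", EmptyMatchTreatment.ALLOW).treatment);
  }
}
```

Delegating from the old signature to the new one keeps a single construction path, so the default lives in exactly one place.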
End-to-end implementation:
- Fix BucketedInput#getOrSampleByteSize to handle empty directories gracefully when EmptyMatchTreatment.ALLOW is used
- Enhance MultiSourceKeyGroupReader to filter out sources without valid metadata, preventing failures with empty directories
- Add comprehensive tests for both API and integration functionality
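The filtering behavior described in the list above can be modeled in a few lines. This is a simplified, standalone sketch, not the code from this commit: `Source`, `hasMetadata`, and `readableDirectories` are hypothetical names, and the enum is a reduced local stand-in for Beam's `org.apache.beam.sdk.io.fs.EmptyMatchTreatment`. It assumes an empty directory shows up as a source with no metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyMatchSketch {

  // Reduced local stand-in for Beam's EmptyMatchTreatment (assumption: only two values modeled).
  enum EmptyMatchTreatment { ALLOW, DISALLOW }

  // Hypothetical model of one SMB input directory; hasMetadata is false for an empty directory.
  static final class Source {
    final String directory;
    final boolean hasMetadata;

    Source(String directory, boolean hasMetadata) {
      this.directory = directory;
      this.hasMetadata = hasMetadata;
    }
  }

  // Returns the directories that can be read. A source without metadata is
  // skipped under ALLOW and causes a failure under DISALLOW.
  static List<String> readableDirectories(List<Source> sources, EmptyMatchTreatment treatment) {
    List<String> readable = new ArrayList<>();
    for (Source source : sources) {
      if (source.hasMetadata) {
        readable.add(source.directory);
      } else if (treatment != EmptyMatchTreatment.ALLOW) {
        throw new IllegalStateException("No metadata found in " + source.directory);
      }
    }
    return readable;
  }

  public static void main(String[] args) {
    List<Source> sources =
        Arrays.asList(new Source("/data/users", true), new Source("/data/empty", false));
    // Prints [/data/users]: the empty directory is filtered out instead of failing the read.
    System.out.println(readableDirectories(sources, EmptyMatchTreatment.ALLOW));
  }
}
```

With `DISALLOW`, the same input throws an `IllegalStateException` in this sketch, so an unexpectedly empty input still fails loudly unless the caller opts in.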
This addresses the places where SMB bypassed the filesystem abstraction, ensuring EmptyMatchTreatment is honored throughout the entire SMB pipeline.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
1 parent 34bd46d commit 0e63b5d
File tree
12 files changed, +398 -19 lines changed
- scio-smb/src
  - main
    - java/org/apache/beam/sdk/extensions/smb
    - scala/org/apache/beam/sdk/extensions/smb
  - test
    - java/org/apache/beam/sdk/extensions/smb
    - scala
      - com/spotify/scio/smb
      - org/apache/beam/sdk/extensions/smb
Lines changed: 13 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 41 | + |
| | 222-224 | + |
| | 243-244 | + |
| | 273-277 | + |
| 281 | | - |
| | 292-293 | + |
Lines changed: 21 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 34 | + |
| | 122-124 | + |
| | 141-142 | + |
| | 165-177 | + |
| 171 | | - |
| | 190-191 | + |
Lines changed: 15 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 100 | + |
| | 109-122 | + |
Lines changed: 21 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 37 | + |
| | 226-228 | + |
| | 253-254 | + |
| | 292-304 | + |
| 294 | | - |
| | 313-314 | + |