in_tail: Implement long line truncation #11059
base: master
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds a configurable
Changes
Sequence Diagram(s)
sequenceDiagram
participant Reader as File Reader
participant Processor as Line Processor (in_tail)
participant Truncator as UTF-8 Truncator
participant Metrics as Metrics Registry
participant Output as Output Queue
Note right of Reader: read chunk from file
Reader->>Processor: supply chunk bytes
Processor->>Processor: decode/convert chunk (ret = decoded length)
Processor->>Processor: search newline within eff_max window
alt newline found or within limits
Processor->>Output: emit complete line
else truncate_long_lines enabled and dec_len >= eff_max
Processor->>Truncator: utf8_safe_truncate_pos(buf, ret, eff_max)
Truncator->>Processor: return cut position
Processor->>Output: emit truncated segment
Processor->>Metrics: increment long_line_truncated counter
Processor->>Processor: set skip_next, adjust bytes/offsets
else truncate_long_lines disabled
Processor->>Processor: skip/drop until newline (existing behavior)
end
Processor->>Reader: return processed byte count / update state
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
To avoid skipping long lines entirely, in_tail sometimes needs to consume input up to the buffer limit. This provides an alternative mitigation for handling long lines. Signed-off-by: Hiroshi Hatake <[email protected]>
dbb5d14 to 3119662
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
… for generic conversions Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
Signed-off-by: Hiroshi Hatake <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/in_tail/tail_config.c (2)
491-503: Metric help string typo.
“occurences” → “occurrences”. Keeps consistency with other counters.
    - "Total number of truncated occurences for long lines",
    + "Total number of truncated occurrences for long lines",
138-149: Initialize pipe channel arrays to -1 immediately after calloc to prevent closing stdin on early failure.
The risk is real: if flb_pipe_create(ctx->ch_pending) fails at line 136, then ch_pending[0] and ch_pending[1] remain zero-initialized. The subsequent flb_tail_config_destroy() call at line 139 unconditionally invokes flb_pipe_close() on all four channels. The Linux guard checks if (fd == -1) but does not protect against fd=0; the Windows version lacks any guard. Either way, close(0) executes, closing stdin.
Apply the suggested initialization pattern immediately after flb_calloc():
     ctx = flb_calloc(1, sizeof(struct flb_tail_config));
     if (!ctx) {
         flb_errno();
         return NULL;
     }
    +/* Initialize pipe fds to -1 to safely guard early destroy() calls */
    +ctx->ch_manager[0] = ctx->ch_manager[1] = -1;
    +ctx->ch_pending[0] = ctx->ch_pending[1] = -1;
     ctx->config = config;
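The hazard described above can be reproduced outside Fluent Bit. The following is a minimal sketch, not the plugin's code: `struct demo_ctx`, `demo_pipe_close`, `demo_ctx_create`, and `demo_ctx_destroy` are hypothetical names, and the sketch assumes a POSIX system. It shows how marking unopened descriptors with -1 lets an early destroy path skip them instead of calling close(0).

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the plugin context; only the pipe fds matter. */
struct demo_ctx {
    int ch_manager[2];
    int ch_pending[2];
};

/* Close an fd only if it was opened; -1 means "never opened".
 * Returns 1 if a descriptor was closed, 0 otherwise. */
static int demo_pipe_close(int *fd)
{
    if (*fd < 0) {
        return 0;
    }
    close(*fd);
    *fd = -1;
    return 1;
}

/* Allocate the context and mark every pipe end as unopened right away,
 * so a failure before pipe() succeeds cannot leave fd 0 in the arrays. */
static struct demo_ctx *demo_ctx_create(void)
{
    struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

    if (ctx == NULL) {
        return NULL;
    }
    ctx->ch_manager[0] = ctx->ch_manager[1] = -1;
    ctx->ch_pending[0] = ctx->ch_pending[1] = -1;
    return ctx;
}

/* Destroy the context; returns how many descriptors were really closed. */
static int demo_ctx_destroy(struct demo_ctx *ctx)
{
    int closed = 0;

    assert(ctx != NULL);
    closed += demo_pipe_close(&ctx->ch_manager[0]);
    closed += demo_pipe_close(&ctx->ch_manager[1]);
    closed += demo_pipe_close(&ctx->ch_pending[0]);
    closed += demo_pipe_close(&ctx->ch_pending[1]);
    free(ctx);
    return closed;
}
```

Destroying a freshly created context closes nothing, while a context whose `ch_pending` pipe was opened closes exactly two descriptors. With plain calloc and no -1 initialization, the same destroy path would see four zeros and attempt to close stdin.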
🧹 Nitpick comments (5)
plugins/in_tail/tail.c (1)
724-727: Config option addition LGTM; clarify precedence with skip_long_lines.
When both truncate_long_lines=on and skip_long_lines=on, truncation currently wins. Consider documenting that or warning if both are enabled.
Confirm intended precedence in docs.
tests/runtime/in_tail.c (1)
994-1024: Harden writes against partial writes.
write(2) on files can return short counts; loop until full to avoid flaky tests.
     static int write_long_ascii_line(int fd, size_t total_bytes) {
    -    const char *chunk = "0123456789abcdef0123456789abcdef"; /* 32 bytes */
    +    const char *chunk = "0123456789abcdef0123456789abcdef"; /* 32 bytes */
         size_t chunk_len = strlen(chunk);
         size_t written = 0;
    -    ssize_t ret;
    +    ssize_t ret;
         size_t rest = 0;
         while (written + chunk_len <= total_bytes) {
    -        ret = write(fd, chunk, chunk_len);
    -        if (ret < 0) {
    +        size_t off = 0;
    +        while (off < chunk_len) {
    +            ret = write(fd, chunk + off, chunk_len - off);
    +            if (ret <= 0) {
                     flb_errno();
                     return -1;
                 }
    +            off += (size_t) ret;
    +        }
    -        }
    -        written += (size_t) ret;
    +        written += chunk_len;
         }
         if (written < total_bytes) {
             rest = total_bytes - written;
    -        ret = write(fd, chunk, rest);
    -        if (ret < 0) {
    +        size_t off = 0;
    +        while (off < rest) {
    +            ret = write(fd, chunk + off, rest - off);
    +            if (ret <= 0) {
                     flb_errno();
                     return -1;
                 }
    +            off += (size_t) ret;
    +        }
    -        }
    -        written += (size_t) ret;
    +        written += rest;
         }
    @@
     static int write_long_utf8_line(int fd, size_t total_bytes) {
         const char *u8_aa = "あ";
         size_t u8_len = strlen(u8_aa); /* 3 */
         size_t written = 0;
    -    ssize_t ret;
    +    ssize_t ret;
         const char *ascii = "XYZ";
         size_t rest = 0;
         while (written + u8_len <= total_bytes) {
    -        ret = write(fd, u8_aa, u8_len);
    -        if (ret < 0) {
    +        size_t off = 0;
    +        while (off < u8_len) {
    +            ret = write(fd, u8_aa + off, u8_len - off);
    +            if (ret <= 0) {
                     flb_errno();
                     return -1;
                 }
    +            off += (size_t) ret;
    +        }
    -        }
    -        written += (size_t) ret;
    +        written += u8_len;
         }
         if (written < total_bytes) {
             rest = total_bytes - written;
    -        if (rest > strlen(ascii)) {
    +        if (rest > strlen(ascii)) {
                 rest = strlen(ascii);
             }
    -        ret = write(fd, ascii, rest);
    -        if (ret < 0) {
    +        size_t off = 0;
    +        while (off < rest) {
    +            ret = write(fd, ascii + off, rest - off);
    +            if (ret <= 0) {
                     flb_errno();
                     return -1;
                 }
    +            off += (size_t) ret;
    +        }
    -        }
    -        written += (size_t) ret;
    +        written += rest;
         }
Also applies to: 1026-1061
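The repeated inner loops in this suggestion could be factored into one helper. The sketch below is an illustration under assumptions, not part of the PR: `write_all` is a hypothetical name, and retrying on EINTR is an extra choice that the suggested diff does not make.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes from buf to fd, looping over short writes.
 * Returns 0 on success, -1 on error (errno is left set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        ssize_t ret = write(fd, buf + off, len - off);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted before writing: retry */
            }
            return -1;
        }
        if (ret == 0) {
            return -1;              /* no progress; avoid spinning forever */
        }
        off += (size_t) ret;        /* advance past the bytes accepted */
    }
    return 0;
}
```

With such a helper, each test writer would call `write_all(fd, chunk, chunk_len)` and add `chunk_len` to `written` on success, which keeps the byte accounting independent of how many write(2) calls were needed.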
plugins/in_tail/tail_file.c (3)
460-479: UTF‑8 cut helper looks correct. Add a brief comment on invariants.
Optional: note that caller guarantees max < len before call.
593-605: Window computation is inconsistent (computed twice, second overrides first).
You first derive window from eff_max, then overwrite with buf_max_size+1. Keep one source of truth (eff_max) for clarity and to avoid future regressions.
    - /* Set the search window for memchr. Add 1 because memchr is (ptr, char, size) */
    - if (eff_max > 0) {
    -     window = eff_max + 1;
    - }
    - else {
    -     window = 0;
    - }
    - dec_len = (size_t)(end - data);
    - window = ctx->buf_max_size + 1;
    + /* Set the search window for memchr to the effective threshold (+1) */
    + dec_len = (size_t)(end - data);
    + window = (eff_max > 0) ? (eff_max + 1) : 0;
      if (window > dec_len) {
          window = dec_len;
      }
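To make the intended window arithmetic concrete, here is a small self-contained sketch. `find_newline_bounded` is a hypothetical helper, not a function in tail_file.c; it only assumes the meanings of `dec_len` and `eff_max` used in the comment above (decoded length and effective line limit).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search for '\n' in at most min(eff_max + 1, dec_len) bytes of data.
 * Returns a pointer to the newline, or NULL if none is inside the window. */
static const char *find_newline_bounded(const char *data, size_t dec_len,
                                        size_t eff_max)
{
    size_t window;

    assert(data != NULL);
    /* +1 so a newline sitting exactly at index eff_max is still seen */
    window = (eff_max > 0) ? (eff_max + 1) : 0;
    if (window > dec_len) {
        window = dec_len;           /* never read past the decoded data */
    }
    if (window == 0) {
        return NULL;
    }
    return memchr(data, '\n', window);
}
```

For the 7-byte input "abc\ndef", a limit of 3 finds the newline at index 3, while a limit of 2 reports no newline even though one exists later in the buffer. That second case is the one the truncation path is meant to handle.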
607-642: Truncation flow LGTM; small robustness nits.
- Consider checking the return of flb_tail_file_pack_line and logging on failure.
- When skip_next is already true and no newline found, you consume the whole decoded window via bytes_override; that’s intentional, but add a short comment to make it explicit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- plugins/in_tail/tail.c (1 hunks)
- plugins/in_tail/tail_config.c (7 hunks)
- plugins/in_tail/tail_config.h (3 hunks)
- plugins/in_tail/tail_file.c (8 hunks)
- tests/runtime/in_tail.c (2 hunks)
🔇 Additional comments (4)
plugins/in_tail/tail_config.c (2)
513-515: Old metrics label addition LGTM.
238-246: Good exclusivity check for encodings.
Please confirm this block is compiled on all targets where both features can be enabled (build matrix).
plugins/in_tail/tail_file.c (2)
622-629: Metrics increment: OK; keep help/label spelling consistent.
Ensure the help string spelling is fixed in tail_config.c as noted.
1613-1657: Nice: early truncation path when buffer is full.
This prevents unbounded growth and keeps forward progress.
Signed-off-by: Hiroshi Hatake <[email protected]>
Actionable comments posted: 3
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
460-478: UTF-8 boundary validation may be incomplete.
The function backtracks over continuation bytes (0x80-0xBF) but doesn't verify that the resulting cut position points to a valid UTF-8 start byte. If max lands in the middle of a multi-byte sequence, backtracking ensures we don't split continuation bytes, but we might still cut at an invalid position if the data contains malformed UTF-8.
Consider adding validation that s[cut] (if cut < len) is a valid start byte (0x00-0x7F, 0xC0-0xDF, 0xE0-0xEF, or 0xF0-0xF7).
     while (cut > 0 && ((unsigned char)s[cut - 1] & 0xC0) == 0x80) {
         cut--;
     }
    +
    + /* Optionally verify we're at a valid UTF-8 start byte */
    + if (cut > 0 && cut < len) {
    +     unsigned char c = (unsigned char)s[cut];
    +     /* Valid start bytes: 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx */
    +     if ((c & 0x80) != 0 && (c & 0xC0) != 0xC0) {
    +         /* Invalid start byte after backtrack, search backward for valid start */
    +         while (cut > 0) {
    +             cut--;
    +             c = (unsigned char)s[cut];
    +             if ((c & 0x80) == 0 || (c & 0xC0) == 0xC0) {
    +                 break;
    +             }
    +         }
    +     }
    + }
     return cut;
🔇 Additional comments (6)
plugins/in_tail/tail_file.c (6)
504-511: LGTM!
The new local variables for truncation state are well-named and properly initialized.
561-562: Good optimization: using precise decoded length.
Using the return value from the conversion function is more efficient and accurate than strlen(decoded), which would require scanning the entire buffer.
802-806: LGTM!
The truncation_end label is correctly placed to ensure cleanup of the decoded buffer while bypassing normal line processing when truncation occurs.
812-820: LGTM!
The three-way byte accounting correctly handles truncation, encoding conversion, and normal processing cases.
1613-1619: LGTM!
Variable declarations are properly initialized.
616-618: Multiline flush behavior during truncation is correct; state is properly reset.
The multiline state machine is properly reset after the flush. flb_tail_mult_pack_line_body() (called from flb_tail_mult_flush()) resets all multiline state variables at its conclusion (lines 471-474): mult_firstline, mult_skipping, mult_keys, and mult_flush_timeout. This ensures the multiline state is clean before the truncated line is packed, preventing confusion in subsequent iterations.
Signed-off-by: Hiroshi Hatake <[email protected]>
Actionable comments posted: 1
♻️ Duplicate comments (2)
plugins/in_tail/tail_file.c (2)
573-624: Simplify truncation trigger; remove redundant buf_size gate and bound search window
The condition already requires dec_len >= eff_max (buf_max_size−1). When buf_size < buf_max_size, dec_len cannot reach eff_max, so the extra check file->buf_size >= ctx->buf_max_size is redundant. Also, set the memchr window to min(eff_max+1, dec_len) to avoid scanning beyond the effective limit.
    - dec_len = (size_t)(end - data);
    - window = ctx->buf_max_size + 1;
    - if (window > dec_len) {
    -     window = dec_len;
    - }
    + dec_len = (size_t)(end - data);
    + /* Search at most up to the effective threshold (+1 for '\n') */
    + window = eff_max + 1;
    + if (window > dec_len) {
    +     window = dec_len;
    + }
      nl = memchr(data, '\n', window);
    - if (file->buf_size >= ctx->buf_max_size &&
    -     nl == NULL && eff_max > 0 && dec_len >= eff_max) {
    + if (nl == NULL && eff_max > 0 && dec_len >= eff_max) {
This makes the logic consistent and avoids unnecessary coupling to current allocation size. Related to earlier feedback on the threshold logic.
1618-1640: Gate early processing by maxed buffer to avoid no‑op passes
Early call to process_content happens whenever capacity < 1, but truncation only triggers once we hit the effective threshold. Guard the early path with file->buf_size >= ctx->buf_max_size so we don't spin a no‑op that just falls through to growth.
    - if (ctx->truncate_long_lines == FLB_TRUE) {
    + if (ctx->truncate_long_lines == FLB_TRUE &&
    +     file->buf_size >= ctx->buf_max_size) {
          lines = process_content(file, &processed_bytes);
          ...
      }
Keeps intent clear and aligns with the truncation condition.
🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)
793-799: Bytes accounting after truncation looks correct; minor clarity tweak optional
The bytes_override path is used only when truncation_happened; when not, original_len (if set) wins, else processed_bytes. Consider a small comment noting why original_len takes precedence (pre‑decode byte accounting). No functional change.
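The precedence described in this comment can be written down as a tiny pure function. This is only a sketch of the rule as stated in the review, not the code in tail_file.c: `consumed_bytes` is a hypothetical name, and treating a non-zero `original_len` as "set" is an assumption.

```c
#include <assert.h>   /* used by the sanity checks that accompany the sketch */
#include <stddef.h>

/* Decide how many input bytes to report as consumed.
 * Precedence: truncation override, then pre-decode length, then the
 * number of bytes handled by normal line processing. */
static size_t consumed_bytes(int truncation_happened, size_t bytes_override,
                             size_t original_len, size_t processed_bytes)
{
    if (truncation_happened) {
        return bytes_override;      /* truncation path dictates the count */
    }
    if (original_len > 0) {
        /* Assumption: non-zero means an encoding conversion ran, so the
         * file offset must advance by the pre-decode byte count. */
        return original_len;
    }
    return processed_bytes;         /* plain path: no conversion involved */
}
```

The reason for preferring `original_len` over `processed_bytes` is that a converted buffer generally has a different length than the raw bytes read from the file, and file offsets are tracked in raw bytes.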
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)
560-562: Good: use precise decoded length from generic converter
Assigning end = data + ret avoids reliance on stale end. LGTM.
Signed-off-by: Hiroshi Hatake <[email protected]>
Actionable comments posted: 1
♻️ Duplicate comments (1)
plugins/in_tail/tail_file.c (1)
460-478: Fix UTF‑8 cut: current backtrack can return a lone lead byte
Backtrack must inspect the first omitted byte (s[cut]), not the last included (s[cut-1]); otherwise max=1 on a multibyte char yields invalid UTF‑8. Apply:
     static size_t utf8_safe_truncate_pos(const char *s, size_t len, size_t max) {
    -    size_t cut = 0;
    -
    -    cut = (len <= max) ? len : max;
    +    size_t cut = (len <= max) ? len : max;
         if (cut == len) {
             return cut;
         }
    -    /* backtrack over continuation bytes 10xxxxxx
    -     * NOTE: check the last INCLUDED byte => s[cut-1], not s[cut].
    -     */
    -    while (cut > 0 && ((unsigned char)s[cut - 1] & 0xC0) == 0x80) {
    +    /* If first omitted byte is a continuation (10xxxxxx), step back */
    +    while (cut > 0 && cut < len &&
    +           ((unsigned char)s[cut] & 0xC0) == 0x80) {
             cut--;
         }
         return cut;
     }
Optionally add tests that truncate at max=1 for 2/3/4‑byte codepoints.
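The behaviour asked for here can be checked in isolation. The sketch below restates the suggested logic as a standalone function; `utf8_cut_pos` is a hypothetical name so it is not mistaken for the plugin's `utf8_safe_truncate_pos`, and it assumes the input is well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Return the largest cut <= max such that s[0..cut) does not end in the
 * middle of a multi-byte sequence. Assumes s holds well-formed UTF-8. */
static size_t utf8_cut_pos(const char *s, size_t len, size_t max)
{
    size_t cut;

    assert(s != NULL);
    cut = (len <= max) ? len : max;
    /* While the first omitted byte is a continuation byte (10xxxxxx),
     * the cut splits a character, so step back toward its lead byte. */
    while (cut > 0 && cut < len &&
           ((unsigned char) s[cut] & 0xC0) == 0x80) {
        cut--;
    }
    return cut;
}
```

For the 6-byte string "ああ" (each character is E3 81 82), limits of 4 and 5 both yield a cut of 3, and a limit of 1 yields 0, so no lone lead byte is emitted. A cut of 0 means nothing fits within the limit, and the caller has to decide how to make forward progress in that case.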
🔇 Additional comments (1)
plugins/in_tail/tail_file.c (1)
560-562: Good: use precise decoded length
Using ret to set end after generic encoding conversion is correct and avoids stale length math.
…ng accent symbols Signed-off-by: Hiroshi Hatake <[email protected]>
To avoid skipping long lines entirely, in_tail sometimes needs to consume input up to the buffer limit. This provides an alternative mitigation for handling long lines.
Fixes #10435.
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
Tests