Skip to content

Conversation

@olegmukhin
Copy link

@olegmukhin olegmukhin commented Jul 19, 2025

Added a new LookUp filter to address use case when enrichment of record is required based on simple static key value lookup.

The filter loads a CSV file into a hash table for performance. It consider first column of the CSV to be the key and the second column to be the value. All other columns are ignored.

Where a record value (identified by lookup_key input) matches the key from the CSV, the value from the CSV row is added under a new key (defined by result_key input) to the record.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added a "lookup" filter to enrich records from a CSV key/value map; configurable source/result fields, optional case-insensitive matching, robust CSV parsing, preserves originals on no-match, and emits processed/matched/skipped metrics.
  • Tests

    • Added comprehensive runtime tests (quoting, large datasets, nested/array keys, trimming, numeric/boolean values, metrics). Tests run only when record accessor support is enabled.
  • Chores

    • Added build option to enable the lookup filter (default ON) and conditional build/registration tied to record accessor support.

✏️ Tip: You can customize this high-level summary in your review settings.

@olegmukhin
Copy link
Author

Test configuration

Fluent Bit YAML Configuration

parsers:
  - name: json
    format: json

pipeline:
  inputs:
    - name: tail
      path: /src/devices.log
      read_from_head: true
      parser: json

  filters:
    - name: lookup
      match: "*"
      file: /src/device-bu.csv
      lookup_key: $hostname
      result_key: business_line
      ignore_case: true

  outputs:
    - name: stdout
      match: "*"

To test new filter we will load a range of log values including, strings (different cases), integer, boolean, embedded quotes and other value types.

devices.log

{"hostname": "server-prod-001"}
{"hostname": "Server-Prod-001"}
{"hostname": "db-test-abc"}
{"hostname": 123}
{"hostname": true}
{"hostname": " host with space "}
{"hostname": "quoted \"host\""}
{"hostname": "unknown-host"}
{}
{"hostname": [1,2,3]}
{"hostname": {"sub": "val"}}
{"hostname": " "}

CSV configuration will aim to test key overwrites, different types of strings, use and escaping of quotes.

device-bu.csv

hostname,business_line
server-prod-001,Finance
db-test-abc,Engineering
db-test-abc,Marketing
web-frontend-xyz,Marketing
app-backend-123,Operations
"legacy-system true","Legacy IT"
" host with space ","Infrastructure"
"quoted ""host""", "R&D"
123, "R&D"
true, "R&D"
no-match-host,Should Not Appear

When executed with verbose flag the following out is produced.

Test output

[2025/07/19 14:38:48] [ info] Configuration:
[2025/07/19 14:38:48] [ info]  flush time     | 1.000000 seconds
[2025/07/19 14:38:48] [ info]  grace          | 5 seconds
[2025/07/19 14:38:48] [ info]  daemon         | 0
[2025/07/19 14:38:48] [ info] ___________
[2025/07/19 14:38:48] [ info]  inputs:
[2025/07/19 14:38:48] [ info]      tail
[2025/07/19 14:38:48] [ info] ___________
[2025/07/19 14:38:48] [ info]  filters:
[2025/07/19 14:38:48] [ info]      lookup.0
[2025/07/19 14:38:48] [ info] ___________
[2025/07/19 14:38:48] [ info]  outputs:
[2025/07/19 14:38:48] [ info]      stdout.0
[2025/07/19 14:38:48] [ info] ___________
[2025/07/19 14:38:48] [ info]  collectors:
[2025/07/19 14:38:48] [ info] [fluent bit] version=4.1.0, commit=, pid=50224
[2025/07/19 14:38:48] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2025/07/19 14:38:48] [ info] [storage] ver=1.5.3, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/07/19 14:38:48] [ info] [simd    ] disabled
[2025/07/19 14:38:48] [ info] [cmetrics] version=1.0.4
[2025/07/19 14:38:48] [ info] [ctraces ] version=0.6.6
[2025/07/19 14:38:48] [ info] [input:tail:tail.0] initializing
[2025/07/19 14:38:48] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2025/07/19 14:38:48] [debug] [tail:tail.0] created event channels: read=25 write=26
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] inotify watch fd=31
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] scanning path /src/*.log
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] file will be read in POSIX_FADV_DONTNEED mode /src/devices.log
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] inode=10 with offset=0 appended as /src/devices.log
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] scan_glob add(): /src/devices.log, inode 10
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] 1 new files found on path '/src/*.log'
[2025/07/19 14:38:48] [ info] [filter:lookup:lookup.0] Loaded 10 entries from CSV
[2025/07/19 14:38:48] [debug] [stdout:stdout.0] created event channels: read=33 write=34
[2025/07/19 14:38:48] [ info] [output:stdout:stdout.0] worker #0 started
[2025/07/19 14:38:48] [ info] [sp] stream processor started
[2025/07/19 14:38:48] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2
[2025/07/19 14:38:48] [debug] [filter:lookup:lookup.0] Record 4: lookup value for key '$hostname' is non-string, converted to '123'
[2025/07/19 14:38:48] [debug] [filter:lookup:lookup.0] Record 5: lookup value for key '$hostname' is non-string, converted to 'true'
[2025/07/19 14:38:48] [debug] [filter:lookup:lookup.0] Record 10: lookup_key '$hostname' has type array/map, skipping to avoid ra error
[2025/07/19 14:38:48] [debug] [filter:lookup:lookup.0] Record 11: lookup_key '$hostname' has type array/map, skipping to avoid ra error
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] [static files] processed 278b
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] inode=10 file=/src/devices.log promote to TAIL_EVENT
[2025/07/19 14:38:48] [ info] [input:tail:tail.0] inotify_fs_add(): inode=10 watch_fd=1 name=/src/devices.log
[2025/07/19 14:38:48] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2025/07/19 14:38:49] [debug] [task] created task=0xffff9c043a10 id=0 OK
[2025/07/19 14:38:49] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] tail.0: [[1752935928.516352587, {}], {"hostname"=>"server-prod-001", "business_line"=>"Finance"}]
[1] tail.0: [[1752935928.516443337, {}], {"hostname"=>"Server-Prod-001", "business_line"=>"Finance"}]
[2] tail.0: [[1752935928.516445712, {}], {"hostname"=>"db-test-abc", "business_line"=>"Marketing"}]
[3] tail.0: [[1752935928.516448504, {}], {"hostname"=>123, "business_line"=>"R&D"}]
[4] tail.0: [[1752935928.516450337, {}], {"hostname"=>true, "business_line"=>"R&D"}]
[5] tail.0: [[1752935928.516452004, {}], {"hostname"=>" host with space ", "business_line"=>"Infrastructure"}]
[6] tail.0: [[1752935928.516453670, {}], {"hostname"=>"quoted "host"", "business_line"=>"R&D"}]
[7] tail.0: [[1752935928.516455212, {}], {"hostname"=>"unknown-host"}]
[8] tail.0: [[1752935928.516456504, {}], {}]
[9] tail.0: [[1752935928.516458712, {}], {"hostname"=>[1, 2, 3]}]
[10] tail.0: [[1752935928.516460754, {}], {"hostname"=>{"sub"=>"val"}}]
[2025/07/19 14:38:49] [debug] [out flush] cb_destroy coro_id=0
[2025/07/19 14:38:49] [debug] [task] destroy task=0xffff9c043a10 (task_id=0)

Output shows correct matching and handling of different value types and correct output when no match is detected.

Valgrind summary (after run with multiple types of lookups):

==50220== HEAP SUMMARY:
==50220==     in use at exit: 0 bytes in 0 blocks
==50220==   total heap usage: 14,547 allocs, 14,550 frees, 74,987,419 bytes allocated
==50220== 
==50220== All heap blocks were freed -- no leaks are possible
==50220== 
==50220== Use --track-origins=yes to see where uninitialised values come from
==50220== For lists of detected and suppressed errors, rerun with: -s
==50220== ERROR SUMMARY: 6 errors from 4 contexts (suppressed: 0 from 0)

@olegmukhin
Copy link
Author

Documentation for this filter has been submitted as part of #fluent/fluent-bit-docs/pull/1953.

@olegmukhin
Copy link
Author

Added unit tests for lookup filter. All tests pass:

Test basic_lookup...                            [ OK ]
Test ignore_case...                             [ OK ]
Test csv_quotes...                              [ OK ]
Test numeric_values...                          [ OK ]
Test large_numbers...                           [ OK ]
Test boolean_values...                          [ OK ]
Test no_match...                                [ OK ]
Test long_csv_lines...                          [ OK ]
Test whitespace_trim...                         [ OK ]
Test dynamic_buffer...                          [ OK ]
Test nested_keys...                             [ OK ]
Test large_csv...                               [ OK ]
Test nested_array_keys...                       [ OK ]
Test metrics_matched...                         [ OK ]
Test metrics_processed...                       [ OK ]

Valgrind results are showing appropriate memory management.

==19111== HEAP SUMMARY:
==19111==     in use at exit: 0 bytes in 0 blocks
==19111==   total heap usage: 6,964 allocs, 6,964 frees, 59,096,180 bytes allocated
==19111== 
==19111== All heap blocks were freed -- no leaks are possible
==19111== 
==19111== Use --track-origins=yes to see where uninitialised values come from

@olegmukhin
Copy link
Author

Added fix for failing checks on Cent OS 7 and Windows. Please rerun.

@olegmukhin
Copy link
Author

olegmukhin commented Jul 21, 2025

Last check is failing due to Cent OS 7 incompatibility in unit test file - fix in last commit. Please rerun.

@olegmukhin
Copy link
Author

Could this please get one more run at the checks? I didn't realise we need this to compile on Cent OS 7 - should be good now with last commit.

@patrick-stephens
Copy link
Collaborator

Can you rebase and push so it reruns tests?

@olegmukhin
Copy link
Author

@patrick-stephens addressed new AI review comment and rebased🤞. Let me know if there is anything else outstanding here. Thanks.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
plugins/filter_lookup/lookup.c (2)

348-550: Consider removing redundant goto next_line statements.

The next_line label at line 548 simply executes continue, making the goto statements at lines 348, 364, and 380 unnecessary. You can replace them with direct continue statements for clearer code flow.

Apply this diff to simplify:

                         if (dynbuf_append_char(&key_buf, '"') != 0) {
                             flb_plg_debug(ctx->ins, "Buffer allocation failed for line %d", line_num);
                             dynbuf_destroy(&key_buf);
                             dynbuf_destroy(&val_buf);
                             flb_free(line);
                             line_num++;
-                            goto next_line;
+                            continue;
                         }

Apply similar changes at lines 364 and 380, and remove the next_line: label entirely.


1008-1008: Use precomputed result_key_len instead of strlen.

Line 628 precomputes ctx->result_key_len specifically for hot-path optimization, but this line calls strlen(ctx->result_key) instead. Use the precomputed value for consistency and to avoid redundant string traversal.

Apply this diff:

-        ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, strlen(ctx->result_key));
+        ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, ctx->result_key_len);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16bd06f and 8633b7a.

📒 Files selected for processing (3)
  • plugins/filter_lookup/lookup.c (1 hunks)
  • plugins/filter_lookup/lookup.h (1 hunks)
  • tests/runtime/filter_lookup.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/runtime/filter_lookup.c
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (6)
tests/runtime/filter_lookup.c (3)
  • dynbuf_init (691-698)
  • dynbuf_append_char (700-711)
  • dynbuf_destroy (713-720)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
src/flb_hash_table.c (4)
  • flb_hash_table_add (401-494)
  • flb_hash_table_create (99-137)
  • flb_hash_table_destroy (197-215)
  • flb_hash_table_get (496-522)
src/flb_record_accessor.c (3)
  • flb_ra_create (271-358)
  • flb_ra_destroy (232-248)
  • flb_ra_get_value_object (803-814)
src/flb_log_event_encoder.c (7)
  • flb_log_event_encoder_begin_record (246-254)
  • flb_log_event_encoder_set_timestamp (276-287)
  • flb_log_event_encoder_commit_record (256-274)
  • flb_log_event_encoder_rollback_record (241-244)
  • flb_log_event_encoder_init (42-74)
  • flb_log_event_encoder_claim_internal_buffer_ownership (118-124)
  • flb_log_event_encoder_destroy (99-116)
src/flb_ra_key.c (1)
  • flb_ra_key_value_destroy (842-851)
🔇 Additional comments (2)
plugins/filter_lookup/lookup.h (1)

1-56: LGTM! Well-structured plugin header.

The header properly defines the plugin's public API with appropriate include guards, conditional metrics compilation, and clear field documentation. The precomputed result_key_len field (line 40) is a good optimization for the hot path.

plugins/filter_lookup/lookup.c (1)

1056-1094: LGTM! Proper resource cleanup and plugin registration.

The exit callback correctly frees all resources in the proper order: val_list nodes, record accessor, hash table, and context. The plugin registration with config_map is well-structured with appropriate field types and descriptions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
plugins/filter_lookup/lookup.c (1)

348-348: Optional: goto next_line is redundant—replace with continue.

Multiple error paths use goto next_line (lines 348, 364, 380, 413, 429, 444), but the next_line label at line 548 immediately executes continue. Since cleanup is already complete before the goto, you can replace all goto next_line statements with continue directly, eliminating the unnecessary label.

Also applies to: 364-364, 380-380, 413-413, 429-429, 444-444

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8633b7a and d303c42.

📒 Files selected for processing (1)
  • plugins/filter_lookup/lookup.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (6)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
src/flb_hash_table.c (4)
  • flb_hash_table_add (401-494)
  • flb_hash_table_create (99-137)
  • flb_hash_table_destroy (197-215)
  • flb_hash_table_get (496-522)
src/flb_record_accessor.c (3)
  • flb_ra_create (271-358)
  • flb_ra_destroy (232-248)
  • flb_ra_get_value_object (803-814)
src/flb_log_event_encoder.c (7)
  • flb_log_event_encoder_begin_record (246-254)
  • flb_log_event_encoder_set_timestamp (276-287)
  • flb_log_event_encoder_commit_record (256-274)
  • flb_log_event_encoder_rollback_record (241-244)
  • flb_log_event_encoder_init (42-74)
  • flb_log_event_encoder_claim_internal_buffer_ownership (118-124)
  • flb_log_event_encoder_destroy (99-116)
src/flb_log_event_decoder.c (3)
  • flb_log_event_decoder_init (103-120)
  • flb_log_event_decoder_destroy (151-178)
  • flb_log_event_decoder_next (309-456)
src/flb_ra_key.c (1)
  • flb_ra_key_value_destroy (842-851)
🔇 Additional comments (2)
plugins/filter_lookup/lookup.c (2)

987-1011: LGTM: Error recovery correctly breaks out of for-loop.

The goto next_record pattern at line 1002 is correct. When append_body_values fails during key copying, the code:

  1. Rolls back the incomplete record
  2. Emits the original record
  3. Jumps to next_record label (line 1007)
  4. Checks the error condition (line 1009) and continues the outer while-loop

This correctly skips the rest of the record construction (appending result_key and committing) and moves to the next log event. The past review comment suggesting this was incorrect was mistaken—the current implementation properly handles the error case.


1-1100: Well-implemented CSV lookup filter with robust error handling.

This is a comprehensive and well-structured implementation of a CSV-based lookup filter. The code demonstrates:

Solid design decisions:

  • Proper separation of concerns with helper functions
  • Robust CSV parsing with quote handling and dynamic buffers
  • Comprehensive error handling and cleanup paths
  • Efficient hash table usage for lookups
  • Support for non-string type conversion with explicit array/map skipping

Memory safety:

  • Past review issues (unmatched quote detection, val_node allocation order) have been properly addressed
  • Consistent cleanup patterns throughout
  • Proper ownership tracking via val_list

Good practices:

  • Platform-specific code properly guarded
  • Metrics integration with proper guards
  • Configurable behavior via config_map
  • Comprehensive logging at appropriate levels

The implementation follows Fluent Bit patterns correctly and is production-ready.

@olegmukhin
Copy link
Author

@patrick-stephens I think maybe the runner ran out of disk space for 1 of the jobs? I don't think this is related to this PR, but let me know if you need anything :)

@cosmo0920
Copy link
Contributor

We fixed no left device error in the current master.
So, could you rebase off master?

New filter aims to address use case of simple data enrichment using
static key value lookup.

The filter loads first two columns of CSV file into memory as a hash
table. When a specified record value matches the key in the hash table
the value will be appended to the record (based on key name defined
in the filter inputs).)

Tested with valgrind.

Signed-off-by: Oleg Mukhin <[email protected]>
- Removed unecessary FLB_FILTER_LOOKUP build flag now LookUp
is enabled by default like other filters (without flag).
- Fixed critical use-after-free bug in numeric value lookups.
- Added processed_records_total, matched_records_total and
skipped_records_total metrics to enable operational visibility
- Added unit tests to cover handling of different data types,
CSV loading/handling and metrics tests.

Tested with valgrind - no memory leaks. All unit tests pass.

Signed-off-by: Oleg Mukhin <[email protected]>
- fix variable declarations and remove C99 features
- Conditional compilation for Windows vs Unix headers/functions
- Replace bool with int, fix format specifiers, update comments

All 15 unit tests for filter passed.

Signed-off-by: Oleg Mukhin <[email protected]>
- fix variable declarations and remove C99 features for unit tests
- Conditional compilation for Windows for unit test features

All 15 unit tests for filter passed.

Signed-off-by: Oleg Mukhin <[email protected]>
Addressed following issues:
Fix potential memory leak when val_node allocation fails
Wrap test metrics code with FLB_HAVE_METRICS guards
Replace metric macros with enum to prevent namespace pollution
Gate plugin registration on FLB_RECORD_ACCESSOR option
Add unmatched quote detection after key parsing in CSV loader
Replace magic numbers with semantic msgpack type checking
Fix thread safety in lookup filter metrics macros
Eliminated potential segfaults from null pointer dereferences
Added defensive checks to the metric creation code
Optimise hot path by eliminating repeated strlen calls

Signed-off-by: Oleg Mukhin <[email protected]>
- Added skip_header_row config option (defaults to false for headerless CSVs)
- Renamed 'file' parameter to 'data_source' for future URL/database support
- Added skip_header_row test to verify both true/false behavior

Signed-off-by: Oleg Mukhin <[email protected]>
Fix bug where continue inside for-loop breaks error recovery.
The fix uses goto to jump past the key copying code and explicitly
checks encoder state before proceeding.

Signed-off-by: Oleg Mukhin <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
plugins/filter_lookup/lookup.c (1)

1014-1014: Minor optimization: use precomputed result_key_len.

Line 1014 calls strlen(ctx->result_key) in the hot path, but result_key_len is already precomputed during initialization (line 628) for exactly this purpose.

Apply this diff:

-        ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, strlen(ctx->result_key));
+        ret = flb_log_event_encoder_append_body_string(&log_encoder, ctx->result_key, ctx->result_key_len);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d303c42 and 4fdd7d1.

📒 Files selected for processing (7)
  • cmake/plugins_options.cmake (1 hunks)
  • plugins/CMakeLists.txt (1 hunks)
  • plugins/filter_lookup/CMakeLists.txt (1 hunks)
  • plugins/filter_lookup/lookup.c (1 hunks)
  • plugins/filter_lookup/lookup.h (1 hunks)
  • tests/runtime/CMakeLists.txt (1 hunks)
  • tests/runtime/filter_lookup.c (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/runtime/filter_lookup.c
🚧 Files skipped from review as they are similar to previous changes (4)
  • cmake/plugins_options.cmake
  • plugins/filter_lookup/CMakeLists.txt
  • plugins/CMakeLists.txt
  • tests/runtime/CMakeLists.txt
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/filter_lookup/lookup.c (9)
tests/runtime/filter_lookup.c (3)
  • dynbuf_init (691-698)
  • dynbuf_append_char (700-711)
  • dynbuf_destroy (713-720)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
src/flb_hash_table.c (4)
  • flb_hash_table_add (401-494)
  • flb_hash_table_create (99-137)
  • flb_hash_table_destroy (197-215)
  • flb_hash_table_get (496-522)
lib/cmetrics/src/cmt_counter.c (1)
  • cmt_counter_create (26-81)
src/flb_metrics.c (1)
  • flb_metrics_add (129-178)
src/flb_record_accessor.c (3)
  • flb_ra_create (271-358)
  • flb_ra_destroy (232-248)
  • flb_ra_get_value_object (803-814)
src/flb_log_event_encoder.c (6)
  • flb_log_event_encoder_begin_record (246-254)
  • flb_log_event_encoder_commit_record (256-274)
  • flb_log_event_encoder_rollback_record (241-244)
  • flb_log_event_encoder_init (42-74)
  • flb_log_event_encoder_claim_internal_buffer_ownership (118-124)
  • flb_log_event_encoder_destroy (99-116)
src/flb_log_event_decoder.c (3)
  • flb_log_event_decoder_init (103-120)
  • flb_log_event_decoder_destroy (151-178)
  • flb_log_event_decoder_next (309-456)
src/flb_ra_key.c (1)
  • flb_ra_key_value_destroy (842-851)
🔇 Additional comments (8)
plugins/filter_lookup/lookup.h (1)

1-56: LGTM! Clean and well-structured header.

The header file is well-organized with:

  • Proper header guards
  • Clear separation of concerns (metrics enum, context struct, plugin declaration)
  • Good inline documentation for the hot-path optimization (result_key_len)
  • Appropriate conditional compilation for metrics
plugins/filter_lookup/lookup.c (7)

45-77: LGTM! Well-structured metric macros.

The metric increment macros properly handle both the CMT (CMetrics) system and the legacy metrics system, with appropriate guards for when FLB_HAVE_METRICS is not defined.


88-260: LGTM! Robust helper functions with proper error handling.

The helper functions are well-implemented:

  • normalize_and_trim correctly handles whitespace and case normalization with clear return codes
  • Dynamic buffer operations use standard 2× growth strategy
  • read_line_dynamic properly handles arbitrary line lengths, EOF conditions, and line ending normalization

262-554: LGTM! CSV parser with comprehensive error handling.

The CSV loading function correctly handles:

  • Optional header row skipping based on configuration
  • Quote escaping (doubled quotes within quoted fields)
  • Unmatched quote detection for both keys and values
  • Proper memory management with val_node tracking before hash insertion
  • Dynamic buffer cleanup on all error paths

All previously flagged issues have been properly addressed.


556-679: LGTM! Comprehensive initialization with proper validation.

The initialization callback is well-structured with:

  • Both CMT and legacy metrics setup
  • Required parameter validation
  • Cross-platform file accessibility checks
  • Hot-path optimization: result_key_len precomputed for fast comparison
  • Thorough error cleanup ensuring no resource leaks

681-708: LGTM! Clean helper for emitting original records.

The emit_original_record helper properly encapsulates the record emission pattern with appropriate error handling and metric tracking.


710-1005: LGTM! Robust record processing with comprehensive error handling.

The filter callback is well-implemented with:

  • Proper decoder/encoder lifecycle management
  • Thorough type handling for lookup values (primitives converted to strings, arrays/maps skipped)
  • Correct error recovery using goto/label pattern (addressing previous review)
  • Comprehensive error handling with rollback and original record emission
  • Efficient any_modified tracking to avoid unnecessary buffer copies

1062-1100: LGTM! Complete cleanup and plugin registration.

The exit callback and plugin registration are properly implemented:

  • Safe list traversal and cleanup of all tracked values
  • Destruction of record accessor and hash table
  • Comprehensive config map with clear descriptions
  • Standard plugin struct definition

@olegmukhin
Copy link
Author

olegmukhin commented Nov 21, 2025

@cosmo0920 @patrick-stephens thank you both. Rebased. 🤞

I'll pick up the nitpick comment from CodeRabbit in next PR - just want to get this one over the line.

@patrick-stephens
Copy link
Collaborator

Can you link the docs PR?
As it is a new plugin, I will leave with @edsiper for final decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs-required ok-package-test Run PR packaging tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants