Performance improvement: Optimize DefaultFieldSet.indexOf() method #4930

@cwangg897

Expected Behavior

The DefaultFieldSet.indexOf() method should look up field names in expected O(1) time using a precomputed name-to-index map, instead of the current O(n) linear search through the names list.

// Expected: O(1) lookup using HashMap
protected int indexOf(String name) {
    if (nameIndexMap == null) {
        throw new IllegalArgumentException("Cannot access columns by name without meta data");
    }
    Integer index = nameIndexMap.get(name);
    if (index != null) {
        return index;
    }
    throw new IllegalArgumentException("Cannot access column [" + name + "] from " + names);
}
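
The snippet above assumes a nameIndexMap field that does not exist yet. A minimal sketch of how such a map could be built once, when the names are set, is shown below. The class name NameIndexSketch and the helper buildNameIndexMap are hypothetical and not part of Spring Batch. One detail worth preserving: List.indexOf() returns the first match when a name appears more than once, so the sketch uses putIfAbsent to keep the same result for duplicate column names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameIndexSketch {

    // Hypothetical helper: maps each field name to its position in the list.
    static Map<String, Integer> buildNameIndexMap(List<String> names) {
        Map<String, Integer> map = new HashMap<>(names.size() * 2);
        for (int i = 0; i < names.size(); i++) {
            // putIfAbsent keeps the first occurrence of a duplicate name,
            // which is the index List.indexOf() would return.
            map.putIfAbsent(names.get(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> names = List.of("id", "name", "amount", "name");
        Map<String, Integer> index = buildNameIndexMap(names);
        System.out.println(index.get("name"));    // 1, same as names.indexOf("name")
        System.out.println(index.get("missing")); // null, caller throws as before
    }
}
```

The map costs one extra pass over the names and some memory per field set, so where it is built (per FieldSet or once per factory/tokenizer) is a design choice this sketch leaves open.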

Current Behavior

Currently, DefaultFieldSet.indexOf() calls List.indexOf(), which scans the field names linearly every time a field is accessed by name. Each lookup is therefore O(n) in the number of columns.

// Current: O(n) linear search
protected int indexOf(String name) {
    if (names == null) {
        throw new IllegalArgumentException("Cannot access columns by name without meta data");
    }
    int index = names.indexOf(name);
    if (index >= 0) {
        return index;
    }
    throw new IllegalArgumentException("Cannot access column [" + name + "] from " + names);
}

The difference matters most when processing CSV files with many columns (50+ fields) and accessing fields by name frequently during batch processing. For a 100-column CSV, reading the last column by name takes up to 100 string comparisons on every access, and that cost is paid again for each by-name read of each record, which can add up in high-volume batch jobs.
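
To get a rough feel for the 100-column case, the following standalone sketch times repeated lookups of the last column with both approaches. It is only an illustration: the class LookupCostSketch, the column names, and the iteration count are made up, it does not touch DefaultFieldSet, and a naive System.nanoTime() loop without warm-up is not a reliable benchmark (a JMH benchmark would be the proper tool).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupCostSketch {

    // Builds hypothetical column names: col0, col1, ..., col(count-1).
    static List<String> columnNames(int count) {
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add("col" + i);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = columnNames(100);
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            index.putIfAbsent(names.get(i), i);
        }

        String last = "col99";
        int lookups = 1_000_000;
        long sink = 0; // accumulated so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            sink += names.indexOf(last); // linear scan over 100 names
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < lookups; i++) {
            sink += index.get(last);     // single hash lookup
        }
        long t2 = System.nanoTime();

        System.out.println("linear scan: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("map lookup:  " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```

Actual numbers depend on the JVM, hardware, and name lengths, so none are quoted here.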
