@iceljc (Collaborator) commented Dec 3, 2025

PR Type

Enhancement


Description

  • Add CSV file parsing support to Excel handler plugin

  • Create ExcelHelper class with CSV to workbook conversion logic

  • Support multiple file formats (XLS, XLSX, CSV) with format detection

  • Improve header parsing to filter empty cells in sheet data


Diagram Walkthrough

flowchart LR
  A["CSV File"] --> B["ExcelHelper.ConvertCsvToWorkbook"]
  C["XLS File"] --> D["HSSFWorkbook"]
  E["XLSX File"] --> F["XSSFWorkbook"]
  B --> G["IWorkbook"]
  D --> G
  F --> G
  G --> H["Database Processing"]

File Walkthrough

Relevant files
Enhancement
ReadExcelFn.cs
Support multiple file formats in workbook conversion         

src/Plugins/BotSharp.Plugin.ExcelHandler/Functions/ReadExcelFn.cs

  • Add support for CSV file type in mime type detection
  • Refactor ConvertToWorkBook method to handle multiple file formats
  • Add format detection logic for XLS and XLSX files
  • Pass file extension to workbook conversion method
+19/-6   
ExcelHelper.cs
New helper class for CSV to workbook conversion                   

src/Plugins/BotSharp.Plugin.ExcelHandler/Helpers/ExcelHelper.cs

  • Create new helper class with CSV parsing functionality
  • Implement ConvertCsvToWorkbook method to convert CSV bytes to NPOI
    workbook
  • Add ParseCsvLine method with proper quote and comma handling
  • Support numeric value detection and conversion for data cells
+82/-0   
Bug fix
MySqlService.cs
Improve header parsing and code formatting                             

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/MySqlService.cs

  • Add missing braces for exception handling block
  • Filter out empty cells when parsing sheet headers
  • Improve header column extraction with null/whitespace check
+3/-1     
Configuration changes
Using.cs
Add Helpers namespace to global usings                                     

src/Plugins/BotSharp.Plugin.ExcelHandler/Using.cs

  • Add global using statement for ExcelHandler.Helpers namespace
+1/-0     
BotSharp.sln
Fix solution file project configuration                                   

BotSharp.sln

  • Fix project folder reference for MMPEmbedding plugin
  • Add missing EndProject tag for proper solution structure
+2/-1     
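
The numeric value detection listed for ExcelHelper.cs relies on `double.TryParse` with `NumberStyles.Any` and the invariant culture, as the diff further down shows. The sketch below illustrates what that check accepts; `TryAsNumberStrict` is a hypothetical stricter variant and not part of the PR, added only to show how identifier-like values could be kept as text.

```csharp
using System;
using System.Globalization;

public static class NumericDetectionSketch
{
    // Same check as in the PR diff: NumberStyles.Any with the invariant culture.
    public static bool TryAsNumber(string value, out double number) =>
        double.TryParse(value, NumberStyles.Any, CultureInfo.InvariantCulture, out number);

    // Hypothetical stricter variant (an assumption, not in the PR).
    public static bool TryAsNumberStrict(string value, out double number)
    {
        number = 0;
        var trimmed = value.Trim();

        // Keep values such as "00123" as text so the leading zeros survive.
        if (trimmed.Length > 1 && trimmed[0] == '0' && char.IsDigit(trimmed[1]))
        {
            return false;
        }

        // NumberStyles.Float does not allow thousands separators, so "1,234" stays text.
        return double.TryParse(trimmed, NumberStyles.Float, CultureInfo.InvariantCulture, out number);
    }
}
```

With `NumberStyles.Any`, a cell such as `00123` is stored as the number 123 and `1,234` as 1234, which may or may not be what a given import expects.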

@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified: no security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New file parsing and database write operations were added without visible audit logging of
who performed actions, what files were processed, or outcomes.

Referred Code
        string extension = Path.GetExtension(file.FileStorageUrl);
        if (!_excelFileTypes.Contains(extension))
        {
            continue;
        }

        var binary = _fileStorage.GetFileBytes(file.FileStorageUrl);
        var workbook = ConvertToWorkbook(binary, extension);

        var currentCommands = _dbService.WriteExcelDataToDB(workbook);
        sqlCommands.AddRange(currentCommands);
    }
    return sqlCommands;
}

private string GenerateSqlExecutionSummary(List<SqlContextOut> results)
{
    var stringBuilder = new StringBuilder();
    if (results.Any(x => x.isSuccessful))
    {


 ... (clipped 38 lines)

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Edge cases unhandled: CSV parsing lacks handling for empty files, varying delimiters, BOMs, and row length
mismatches, which may cause silent data issues.

Referred Code
internal static IWorkbook ConvertCsvToWorkbook(byte[] bytes)
{
    IWorkbook workbook = new XSSFWorkbook();
    ISheet sheet = workbook.CreateSheet("Sheet1");

    using var memoryStream = new MemoryStream(bytes);
    using var reader = new StreamReader(memoryStream);

    int rowIndex = 0;
    string? line;

    while ((line = reader.ReadLine()) != null)
    {
        IRow row = sheet.CreateRow(rowIndex);
        var values = ParseCsvLine(line);

        for (int colIndex = 0; colIndex < values.Count; colIndex++)
        {
            ICell cell = row.CreateCell(colIndex);
            var value = values[colIndex];



 ... (clipped 15 lines)
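
Three of the edge cases named in this check (empty input, varying delimiters, row length mismatches) can be sketched without NPOI. Both helpers below are hypothetical and do not appear in the PR. The delimiter guess simply counts candidate characters in the header line, so it also counts delimiters inside quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvEdgeCaseSketch
{
    // Hypothetical helper: guess the delimiter by counting candidates in the header line.
    // OrderByDescending is stable, so a tie falls back to the comma.
    public static char DetectDelimiter(string headerLine)
    {
        char[] candidates = { ',', ';', '\t' };
        return candidates
            .OrderByDescending(c => headerLine.Count(ch => ch == c))
            .First();
    }

    // Hypothetical helper: force every row to the header width.
    // Short rows are padded with empty strings; rows wider than the header
    // are truncated and their indexes are reported so the caller can log them.
    public static List<string[]> NormalizeRows(IReadOnlyList<string[]> rows, out List<int> oversizedRows)
    {
        oversizedRows = new List<int>();
        var result = new List<string[]>();
        if (rows.Count == 0)
        {
            return result; // empty file: nothing to convert
        }

        int width = rows[0].Length;
        for (int i = 0; i < rows.Count; i++)
        {
            var row = rows[i];
            if (row.Length > width)
            {
                oversizedRows.Add(i);
            }

            var normalized = new string[width];
            for (int c = 0; c < width; c++)
            {
                normalized[c] = c < row.Length ? row[c] : string.Empty;
            }
            result.Add(normalized);
        }
        return result;
    }
}
```

Reporting oversized rows instead of dropping them silently is one way to address the "silent data issues" this check mentions.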

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
No logging context: New parsing and DB write paths add processing of external files without structured logs
indicating file names, sizes, or sanitization decisions for safe observability.

Referred Code
        string extension = Path.GetExtension(file.FileStorageUrl);
        if (!_excelFileTypes.Contains(extension))
        {
            continue;
        }

        var binary = _fileStorage.GetFileBytes(file.FileStorageUrl);
        var workbook = ConvertToWorkbook(binary, extension);

        var currentCommands = _dbService.WriteExcelDataToDB(workbook);
        sqlCommands.AddRange(currentCommands);
    }
    return sqlCommands;
}

private string GenerateSqlExecutionSummary(List<SqlContextOut> results)
{
    var stringBuilder = new StringBuilder();
    if (results.Any(x => x.isSuccessful))
    {


 ... (clipped 38 lines)

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input validation gaps: CSV/XLS/XLSX bytes are parsed without visible validation of size, encoding, delimiter, or
malicious content, and headers are used to generate column names without normalization
checks.

Referred Code
private IWorkbook ConvertToWorkbook(BinaryData binary, string extension)
{
    var bytes = binary.ToArray();

    if (extension.IsEqualTo(".csv"))
    {
        return ExcelHelper.ConvertCsvToWorkbook(bytes);
    }

    using var fileStream = new MemoryStream(bytes);
    if (extension.IsEqualTo(".xls"))
    {
        return new HSSFWorkbook(fileStream);
    }

    return new XSSFWorkbook(fileStream);
}
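
The gaps named in this check could be narrowed with a size guard before parsing and a stricter header normalizer. The sketch below is an illustration under assumptions: the 10 MB limit, the `col_` prefix, and both helper names are hypothetical and not taken from the PR.

```csharp
using System;
using System.Text;

public static class InputValidationSketch
{
    // Assumed limit for illustration only; the PR defines no size cap.
    public const int MaxFileBytes = 10 * 1024 * 1024;

    // Rejects null, empty, and oversized payloads before any parsing starts.
    public static bool IsAcceptableSize(byte[] bytes) =>
        bytes != null && bytes.Length > 0 && bytes.Length <= MaxFileBytes;

    // Hypothetical normalizer: keep letters, digits, and underscores,
    // and map every other character to an underscore.
    public static string NormalizeColumnName(string header)
    {
        var sb = new StringBuilder();
        foreach (char c in header.Trim())
        {
            sb.Append((char.IsLetterOrDigit(c) || c == '_') ? c : '_');
        }

        // Prefix names that are empty or start with a digit (assumed convention).
        if (sb.Length == 0 || char.IsDigit(sb[0]))
        {
            sb.Insert(0, "col_");
        }
        return sb.ToString();
    }
}
```

For example, `NormalizeColumnName("Unit Price")` yields `Unit_Price`, matching what the existing space replacement produces, while punctuation such as `;` is also neutralized.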

Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Replace custom CSV parser with library

Replace the custom-built CSV parsing logic with a dedicated and robust library
like CsvHelper. This will better handle complex CSV edge cases and simplify the
codebase.

Examples:

src/Plugins/BotSharp.Plugin.ExcelHandler/Helpers/ExcelHelper.cs [46-81]
    private static List<string> ParseCsvLine(string line)
    {
        var values = new List<string>();
        var currentValue = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];


 ... (clipped 26 lines)

Solution Walkthrough:

Before:

// in ExcelHelper.cs
internal static class ExcelHelper
{
    internal static IWorkbook ConvertCsvToWorkbook(byte[] bytes)
    {
        // ...
        while ((line = reader.ReadLine()) != null)
        {
            IRow row = sheet.CreateRow(rowIndex);
            var values = ParseCsvLine(line);
            // ... create cells from values
            rowIndex++;
        }
        return workbook;
    }

    private static List<string> ParseCsvLine(string line)
    {
        // Manual, character-by-character parsing logic
        // to handle commas and quotes.
    }
}

After:

// in ExcelHelper.cs
// (Requires adding CsvHelper library)
using System.Globalization; // needed for CultureInfo below
using CsvHelper;
using CsvHelper.Configuration;

internal static class ExcelHelper
{
    internal static IWorkbook ConvertCsvToWorkbook(byte[] bytes)
    {
        IWorkbook workbook = new XSSFWorkbook();
        ISheet sheet = workbook.CreateSheet("Sheet1");
        
        using var reader = new StreamReader(new MemoryStream(bytes));
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        // Read header
        csv.Read();
        csv.ReadHeader();
        // ... create header row ...

        // Read records
        while (csv.Read())
        {
            // ... create data row from csv.GetRecord<dynamic>() or by index ...
        }
        return workbook;
    }
}
Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies that the custom CSV parser in ExcelHelper.cs is a potential source of bugs and recommends using a standard library, which is a best practice for improving robustness and maintainability.
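
For context on the kind of logic under discussion, here is a minimal sketch of a character-by-character line parser that handles quoted fields and doubled quotes. The PR's own `ParseCsvLine` body is clipped above, so this is not that implementation. `CsvLineSketch.ParseLine` and its `delimiter` parameter are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineSketch
{
    // Splits a single CSV line into fields, honoring double-quoted fields.
    public static List<string> ParseLine(string line, char delimiter = ',')
    {
        var values = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote inside a quoted field is an escaped quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // delimiters inside quotes are field content
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == delimiter)
            {
                values.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        values.Add(current.ToString()); // last field, possibly empty
        return values;
    }
}
```

Even this short version leaves out multiline fields and whitespace around quotes, which is the kind of gap the suggestion points to when it recommends a dedicated library.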

Medium
Possible issue
Fix incorrect parsing of multiline CSVs

Refactor the CSV parsing logic in ConvertCsvToWorkbook to correctly handle
multiline fields. This involves reading the entire file content and then
splitting it into lines while respecting quoted sections, rather than using
reader.ReadLine().

src/Plugins/BotSharp.Plugin.ExcelHandler/Helpers/ExcelHelper.cs [9-44]

 internal static IWorkbook ConvertCsvToWorkbook(byte[] bytes)
 {
     IWorkbook workbook = new XSSFWorkbook();
     ISheet sheet = workbook.CreateSheet("Sheet1");
 
     using var memoryStream = new MemoryStream(bytes);
     using var reader = new StreamReader(memoryStream);
+    var content = reader.ReadToEnd();
+    var lines = SplitCsv(content);
 
     int rowIndex = 0;
-    string? line;
-
-    while ((line = reader.ReadLine()) != null)
+    foreach (var line in lines)
     {
         IRow row = sheet.CreateRow(rowIndex);
         var values = ParseCsvLine(line);
 
         for (int colIndex = 0; colIndex < values.Count; colIndex++)
         {
             ICell cell = row.CreateCell(colIndex);
             var value = values[colIndex];
 
             if (rowIndex > 0 && double.TryParse(value, NumberStyles.Any, CultureInfo.InvariantCulture, out double numericValue))
             {
                 cell.SetCellValue(numericValue);
             }
             else
             {
                 cell.SetCellValue(value);
             }
         }
-
         rowIndex++;
     }
 
     return workbook;
 }
 
+private static List<string> SplitCsv(string csvText)
+{
+    var lines = new List<string>();
+    var currentLine = new StringBuilder();
+    bool inQuotes = false;
+
+    foreach (char c in csvText)
+    {
+        if (c == '"')
+        {
+            inQuotes = !inQuotes;
+        }
+
+        if (c == '\n' && !inQuotes)
+        {
+            if (currentLine.Length > 0 && currentLine[currentLine.Length - 1] == '\r')
+            {
+                currentLine.Remove(currentLine.Length - 1, 1);
+            }
+            lines.Add(currentLine.ToString());
+            currentLine.Clear();
+        }
+        else
+        {
+            currentLine.Append(c);
+        }
+    }
+
+    if (currentLine.Length > 0)
+    {
+        lines.Add(currentLine.ToString());
+    }
+
+    return lines;
+}
+
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a significant flaw in the new CSV parsing logic where multiline fields would be parsed incorrectly, leading to data corruption. The proposed solution is a valid approach to make the new feature more robust.
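
The flaw can be reproduced without NPOI. The sketch below compares a line-based count, which mirrors the `ReadLine()` loop, with a quote-aware count that follows the same quote-toggling idea as the proposed `SplitCsv`. The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class MultilineCsvDemo
{
    // Counts records the way the original loop does: one per physical line.
    public static int CountWithReadLine(string csv)
    {
        int count = 0;
        using var reader = new StringReader(csv);
        while (reader.ReadLine() != null)
        {
            count++;
        }
        return count;
    }

    // Counts records while treating line breaks inside quotes as field content.
    public static int CountQuoteAware(string csv)
    {
        int count = 0;
        bool inQuotes = false;
        bool hasContent = false; // true when the current record has unterminated text

        foreach (char c in csv)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
            }

            if (c == '\n' && !inQuotes)
            {
                count++;
                hasContent = false;
            }
            else
            {
                hasContent = true;
            }
        }

        // Count a final record that is not terminated by a newline.
        return hasContent ? count + 1 : count;
    }
}
```

For an input with a header and one data row whose quoted field contains a line break, the line-based count reports three records while the quote-aware count reports two.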

Medium
Prevent null reference on missing header

In the ParseSheetColumn method, add a null check for the headerRow variable
after it is assigned using sheet.GetRow(0). This will prevent a
NullReferenceException if the header row is missing.

src/Plugins/BotSharp.Plugin.ExcelHandler/Services/MySqlService.cs [162-174]

 private List<string> ParseSheetColumn(ISheet sheet)
 {
     if (sheet.PhysicalNumberOfRows < 2)
     {
         throw new Exception("No data found in the excel file");
     }
 
     _excelRowSize = sheet.PhysicalNumberOfRows - 1;
     var headerRow = sheet.GetRow(0);
+    if (headerRow == null)
+    {
+        throw new Exception("Header row not found in the excel file.");
+    }
     var headerColumn = headerRow.Cells.Where(x => !string.IsNullOrWhiteSpace(x.StringCellValue)).Select(x => x.StringCellValue.Replace(" ", "_")).ToList();
     _excelColumnSize = headerColumn.Count;
     return headerColumn;
 }
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a potential NullReferenceException if the header row is missing and provides a simple, effective fix. This improves the method's robustness against malformed Excel files.

Low
Learned
best practice
Validate extension and content

Normalize and validate extension and binary before use to avoid null/empty
issues and mismatched comparisons.

src/Plugins/BotSharp.Plugin.ExcelHandler/Functions/ReadExcelFn.cs [131-138]

-string extension = Path.GetExtension(file.FileStorageUrl);
+var extension = Path.GetExtension(file.FileStorageUrl) ?? string.Empty;
+if (string.IsNullOrWhiteSpace(extension))
+{
+    continue;
+}
+extension = extension.Trim().ToLowerInvariant();
 if (!_excelFileTypes.Contains(extension))
 {
     continue;
 }
 
 var binary = _fileStorage.GetFileBytes(file.FileStorageUrl);
+if (binary == null || binary.Length == 0)
+{
+    continue;
+}
 var workbook = ConvertToWorkbook(binary, extension);

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why:
Relevant best practice - Guard against nulls and invalid states before access to avoid NullReferenceExceptions.

Low

@iceljc iceljc merged commit 5a7a6f2 into SciSharp:master Dec 3, 2025
4 checks passed