ENH: Include line number and number of fields when read_csv() callable raises ParserWarning #61838

@matthewgottlieb

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use pandas to detect and repair issues in a CSV file, but raise an informative warning when an unrepairable issue is encountered.

I have written a function that identifies common issues (e.g. the field delimiter being improperly used within a field) and checks the surrounding fields to estimate the original intent of the data. When this logic cannot resolve the issue, the function returns the original line unchanged, and the user should be directed to the problematic line.

Feature Description

Given a CSV with bad lines (e.g. line 3 having an extra "E"):

id,field_1,field_2
101,A,B
102,C,D,E
103,F,G

read_csv() will, with all defaults (on_bad_lines='error'), raise a ParserError:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

With on_bad_lines='warn', it will emit a ParserWarning with the same helpful information:

<stdin>:1: ParserWarning: Skipping line 3: expected 3 fields, saw 4

However, when using a callable (e.g. on_bad_lines=line_fixer), the ParserWarning message is very generic and indicates neither the line number nor the expected or seen number of fields:

>>> import pandas as pd
>>> def line_fixer(line):
...     return [1, 2, 3, 4, 5]
...
>>> df = pd.read_csv('test.csv', engine='python', on_bad_lines=line_fixer)
<stdin>:1: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.

Including these details would allow the user to find and fix the input CSV manually.
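Until then, one user-side workaround is to have the callable emit its own warning. The sketch below is only an illustration: make_line_fixer is a hypothetical helper, and the expected field count has to be supplied by the caller because, as far as I can tell, the callable only receives the list of fields from the bad row. The line number is not available to it, which is the gap this request is about.

```python
import warnings


def make_line_fixer(expected_fields):
    """Build an on_bad_lines callable that reports what it saw (hypothetical helper)."""

    def line_fixer(bad_line):
        # bad_line is the list of fields parsed from the offending row.
        warnings.warn(
            f"Bad line: expected {expected_fields} fields, "
            f"saw {len(bad_line)}: {bad_line!r}",
            stacklevel=2,
        )
        # Returning None asks pandas to skip the row entirely.
        return None

    return line_fixer
```

It would be passed as pd.read_csv('test.csv', engine='python', on_bad_lines=make_line_fixer(3)). The warning then carries the field counts and the row contents, but the user still has to search the file for the matching row.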

Alternative Solutions

  • Pre-process the CSV file separately from the read_csv() function.
  • Pass line number and expected field count to the callable function, which can raise its own descriptive warning.
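For reference, the first alternative can be sketched with the standard-library csv module. This is a minimal illustration and not pandas behaviour: find_bad_lines is a hypothetical name, and it flags any row whose field count differs from the header (including rows with too few fields, which read_csv() would not necessarily treat as bad lines).

```python
import csv
import io


def find_bad_lines(text, delimiter=","):
    """Return (line_number, expected, seen) for rows that do not match the header width."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    expected = len(header)
    problems = []
    for row in reader:
        # reader.line_num counts physical lines read so far, so for a quoted
        # field spanning several lines it points at the record's last line.
        # Blank lines come back as empty lists and are ignored here.
        if row and len(row) != expected:
            problems.append((reader.line_num, expected, len(row)))
    return problems


sample = "id,field_1,field_2\n101,A,B\n102,C,D,E\n103,F,G\n"
print(find_bad_lines(sample))  # [(3, 3, 4)]
```

The drawback is that the file is parsed twice, and the pre-scan can disagree with read_csv() on quoting or delimiter handling unless both are configured the same way.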

Additional Context

No response

Labels

  • Enhancement
  • Error Reporting: Incorrect or improved errors from pandas
  • Needs Triage: Issue that has not been reviewed by a pandas team member
