The documentation for `pandas.read_csv(usecols=[...])` says that it treats the list of columns like an unordered set (updated in #18673 and #53763), so the columns of the returned DataFrame won't necessarily follow the order given in `usecols`. This is different behaviour from other pandas data reading methods (e.g., `pandas.read_parquet(columns=[...])`). I think the order should be preserved. If `usecols` is converted to a `set`, I think it should instead be converted to an `OrderedSet` or the keys of a `collections.OrderedDict` (or just a `dict` in Python >3.6).
Feature Description
```python
import pandas as pd

# Example CSV file (replace with your actual file)
csv_data = """col1,col2,col3,col4
A,1,X,10
B,2,Y,20
C,3,Z,30"""

with open("example.csv", "w") as f:
    f.write(csv_data)

# Desired column order
desired_order = ['col3', 'col1', 'col4']

# Read CSV with usecols (selects columns but doesn't order)
df = pd.read_csv("example.csv", usecols=desired_order)
print(df)  # incorrect column order

# Reindex DataFrame to enforce desired order (a popular workaround that I think shouldn't be required)
# One solution is to include this line in `read_csv`, when using `usecols` kwarg
df = df[desired_order]
print(df)  # correct column order
```
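Until `read_csv` preserves the order itself, the reindex workaround can be wrapped in a small helper. This is a minimal sketch, not a pandas API. `read_csv_ordered` is a hypothetical name. It assumes `usecols` is either `None`, a callable, or a list-like of string column labels, and it only reorders in the last case. Positional (integer) `usecols` are left in `read_csv`'s own order.

```python
import io

import pandas as pd


def read_csv_ordered(filepath_or_buffer, usecols=None, **kwargs):
    """Hypothetical wrapper: call read_csv, then reorder columns to match usecols.

    Only list-likes of string labels are reordered; None, callables and
    positional (integer) usecols are returned in read_csv's own order.
    """
    df = pd.read_csv(filepath_or_buffer, usecols=usecols, **kwargs)
    if usecols is None or callable(usecols):
        return df
    # Deduplicate while keeping first-seen order (dicts preserve insertion order).
    ordered = list(dict.fromkeys(usecols))
    if not all(isinstance(c, str) for c in ordered):
        return df
    # Skip labels that did not end up as columns (e.g. one consumed by index_col).
    ordered = [c for c in ordered if c in df.columns]
    return df[ordered]


csv_data = "col1,col2,col3,col4\nA,1,X,10\nB,2,Y,20\nC,3,Z,30\n"
df = read_csv_ordered(io.StringIO(csv_data), usecols=["col3", "col1", "col4"])
print(list(df.columns))  # ['col3', 'col1', 'col4']
```

The wrapper still pays for the same extra column selection as the `df[desired_order]` workaround. Moving that step into `read_csv` would only remove the boilerplate, not the cost.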
Alternative Solutions
Instead of converting `usecols` to a `set`, convert it to `dict.keys()`, which preserves insertion order in Python >3.6.
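The property this alternative relies on can be shown with the standard library alone. A `set` makes no promise about iteration order. `dict.fromkeys(...).keys()` drops duplicates, keeps first-seen order, and still behaves like a set for membership tests. The helper `normalize_usecols` is hypothetical and is not pandas's internal `_validate_usecols_arg`. It is only a sketch of how a list-like `usecols` could be normalized without losing order.

```python
from typing import Hashable, Iterable, KeysView


def normalize_usecols(usecols: Iterable[Hashable]) -> KeysView:
    """Hypothetical helper: drop duplicates but keep the caller's order."""
    # dict keys keep insertion order, so the first occurrence of each label wins;
    # the returned keys view still supports `in` and set operations.
    return dict.fromkeys(usecols).keys()


desired_order = ["col3", "col1", "col4", "col1"]

as_set = set(desired_order)  # unordered: iteration order is arbitrary
as_keys = normalize_usecols(desired_order)

print(as_set == set(as_keys))  # True: same members either way
print(list(as_keys))           # ['col3', 'col1', 'col4']
print("col1" in as_keys)       # True: membership checks work as with a set
```

Because a keys view supports `in` as well as set operators such as `&`, code that only checks membership against the normalized `usecols` could likely keep working unchanged.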
Additional Context
No response
@mroeschke, could I get your opinion on this before I dig deeper into it? You were the last person to work on the function (`_validate_usecols_arg`), and I'm mainly worried about backwards compatibility rather than feasibility. But given that pandas has a major version release coming up, the change could be justifiable.