-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
Description
Feature Type
-
Adding new functionality to pandas
-
Changing existing functionality in pandas
-
Removing existing functionality in pandas
Problem Description
The documentation for pandas.read_csv(usecols=[...]) says that it treats the iterable list of columns like an unordered set (updated in #18673 and #53763), so the returned dataframe won't necessarily have the same column order. This is different behaviour from other pandas data reading methods (e.g., pandas.read_parquet(columns=[...])). I think the order should be preserved. If usecols is converted to a set, I think it should instead be converted to OrderedSet or keys of collections.OrderedDict (or just dict in Python >3.6).
Feature Description
import pandas as pd
# Example CSV file (replace with your actual file)
csv_data = """
col1,col2,col3,col4
A,1,X,10
B,2,Y,20
C,3,Z,30
"""
with open("example.csv", "w") as f:
f.write(csv_data)
# Desired column order
desired_order = ['col3', 'col1', 'col4']
# Read CSV with usecols (selects columns but doesn't order)
df = pd.read_csv("example.csv", usecols=desired_order)
print(df) # incorrect column order
# Reindex DataFrame to enforce desired order (a popular workaround that I think shouldn't be required)
# One solution is to include this line in `read_csv`, when using `usecols` kwarg
df = df[desired_order]
print(df) # correct column orderAlternative Solutions
Instead of converting usecols to set, convert it to dict.keys() which preserved order in Python >3.6
Additional Context
No response