Description
This would be useful, for example, when I want to use a design matrix as both a raw numpy array and a pandas DataFrame. I suppose I could specify return_type="dataframe" and then get the numpy array from df.values, and it's also not hard to build the DataFrame from scratch, but a method would be particularly handy for interactive use, where it would provide a useful shortcut (e.g., X.to_dataframe().plot() or X.to_dataframe().head()).
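For reference, here is a minimal sketch of the two workarounds mentioned above, using the existing dmatrix API. The toy data and the variable names (data, X_df, X_arr, X_rebuilt) are made up for illustration.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Toy data with a non-default index, so index handling is visible.
data = pd.DataFrame(
    {"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# Workaround 1: ask for a DataFrame, then pull the raw array out of it.
X_df = dmatrix("x1 + x2", data, return_type="dataframe")
X_arr = X_df.values

# Workaround 2: ask for the default DesignMatrix and rebuild the
# DataFrame by hand. The index has to be supplied from the original
# data, because the matrix itself does not carry it.
X = dmatrix("x1 + x2", data)
X_rebuilt = pd.DataFrame(
    np.asarray(X),
    columns=X.design_info.column_names,
    index=data.index,
)
```

Both work, but the second needs the original data to still be in scope, which is the part a to_dataframe() method would remove.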
To do this right, the new method would be factored out of build_design_matrices. Roughly speaking, it would look like this:
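    def to_dataframe(self):
        # Fail early with a clear message if pandas is unavailable.
        if not have_pandas:
            raise PatsyError("pandas.DataFrame was requested, but "
                             "pandas is not installed")
        di = self.design_info
        # pandas_index is the proposed new attribute, not an existing one.
        df = pandas.DataFrame(self, columns=di.column_names,
                              index=di.pandas_index)
        # Keep the design metadata attached to the result.
        df.design_info = di
        return df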
The main design change would be that DesignInfo (or DesignMatrix) would need to gain a pandas_index attribute, which would keep track of any index from the original data.
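To make the idea concrete, here is a rough standalone sketch of an array subclass that remembers the original index. IndexedMatrix is a hypothetical stand-in, not patsy's DesignMatrix, and it stores pandas_index on the matrix itself; the attribute could equally live on DesignInfo.

```python
import numpy as np
import pandas as pd


class IndexedMatrix(np.ndarray):
    """Hypothetical stand-in for DesignMatrix that remembers an index."""

    def __new__(cls, values, column_names, pandas_index=None):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.column_names = list(column_names)
        # None means "no original index"; pandas then falls back to 0..n-1.
        obj.pandas_index = pandas_index
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; copy the metadata across if present.
        if obj is None:
            return
        self.column_names = getattr(obj, "column_names", None)
        self.pandas_index = getattr(obj, "pandas_index", None)

    def to_dataframe(self):
        # Convert to a plain ndarray so pandas does not see the subclass.
        return pd.DataFrame(
            np.asarray(self),
            columns=self.column_names,
            index=self.pandas_index,
        )


src = pd.DataFrame({"x": [1.0, 2.0]}, index=["r1", "r2"])
M = IndexedMatrix(src.values, ["x"], pandas_index=src.index)
df = M.to_dataframe()
```

One caveat with this sketch: row slices such as M[:1] would carry the full, now stale, index, so a real implementation would need to either slice the index as well or drop it.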
If this seems reasonable, I could put together a pull request.