DesignMatrix should have to_dataframe() method #30

Open
@shoyer

This would be useful when I want to use a design matrix as both a raw NumPy array and a pandas DataFrame.

I could specify return_type="dataframe" and then get the NumPy array from df.values, and it's also not hard to build the DataFrame from scratch. Still, a method would be particularly handy as a shortcut in interactive use (e.g., X.to_dataframe().plot() or X.to_dataframe().head()).
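
For reference, the two workarounds mentioned above look roughly like this with the existing API. The formula, column names, and data are made up for illustration, and reusing data.index in the second workaround assumes no rows were dropped (e.g. by NA handling).

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Illustrative data with a non-default index.
data = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [0.5, 0.1, 0.9]},
                    index=["a", "b", "c"])

# Workaround 1: request a DataFrame, then pull the raw array out of it.
df = dmatrix("x1 + x2", data, return_type="dataframe")
arr = df.values

# Workaround 2: request the default DesignMatrix, then build the
# DataFrame by hand from its design_info.
X = dmatrix("x1 + x2", data)
df_manual = pd.DataFrame(np.asarray(X),
                         columns=X.design_info.column_names,
                         index=data.index)  # assumes no rows were dropped
```

Both routes work, but each needs either a second call or some boilerplate, which is what a to_dataframe() method would save.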

To do this right, the new method would be factored out of build_design_matrices. Roughly speaking, it would look like this:

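def to_dataframe(self):
    """Return this design matrix as a pandas DataFrame."""
    if not have_pandas:
        raise PatsyError("pandas.DataFrame was requested, but "
                         "pandas is not installed")
    di = self.design_info
    # pandas_index is the proposed new attribute (does not exist yet)
    df = pandas.DataFrame(self, columns=di.column_names,
                          index=di.pandas_index)
    # keep the design metadata reachable from the DataFrame
    df.design_info = di
    return df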

The main design change would be that DesignInfo (or DesignMatrix) would need to gain a pandas_index attribute, which would keep track of any index from the original data.
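
To make the pandas_index idea concrete, here is a minimal standalone sketch using a toy ndarray subclass. IndexedMatrix is a hypothetical stand-in, not patsy's DesignMatrix, and where the index ends up living (on the matrix or on DesignInfo) is still open. The sketch stores it on the matrix for brevity.

```python
import numpy as np
import pandas as pd


class IndexedMatrix(np.ndarray):
    """Hypothetical stand-in for DesignMatrix that remembers a pandas index."""

    def __new__(cls, values, column_names, pandas_index=None):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.column_names = list(column_names)
        obj.pandas_index = pandas_index  # None if the input had no index
        return obj

    def __array_finalize__(self, obj):
        # Views and slices do not inherit the metadata in this sketch, since
        # a row slice would no longer line up with the stored index.
        self.column_names = None
        self.pandas_index = None

    def to_dataframe(self):
        # Hand pandas a plain ndarray plus the stored labels.
        return pd.DataFrame(np.asarray(self),
                            columns=self.column_names,
                            index=self.pandas_index)


data = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
X = IndexedMatrix(np.column_stack([np.ones(len(data)), data["x"].to_numpy()]),
                  column_names=["Intercept", "x"],
                  pandas_index=data.index)
df = X.to_dataframe()
```

The object still behaves as an array for numerical code, while to_dataframe() restores the original row labels without a second pass over the data.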

If this seems reasonable, I could put together a pull request.
