DesignMatrix should have to_dataframe() method #30

Open
shoyer opened this issue Oct 30, 2013 · 4 comments

Comments

@shoyer
Member

shoyer commented Oct 30, 2013

This would be useful, for example, when I really want to be able to use a design matrix as both a raw numpy array and a pandas dataframe.

I suppose I could specify return_type="dataframe" and then get the numpy array from df.values, and building the dataframe from scratch isn't hard either. But a method would be particularly handy for interactive use, where it would provide a useful shortcut (e.g., X.to_dataframe().plot() or X.to_dataframe().head()).
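For reference, the existing workaround looks roughly like this. It is a minimal sketch with made-up data: return_type="dataframe" is patsy's existing option, while the frame and the formula are purely illustrative.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Illustrative data with a non-default index.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
                    index=["a", "b", "c"])

# Ask patsy for a DataFrame; the input's index is carried through.
X_df = dmatrix("x + y", data, return_type="dataframe")

# Recover the raw NumPy array when needed (equivalent to X_df.values).
X = X_df.to_numpy()

print(X_df.head())
print(X.shape)
```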

To do this right, the DataFrame-building logic would be factored out of build_design_matrices into the new method. Roughly speaking, it would look like this:

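```python
try:
    import pandas
    have_pandas = True
except ImportError:
    have_pandas = False
from patsy import PatsyError

def to_dataframe(self):  # proposed DesignMatrix method
    if not have_pandas:
        raise PatsyError("pandas.DataFrame was requested, but "
                         "pandas is not installed")
    di = self.design_info
    # di.pandas_index is the new attribute proposed below; it does not exist yet
    df = pandas.DataFrame(self, columns=di.column_names,
                          index=di.pandas_index)
    df.design_info = di  # keep the metadata, as build_design_matrices does
    return df
```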

The main design change would be that DesignInfo (or DesignMatrix) would need to gain a pandas_index attribute, which would keep track of any index from the original data.

If this seems reasonable, I could put together a pull request.
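Until something like this exists, the proposal can be approximated as a free function. This is a hypothetical sketch, not patsy API: the helper name to_dataframe and its pandas_index parameter are assumptions, and because DesignInfo has no pandas_index attribute today, the caller has to pass the index in explicitly.

```python
import numpy as np
import pandas as pd
from patsy import DesignMatrix, PatsyError, dmatrix


def to_dataframe(matrix, pandas_index=None):
    # Hypothetical helper, not part of patsy: a free-function version of the
    # proposed DesignMatrix.to_dataframe(). `pandas_index` stands in for the
    # proposed DesignInfo.pandas_index attribute, which does not exist yet.
    if not isinstance(matrix, DesignMatrix):
        raise PatsyError("expected a patsy DesignMatrix")
    di = matrix.design_info
    df = pd.DataFrame(np.asarray(matrix), columns=di.column_names,
                      index=pandas_index)
    df.design_info = di  # keep the column metadata alongside the frame
    return df


# Usage with illustrative data: the index must be supplied by the caller.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
X = dmatrix("x", data)  # a plain DesignMatrix; the pandas index is not kept
df = to_dataframe(X, pandas_index=data.index)
print(df.head())
```

Storing the index on design_info, as proposed, would remove the need to keep the original DataFrame around just to supply pandas_index.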

@jseabold
Member

jseabold commented May 6, 2014

In principle, I agree with the sentiment. I'm not sure I agree with the design you've proposed, but if you hand off a pandas object to patsy, I think it should be trivial to get one back at some point even if you don't specify return_type="dataframe". AFAICT, this isn't possible right now.

@kyleabeauchamp

I also think something like this might be useful for keeping track of pandas metadata for future use.

@njsmith
Member

njsmith commented Apr 14, 2015

Sorry for missing this. Seems reasonable to me.

@jpweytjens

Currently, a DesignMatrix doesn't store the original pandas index. If you still have the original pandas DataFrame that was used to create the DesignMatrix, you can use its index:

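```python
import pandas as pd

def designmatrix_to_pandas(matrix, df):
    # Only valid if patsy dropped no rows (e.g. no missing values),
    # so that matrix and df have the same number of rows.
    return pd.DataFrame(matrix, columns=matrix.design_info.column_names,
                        index=df.index)
```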

But I agree with @shoyer: it would be better if design_info kept a copy of the original index.
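One reason the explicit-index workaround is fragile: with patsy's default NA_action="drop", rows containing missing values are removed, so the DesignMatrix can end up shorter than the original DataFrame. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Illustrative data: row "b" has a missing value in "x".
data = pd.DataFrame({"x": [1.0, np.nan, 3.0]}, index=["a", "b", "c"])

# With the default NA_action="drop", patsy drops row "b".
X = dmatrix("x", data)
print(X.shape[0], len(data.index))  # prints: 2 3

# Re-attaching the original index therefore fails with a length mismatch.
mismatch = False
try:
    pd.DataFrame(X, columns=X.design_info.column_names, index=data.index)
except ValueError as exc:
    mismatch = True
    print("index mismatch:", exc)

# return_type="dataframe" keeps only the labels of the surviving rows.
X_df = dmatrix("x", data, return_type="dataframe")
print(list(X_df.index))  # prints: ['a', 'c']
```

patsy already tracks which labels survive when it builds a DataFrame, which is why keeping that index on design_info seems like the more robust place for it.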
