Creating derived multiple_response with control over base #228

Closed
jamesrkg opened this issue Jan 17, 2018 · 3 comments


jamesrkg commented Jan 17, 2018

This ticket comes out of testing for ways around the behavior of ds.create_categorical(..., multiple=True) described in #196. In brief: the resulting variable always has the full base of the entire dataset, regardless of how it is constructed, how the case statements are written, or how the source variable itself is populated.

The categories given to a new multiple_response using this method are:

CategoryList(
    [
        (1, Category(numeric_value=None, selected=True, id=1, missing=False, name=Selected)), 
        (2, Category(numeric_value=None, selected=False, id=2, missing=False, name=Not selected))
    ]
)

The result always has an all-rows base because both of these categories are defined with missing=False.

It's possible to get a non-all base on the new multiple_response created this way by following up with:

new_var.integrate()
new_var.categories[2].edit(missing=True)

However, this only yields a base equal to the rows in which at least one of these subvariables is selected. It is still not possible to have some rows "Not selected" and others "Missing" at the same time.

It's necessary to integrate the variable first because we can't yet edit missing on derived variables (see #148).
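
To make the effect of the missing flag concrete, here is a toy model in plain Python. It is not scrunch or Crunch code: the `base` helper, the row layout, and the rule that a row counts toward the base when at least one subvariable holds a non-missing category are simplifying assumptions for illustration only.

```python
# Toy model only: plain Python, not the scrunch API.
# Each row holds one category id per subvariable (1 = Selected, 2 = Not selected).
rows = [
    (1, 2),  # first subvariable selected
    (2, 2),  # nothing selected
    (2, 1),  # second subvariable selected
    (2, 2),  # nothing selected
]


def base(rows, missing_ids):
    """Count rows with at least one subvariable in a non-missing category (assumed rule)."""
    return sum(
        1 for row in rows
        if any(value not in missing_ids for value in row)
    )


# Default categories: neither is missing, so every row counts.
default_base = base(rows, missing_ids=set())

# After marking category 2 as missing: only rows with a selection count.
workaround_base = base(rows, missing_ids={2})
```

In this model `default_base` covers all four rows and `workaround_base` covers only the two rows with a selection. Neither setting can keep one unselected row in the base while dropping another.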

There are a few things that might need to be done to address this.

1.

Firstly, the default categories given to a new multiple_response using this method should probably be:

CategoryList(
    [
        (1, Category(numeric_value=None, selected=True, id=1, missing=False, name=Selected)), 
        (2, Category(numeric_value=None, selected=False, id=2, missing=False, name=Not selected)),
        (-1, Category(numeric_value=None, selected=False, id=-1, missing=True, name=No Data))
    ]
) 

This will protect the ability to make the important distinction between "Not selected" and "Missing" when necessary.
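
A small sketch of why the third category matters, again in plain Python rather than scrunch. With a No Data code available, the share of Selected can be computed over valid rows only. The values and the `selected_share` helper are hypothetical.

```python
# Hypothetical values for one subvariable, using the proposed ids:
# 1 = Selected, 2 = Not selected, -1 = No Data (missing).
MISSING_IDS = {-1}
values = [1, 2, -1, 2, -1, 1]


def selected_share(values, missing_ids):
    """Return (base, share of Selected among non-missing rows)."""
    valid = [v for v in values if v not in missing_ids]
    base = len(valid)
    share = sum(1 for v in valid if v == 1) / base if base else 0.0
    return base, share


# With -1 flagged as missing, the base counts only the four valid rows.
base, share = selected_share(values, MISSING_IDS)

# With no missing category, all six rows count and the share is diluted.
base_all, share_all = selected_share(values, set())
```

Here the Selected share is one half over a base of four, versus one third when the two No Data rows are forced into the base.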

2.

ds.create_categorical(..., multiple=True) should become the simple use case of this request, in which:

  • rows where the case statement is explicitly True are stamped with 1
  • rows where the case statement is implicitly False (populated in the source variable but not matching the case statement) are stamped with 2
  • all remaining rows (those not populated in the source variable) are stamped with -1
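
The three rules above can be sketched as a row-level function. This is plain Python to pin down the intended semantics, not a proposal for the scrunch API: `stamp`, the use of `None` for "not populated", and the lambda standing in for a case statement are all assumptions.

```python
def stamp(source_value, case):
    """Return the category id for one row under the three rules above.

    `None` stands in for "not populated in the source variable" (an assumption),
    and `case` is only evaluated for populated rows.
    """
    if source_value is None:
        return -1  # not populated in the source: No Data
    if case(source_value):
        return 1   # case statement is True: Selected
    return 2       # populated but not matching: Not selected


# Hypothetical source values and a case statement equivalent to "source == 3".
source = [3, 1, None, 3, 2]
stamped = [stamp(value, lambda v: v == 3) for value in source]
```

Checking for an unpopulated row first means a case statement never has to decide what to do with missing source data. That is one possible design choice, not necessarily the right one for scrunch.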

3.

A new method should be provided for fully explicit control over the new variable. Its API is still to be discussed and defined.

jamesrkg commented

@xbito @jjdelc @mathiasbc

jamesrkg changed the title from "Creating derived multiple_response with explicit subvariables/categories" to "Creating derived multiple_response with control over base" on Jan 17, 2018
jamesrkg added this to the Wishlist milestone on Jan 22, 2018

jamesrkg commented Aug 7, 2018

@xbito @jjdelc can we talk about this one? It prevents us from using derived variables in many of the situations that most warrant them. However, I'm not sure whether scrunch can solve this on its own.

jamesrkg commented

Resolved by #286.
