Supporting No Data in derived multiple_response for create_categorical(..., multiple=True) #286
Here's a test using pycrunch (the …). This example creates a multiple response variable using three categories: Yes/No/Missing. Note that all subvariables need to specify the same categories (though each could have different conditions).
@mathiasbc following Jj's payload example, my intention is to do this in two steps. First, we need a new method to build this kind of derived multiple response, one that can have any number of categories (exactly one marked as selected), with the ability to mark some as missing and others not, and with the ability to pass the appropriate expressions for each subvariable. Second, once that new helper is in place, modify create_categorical to enable the common use case of generating three categories (1 Selected, 2 Not Selected, 3 Missing), with the ability to specify the missing case per subvariable or globally for the entire variable. That is the parameter Jamie has named base; I would prefer something more like missing_case, though I'm open to suggestions.
The first step will be to add a flexible method that allows deriving multiple responses in this proposed format:

desirable_kwargs = {
    'name': 'derived1',
    'alias': 'derived1',
    'description': 'Multiple response derived',
    # categories must have one and only one as selected=True
    'categories': [
        {'id': 1, 'name': 'Yes', 'missing': False, 'selected': True},
        {'id': 2, 'name': 'No', 'missing': False},
        {'id': 3, 'name': 'Maybe', 'missing': False},
        {'id': 4, 'name': 'Missing', 'missing': True}
    ],
    'responses': [
        {
            'name': 'Subvar 1',
            'id': 1,
            'cases': ['var_1 < 20', 'var_1 == 20', 'var_1 == 30', 'var_1 > 30']
        },
        {
            'name': 'Subvar 2',
            'id': 2,
            'cases': ['var_2 < 2', 'var_2 == 2', 'var_2 == 3', 'var_2 > 3']
        },
        {
            'name': 'Subvar 3',
            'id': 3,
            'cases': ['var_3 in [1]', 'var_3 in [2]', 'var_3 in [3]', 'var_3 in [4]']
        },
        # ... Define as many subvariables as needed
    ]
}
ds.derive_multiple_response(**desirable_kwargs)

Let me know if I'm on the right track or if you have a better approach.
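The proposal above carries two implicit rules: exactly one category has selected=True, and every response lists one case expression per category, in category order. A minimal client-side sketch of checking those rules before anything is sent; validate_derived_mr_kwargs is a hypothetical name, not part of scrunch, and it only inspects the dict.

```python
def validate_derived_mr_kwargs(kwargs):
    """Hypothetical pre-flight check for the proposed payload (not scrunch API)."""
    categories = kwargs["categories"]

    # Rule 1: one and only one category may be the "selected" one.
    selected = [c for c in categories if c.get("selected", False)]
    if len(selected) != 1:
        raise ValueError(
            "exactly one category must have selected=True, found %d" % len(selected)
        )

    # Rule 2: cases are positional, so each response needs one per category.
    for response in kwargs["responses"]:
        if len(response["cases"]) != len(categories):
            raise ValueError(
                "response %r has %d cases but there are %d categories"
                % (response["name"], len(response["cases"]), len(categories))
            )
    return True
```

Because cases are matched to categories by position in this format, a missing or extra expression would silently shift every later case onto the wrong category; the length check catches that early.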
Some thoughts/suggestions:
Given the above, I've adapted the example to:

desirable_kwargs = {
    'name': 'derived1',
    'alias': 'derived1',
    'description': 'Multiple response derived',
    'notes': 'Special variable',
    # categories must have one and only one as selected=True
    'categories': [
        {'id': 1, 'name': 'Yes', 'selected': True},
        {'id': 2, 'name': 'No'},
        {'id': 3, 'name': 'Maybe'},
        {'id': 4, 'name': 'Missing', 'missing': True}
    ],
    'subvariables': [
        {
            'alias': 'Subvar_1',
            'name': 'Subvar 1',
            'cases': {
                1: 'var_1 < 20',
                2: 'var_1 == 20',
                3: 'var_1 == 30',
                4: 'var_1 > 30'
            }
        },
        {
            'alias': 'Subvar_2',
            'name': 'Subvar 2',
            'cases': {
                1: 'var_2 < 2',
                2: 'var_2 == 2',
                3: 'var_2 == 3',
                4: 'var_2 > 3'
            }
        },
        {
            'alias': 'Subvar_3',
            'name': 'Subvar 3',
            'cases': {
                1: 'var_3 in [1]',
                2: 'var_3 in [2]',
                3: 'var_3 in [3]',
                4: 'var_3 in [4]'
            }
        }
        # ... Define as many subvariables as needed
    ]
}
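The main change in the adapted format is that cases become a dict keyed by category id instead of a positional list, and each subvariable gets an alias. A minimal sketch of converting the positional form into the keyed form, plus a check that catches unknown category ids and repeated aliases (for example, two subvariables both aliased Subvar_3). Both helper names are hypothetical, not scrunch API.

```python
def cases_by_category_id(categories, cases):
    """Pair a positional list of case expressions with category ids.

    Hypothetical helper: assumes cases[i] belongs to categories[i].
    """
    if len(cases) != len(categories):
        raise ValueError("need exactly one case per category")
    return {cat["id"]: expr for cat, expr in zip(categories, cases)}


def check_subvariables(categories, subvariables):
    """Check the adapted format: unique aliases, cases keyed by known category ids."""
    category_ids = {cat["id"] for cat in categories}

    aliases = [sv["alias"] for sv in subvariables]
    duplicates = sorted({a for a in aliases if aliases.count(a) > 1})
    if duplicates:
        raise ValueError("duplicate subvariable aliases: %s" % ", ".join(duplicates))

    for sv in subvariables:
        unknown = set(sv["cases"]) - category_ids
        if unknown:
            raise ValueError(
                "%s uses unknown category ids %s" % (sv["alias"], sorted(unknown))
            )
    return True
```

Keying by id makes each expression's category explicit, so reordering the categories list no longer changes which expression maps to which category.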
On the list that Jamie made:
Thanks @xbito.
@jamesrkg I added a pull request with the code: #290. Please pull that branch and check that you get what you expect, so I can write some tests and have it ready to merge. An example:
I noticed that the created derived variable gets an extra category, -1: No Data, which makes me unsure. I changed the …
There is a new PR that integrates …
I can't quite get this to produce the result I'm after. One problem I think I have already commented on here: https://github.com/Crunch-io/scrunch/pull/293/files#r218187826

But I'll also clarify the requirements, because at the top of this ticket I proposed a … Three use cases need to be catered for:
1. User only wants to give the …

test_m = ds.create_categorical(
alias='drinks',
name='Preferred drinks',
description='Which drinks do you prefer?',
multiple=True,
categories=[
{'id': 1, 'name': 'Sub1', 'case': 'q1 in [1]'},
{'id': 2, 'name': 'Sub2', 'case': 'q1 in [2]'},
{'id': 3, 'name': 'Sub3', 'case': 'q1 in [95]'},
{'id': 4, 'name': 'Sub4', 'case': 'q1 in [99]'},
{'id': 5, 'name': 'Sub5', 'case': 'q1 in [1,2]'}
]
)

Which should yield the following:

subvariables = [
{
'id': 1,
'name': 'Sub1',
'cases': {
1: 'q1 in [1]',
2: 'not q1 in [1]'
}
},
{
'id': 2,
'name': 'Sub2',
'cases': {
1: 'q1 in [2]',
2: 'not q1 in [2]'
}
},
{
'id': 3,
'name': 'Sub3',
'cases': {
1: 'q1 in [95]',
2: 'not q1 in [95]'
}
},
{
'id': 4,
'name': 'Sub4',
'cases': {
1: 'q1 in [99]',
2: 'not q1 in [99]'
}
},
{
'id': 5,
'name': 'Sub5',
'cases': {
1: 'q1 in [1,2]',
2: 'not q1 in [1,2]'
}
}
]

2. User wants to give the …

test_m = ds.create_categorical(
alias='test_multi',
name='test-multi',
multiple=True,
missing='missing(q1)',
categories=[
{'id': 1, 'name': 'Sub1', 'case': 'q1 in [1]'},
{'id': 2, 'name': 'Sub2', 'case': 'q1 in [2]'},
{'id': 3, 'name': 'Sub3', 'case': 'q1 in [95]'},
{'id': 4, 'name': 'Sub4', 'case': 'q1 in [99]'},
{'id': 5, 'name': 'Sub5', 'case': 'q1 in [1,2]'}
]
)

Which should yield the following:

subvariables = [
{
'id': 1,
'name': 'Sub1',
'cases': {
1: 'q1 in [1]',
2: 'not q1 in [1]',
3: 'missing(q1)'
}
},
{
'id': 2,
'name': 'Sub2',
'cases': {
1: 'q1 in [2]',
2: 'not q1 in [2]',
3: 'missing(q1)'
}
},
{
'id': 3,
'name': 'Sub3',
'cases': {
1: 'q1 in [95]',
2: 'not q1 in [95]',
3: 'missing(q1)'
}
},
{
'id': 4,
'name': 'Sub4',
'cases': {
1: 'q1 in [99]',
2: 'not q1 in [99]',
3: 'missing(q1)'
}
},
{
'id': 5,
'name': 'Sub5',
'cases': {
1: 'q1 in [1,2]',
2: 'not q1 in [1,2]',
3: 'missing(q1)'
}
}
]

3. User needs to give explicit …

test_m = ds.create_categorical(
alias='test_multi',
name='test-multi',
multiple=True,
categories=[
{'id': 1, 'name': 'Sub1', 'case': 'q1 in [1,2]', 'missing': 'missing(q1)'},
{'id': 2, 'name': 'Sub2', 'case': 'q2 in [1,2]', 'missing': 'missing(q2)'},
{'id': 3, 'name': 'Sub3', 'case': 'q3 in [1,2]', 'missing': 'missing(q3)'},
{'id': 4, 'name': 'Sub4', 'case': 'q4 in [1,2]', 'missing': 'missing(q4)'},
{'id': 5, 'name': 'Sub5', 'case': 'q5 in [1,2]', 'missing': 'missing(q5)'}
]
)

Which should yield the following:

subvariables = [
{
'id': 1,
'name': 'Sub1',
'cases': {
1: 'q1 in [1,2]',
2: 'not q1 in [1,2]',
3: 'missing(q1)'
}
},
{
'id': 2,
'name': 'Sub2',
'cases': {
1: 'q2 in [1,2]',
2: 'not q2 in [1,2]',
3: 'missing(q2)'
}
},
{
'id': 3,
'name': 'Sub3',
'cases': {
1: 'q3 in [1,2]',
2: 'not q3 in [1,2]',
3: 'missing(q3)'
}
},
{
'id': 4,
'name': 'Sub4',
'cases': {
1: 'q4 in [1,2]',
2: 'not q4 in [1,2]',
3: 'missing(q4)'
}
},
{
'id': 5,
'name': 'Sub5',
'cases': {
1: 'q5 in [1,2]',
2: 'not q5 in [1,2]',
3: 'missing(q5)'
}
}
]
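All three use cases follow one rule: each category's case becomes case 1, its negation becomes case 2, and a missing expression (the category's own 'missing' key if given, otherwise the global missing argument) becomes case 3. A minimal sketch that builds the expected subvariables from the create_categorical arguments; expand_subvariables is a hypothetical helper, and it only reproduces the payloads listed above. It says nothing about how the server resolves a row that matches more than one case (e.g. a missing q1 where "not q1 in [1]" might also hold).

```python
def expand_subvariables(categories, missing=None):
    """Hypothetical sketch: expected subvariables for create_categorical(..., multiple=True).

    - no missing expression: cases 1 (selected) and 2 (not selected) only;
    - global `missing`: the same expression becomes case 3 for every subvariable;
    - per-category 'missing' key: overrides the global expression for that subvariable.
    """
    subvariables = []
    for cat in categories:
        case = cat["case"]
        cases = {1: case, 2: "not " + case}

        # Per-category missing wins over the global one; None means no case 3.
        missing_expr = cat.get("missing", missing)
        if missing_expr is not None:
            cases[3] = missing_expr

        subvariables.append({"id": cat["id"], "name": cat["name"], "cases": cases})
    return subvariables
```

For example, passing the categories from use case 2 together with missing='missing(q1)' yields the five subvariables listed for that case, each with a third case of 'missing(q1)'.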
@jamesrkg: I added the Not Selected case and made some changes: https://github.com/Crunch-io/scrunch/pull/293/files#diff-10a14081413b0535e4d0097c2ad71a58R1498. Let me know if that works better for you. I added the …
Following lots of conversations about this (see #228 and #196), it looks like the following specific changes are needed to support the creation of derived multiple_response variables where the expected base is less than the total number of rows in the dataset. The following two changes are required:

Firstly, the default categories given to a new multiple_response created using this method should be: …

Secondly, add a new (optional) key to the objects in the list given to categories (in the example below named "base"), that being an expression describing which cases are allowed to have something other than No Data. These expressions would be evaluated as:

- case is True and base is True: 1
- case is False and base is True: 2
- case is True and base is False: -1
- case is False and base is False: -1

Possible extension of the above: where the base expression is the same for every subvariable, add support for a new arg base that is auto-applied to each: …