(derived) create_categorical(..., multiple=True) assumes same base #196
Comments
@jamesrkg: I would need an example of this.
@mathiasbc finally got around to sending you something on this!
This happens when creating an MR from a categorical. The original variable is: Qintend
I managed to establish some sort of new base from the variables created, but the trick was to add a subvariable that covers the rest of the excluded categories:
new_var = ds.create_categorical(
alias='test-m3',
name='MR TEST3',
categories=[
{'id': 1, 'name': ds['Qintend'].categories[1].name, 'case': 'Qintend == 1'},
{'id': -1, 'name': 'Missing', 'case': 'Qintend not in [1]'}
],
multiple=True
)
@jjdelc @malecki: The payload below is being sent to Crunch, but I didn't find anything in the docs about including missing values. What is happening is that the newly created variable has no missing values:
{
"element": "shoji:entity",
"body": {
"name": "MR TEST3",
"alias": "test-m3",
"description": "",
"notes": "",
"derivation": {
"function": "array",
"args": [
{
"function": "select",
"args": [
{
"map": {
"0001": {
"references": {
"name": "Owner Intender",
"alias": "test-m3_1"
},
"function": "case",
"args": [
{
"column": [
1,
2
],
"type": {
"value": {
"class": "categorical",
"categories": [
{
"id": 1,
"name": "Selected",
"missing": false,
"numeric_value": null,
"selected": true
},
{
"id": 2,
"name": "Not selected",
"missing": false,
"numeric_value": null,
"selected": false
}
]
}
}
},
{
"function": "==",
"args": [
{
"variable": "https://alpha.crunch.io/api/datasets/aafc79447fef4f98a9b0cb64a441e842/variables/000004/"
},
{
"value": 1
}
]
}
]
},
"-001": {
"references": {
"name": "Missing",
"alias": "test-m3_-1"
},
"function": "case",
"args": [
{
"column": [
1,
2
],
"type": {
"value": {
"class": "categorical",
"categories": [
{
"id": 1,
"name": "Selected",
"missing": false,
"numeric_value": null,
"selected": true
},
{
"id": 2,
"name": "Not selected",
"missing": false,
"numeric_value": null,
"selected": false
}
]
}
}
},
{
"function": "not",
"args": [
{
"function": "in",
"args": [
{
"variable": "https://alpha.crunch.io/api/datasets/aafc79447fef4f98a9b0cb64a441e842/variables/000004/"
},
{
"value": [
1
]
}
]
}
]
}
]
}
}
},
{
"value": [
"0001",
"-001"
]
}
]
}
]
}
}
}
This created the following variable:
What we are trying to achieve here is a kind of recode that leaves some of the variables out, where the new variable is based on the included subvariables only.
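A quick way to confirm the observation above is to walk the payload and check whether any category in the case columns is flagged as missing. The sketch below is illustrative only: `case_categories` and `has_missing_category` are hypothetical helpers (not part of scrunch or the Crunch API), and the payload is a trimmed copy of the one posted above with the condition expressions left out.

```python
def case_categories(payload):
    """Yield (subvariable_key, category) pairs from a payload shaped like the one above."""
    select = payload["body"]["derivation"]["args"][0]
    subvariables = select["args"][0]["map"]
    for key, case in subvariables.items():
        column_type = case["args"][0]["type"]["value"]
        for category in column_type["categories"]:
            yield key, category


def has_missing_category(payload):
    """Return True if any case column defines a category with missing=True."""
    return any(category["missing"] for _, category in case_categories(payload))


def trimmed_case(name, alias):
    # Same category definitions as in the posted payload; the condition
    # expression (second element of "args") is omitted for brevity.
    categories = [
        {"id": 1, "name": "Selected", "missing": False,
         "numeric_value": None, "selected": True},
        {"id": 2, "name": "Not selected", "missing": False,
         "numeric_value": None, "selected": False},
    ]
    return {
        "references": {"name": name, "alias": alias},
        "function": "case",
        "args": [{"column": [1, 2],
                  "type": {"value": {"class": "categorical",
                                     "categories": categories}}}],
    }


payload = {
    "element": "shoji:entity",
    "body": {
        "name": "MR TEST3",
        "alias": "test-m3",
        "derivation": {
            "function": "array",
            "args": [{
                "function": "select",
                "args": [
                    {"map": {
                        "0001": trimmed_case("Owner Intender", "test-m3_1"),
                        "-001": trimmed_case("Missing", "test-m3_-1"),
                    }},
                    {"value": ["0001", "-001"]},
                ],
            }],
        },
    },
}

print(has_missing_category(payload))  # False
```

Every row can therefore only land in "Selected" or "Not selected", which is consistent with the derived variable having no missing values.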
Changing the assignment to @jjdelc; we are waiting on them for guidance on how we could flag a value as missing in this case, or on another kind of workaround (changing the base?).
It looks like this might not be possible and we'll need to instead focus on #205.
This ticket is now being tackled in #228.
I'm closing this one. The focus needs to stay on #228. |
create_categorical(..., multiple=True) returns undesired results in two use cases. This is because multiple=True results in the new variable always having the full base of the dataset.
Often we want to rebase an existing variable. To do this we create a copy of it with some of its categories left out. However, when we do this the base remains the same.
Other times the source variable is only populated for a subset of the dataset rows. In this case it's not even possible to create a copy of it with the same base because the copy automatically ends up populated for all rows in the dataset.
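The effect on the base can be illustrated without touching Crunch at all. The following is a minimal plain-Python sketch, assuming made-up data and two hypothetical helpers (`derive_subvariable`, `base`); it does not call scrunch and is not how the library is implemented. It contrasts the behaviour described above (every row gets a valid value) with the desired one (rows outside the included categories become missing).

```python
MISSING = None  # stand-in for a missing value in this illustration


def derive_subvariable(values, selected_ids, included_ids=None):
    """Map source category ids to 1 (Selected), 2 (Not selected) or MISSING.

    included_ids=None mimics the behaviour described above: every row
    receives a valid value. Passing included_ids mimics the desired
    rebasing: rows whose source value is outside it become missing.
    """
    derived = []
    for value in values:
        if included_ids is not None and value not in included_ids:
            derived.append(MISSING)
        elif value in selected_ids:
            derived.append(1)
        else:
            derived.append(2)
    return derived


def base(column):
    """Count the rows that have a non-missing value."""
    return sum(1 for value in column if value is not MISSING)


# Made-up source data; None marks a row where the question was not asked.
qintend = [1, 2, 3, 1, None, 2]

current = derive_subvariable(qintend, selected_ids={1})
desired = derive_subvariable(qintend, selected_ids={1}, included_ids={1, 2})

print(base(current))  # 6: full dataset base, including the unasked row
print(base(desired))  # 4: only rows with an included category
```

Both use cases show up here: category 3 is left out of the copy yet still counts towards the base, and the row with no source value ends up populated as "Not selected".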