(derived) create_categorical(..., multiple=True) assumes same base #196

Closed
jamesrkg opened this issue Nov 2, 2017 · 8 comments

jamesrkg commented Nov 2, 2017

create_categorical(..., multiple=True) returns undesired results in two use cases. This is because multiple=True results in the new variable always having the full base of the dataset.

  1. Often we want to rebase an existing variable. To do this we create a copy of it with some of its categories left out. However, when we do this the base remains the same.

  2. Other times the source variable is only populated for a subset of the dataset rows. In this case it's not even possible to create a copy of it with the same base because the copy automatically ends up populated for all rows in the dataset.
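
To make the two cases concrete, here is a minimal sketch in plain Python. It does not call scrunch or Crunch; the rows, the category ids, and the helper names (`base`, `flags_full_base`, `flags_rebased`) are all hypothetical and only illustrate the difference between the base we get today and the base we want.

```python
# Toy rows: None marks a respondent for whom the source variable is
# unpopulated. Category ids are hypothetical stand-ins, not real data.
source = [1, 2, 3, 1, None, None, 2, None]

KEEP = {1, 2}  # categories carried over into the new variable


def base(values):
    """Count rows that hold a valid (non-missing) value."""
    return sum(v is not None for v in values)


def flags_full_base(values, keep):
    """Behaviour described in this issue: every row gets a True/False flag
    for every kept category, so no row is ever missing."""
    return {c: [v == c for v in values] for c in sorted(keep)}


def flags_rebased(values, keep):
    """Desired behaviour: rows outside the kept categories, or unpopulated
    in the source, are missing (None) in every subvariable."""
    return {
        c: [(v == c) if v in keep else None for v in values]
        for c in sorted(keep)
    }


full = flags_full_base(source, KEEP)
rebased = flags_rebased(source, KEEP)

print(base(source))      # 5: rows populated in the source
print(base(full[1]))     # 8: the full dataset, regardless of the source
print(base(rebased[1]))  # 4: only rows answering a kept category
```

Case 1 is the drop from 5 to 4 when category 3 is left out; case 2 is the three unpopulated rows, which should stay out of the base rather than count as "not selected".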

@mathiasbc mathiasbc self-assigned this Nov 7, 2017
@mathiasbc
Contributor

@jamesrkg: I would need an example of this.

@jamesrkg
Author

@mathiasbc finally got around to sending you something on this!

@mathiasbc
Contributor

mathiasbc commented Nov 22, 2017

This happens when creating a multiple response (MR) variable from a categorical. The original variable is:

Qintend

Owner Intender     2230
NONR               4259
Non-Owner R.       3617
Not Aware         23472
Owner Non Int.     1534
-----------------------
Valid: 35112   Missing: 129437

I managed to establish a new base of sorts from the subvariables created, but the trick was to add a catch-all subvariable that covers all of the excluded categories:

new_var = ds.create_categorical(
    alias='test-m3',
    name='MR TEST3',
    categories=[
        {'id': 1, 'name': ds['Qintend'].categories[1].name, 'case': 'Qintend == 1'},
        {'id': -1, 'name': 'Missing', 'case': 'Qintend not in [1]'}
    ],
    multiple=True
)

@jjdelc @malecki: The payload below is being sent to Crunch, but I didn't find anything in the docs for including missing values. What is happening here is that the newly created variable has no missing values.

{
    "element": "shoji:entity",
    "body": {
        "name": "MR TEST3",
        "alias": "test-m3",
        "description": "",
        "notes": "",
        "derivation": {
            "function": "array",
            "args": [
                {
                    "function": "select",
                    "args": [
                        {
                            "map": {
                                "0001": {
                                    "references": {
                                        "name": "Owner Intender",
                                        "alias": "test-m3_1"
                                    },
                                    "function": "case",
                                    "args": [
                                        {
                                            "column": [
                                                1,
                                                2
                                            ],
                                            "type": {
                                                "value": {
                                                    "class": "categorical",
                                                    "categories": [
                                                        {
                                                            "id": 1,
                                                            "name": "Selected",
                                                            "missing": false,
                                                            "numeric_value": null,
                                                            "selected": true
                                                        },
                                                        {
                                                            "id": 2,
                                                            "name": "Not selected",
                                                            "missing": false,
                                                            "numeric_value": null,
                                                            "selected": false
                                                        }
                                                    ]
                                                }
                                            }
                                        },
                                        {
                                            "function": "==",
                                            "args": [
                                                {
                                                    "variable": "https://alpha.crunch.io/api/datasets/aafc79447fef4f98a9b0cb64a441e842/variables/000004/"
                                                },
                                                {
                                                    "value": 1
                                                }
                                            ]
                                        }
                                    ]
                                },
                                "-001": {
                                    "references": {
                                        "name": "Missing",
                                        "alias": "test-m3_-1"
                                    },
                                    "function": "case",
                                    "args": [
                                        {
                                            "column": [
                                                1,
                                                2
                                            ],
                                            "type": {
                                                "value": {
                                                    "class": "categorical",
                                                    "categories": [
                                                        {
                                                            "id": 1,
                                                            "name": "Selected",
                                                            "missing": false,
                                                            "numeric_value": null,
                                                            "selected": true
                                                        },
                                                        {
                                                            "id": 2,
                                                            "name": "Not selected",
                                                            "missing": false,
                                                            "numeric_value": null,
                                                            "selected": false
                                                        }
                                                    ]
                                                }
                                            }
                                        },
                                        {
                                            "function": "not",
                                            "args": [
                                                {
                                                    "function": "in",
                                                    "args": [
                                                        {
                                                            "variable": "https://alpha.crunch.io/api/datasets/aafc79447fef4f98a9b0cb64a441e842/variables/000004/"
                                                        },
                                                        {
                                                            "value": [
                                                                1
                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            }
                        },
                        {
                            "value": [
                                "0001",
                                "-001"
                            ]
                        }
                    ]
                }
            ]
        }
    }
}
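
For reference, the shape of a single "map" entry in the payload above can be reproduced with a short sketch. The helper `subvariable_case` and the constant `SELECTED_TYPE` are hypothetical names made up for this illustration; they are not scrunch internals. The point is that the column type carries only two categories and both have "missing": false, so no row can ever come out as missing.

```python
import json

# The two-category column type, copied from the payload above.
# Neither category is flagged as missing.
SELECTED_TYPE = {
    "class": "categorical",
    "categories": [
        {"id": 1, "name": "Selected", "missing": False,
         "numeric_value": None, "selected": True},
        {"id": 2, "name": "Not selected", "missing": False,
         "numeric_value": None, "selected": False},
    ],
}

# Variable URL taken verbatim from the payload above.
QINTEND_URL = (
    "https://alpha.crunch.io/api/datasets/"
    "aafc79447fef4f98a9b0cb64a441e842/variables/000004/"
)


def subvariable_case(name, alias, condition):
    """Build one "map" entry: a `case` function whose first argument is the
    output column type and whose second argument is the condition."""
    return {
        "references": {"name": name, "alias": alias},
        "function": "case",
        "args": [
            {"column": [1, 2], "type": {"value": SELECTED_TYPE}},
            condition,
        ],
    }


# Mirrors the "0001" entry: Qintend == 1.
condition = {
    "function": "==",
    "args": [{"variable": QINTEND_URL}, {"value": 1}],
}
entry = subvariable_case("Owner Intender", "test-m3_1", condition)
print(json.dumps(entry, indent=4))
```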

Which created the following variable:

MR TEST3

Owner Intender      2230
Missing           162319
-----------------------
Valid: 164549   Missing: 0
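
A quick arithmetic check on the two tables, as a sketch: the numbers are copied from the counts above, and the reading of them is an inference rather than something Crunch reports. They are consistent with 'Qintend not in [1]' matching every row that is not Owner Intender, including the rows where Qintend itself is missing.

```python
# Counts copied from the Qintend table above.
qintend = {
    "Owner Intender": 2230,
    "NONR": 4259,
    "Non-Owner R.": 3617,
    "Not Aware": 23472,
    "Owner Non Int.": 1534,
}
qintend_missing = 129437

valid = sum(qintend.values())        # rows with a valid Qintend answer
total_rows = valid + qintend_missing  # every row in the dataset

# Rows that are valid in Qintend but are not Owner Intender.
other_valid = valid - qintend["Owner Intender"]

# What the "Missing" subvariable would count if it also absorbs the rows
# where Qintend is missing.
catch_all = other_valid + qintend_missing

print(valid)        # 35112, the Valid figure for Qintend
print(total_rows)   # 164549, the Valid figure for MR TEST3
print(other_valid)  # 32882
print(catch_all)    # 162319, the "Missing" count in MR TEST3
```

So the new variable's base is the whole dataset (164549), not the 35112 rows where Qintend is populated.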

What we are trying to achieve here is a kind of recode that leaves some of the categories out, with the new variable's base computed from the included subvariables only.

@xbito xbito assigned jjdelc and unassigned mathiasbc Nov 27, 2017
@xbito
Contributor

xbito commented Nov 27, 2017

Changing the assignment to @jjdelc. We are waiting on them for guidance on how we could flag a value as missing in this case, or for another kind of workaround (changing the base?).

@jamesrkg jamesrkg added this to the Wishlist milestone Dec 5, 2017
@jamesrkg jamesrkg changed the title from "create_categorical(..., multiple=True) assumes same base" to "(derived) create_categorical(..., multiple=True) assumes same base" Dec 5, 2017
@jamesrkg
Author

jamesrkg commented Jan 4, 2018

@xbito @jjdelc this one quite urgently needs to be fixed now - can we bump it again?

@jamesrkg
Author

It looks like this might not be possible and we'll need to instead focus on #205.

@jamesrkg
Author

jamesrkg commented Feb 7, 2018

This ticket is now being tackled in #228.

@jamesrkg
Author

jamesrkg commented Feb 7, 2018

I'm closing this one. The focus needs to stay on #228.
