[BUG] method = 'rake' return AttributeError #73

geniusjenny · 2024-03-05T04:29:19Z

Describe the bug

The same code runs without error with method = 'ipw' and method = 'cbps', but returns the error below when using raking.
The code below returns the error:

sample_with_target.adjust(method = "rake",variables = variables) 
table_current.loc[feature, weight_col]                      
AttributeError: 'numpy.int64' object has no attribute 'loc'

###Update on 2023/03/08###
This error occurs because some of the bins that appear in the sample never appear in the target.
Once I add the sample to the target, so that all bins appear in the target, the error disappears.

Session information

Please paste here the output of running the following in your notebook/terminal:

# Sessions info
import session_info
session_info.show(html=False, dependencies=True)

balance 0.9.1
balance_functions NA
boto3 1.28.28
dateutil 2.8.2
matplotlib 3.7.2
numpy 1.24.4
pandas 1.4.3
psutil 5.9.5
seaborn 0.12.2
session_info 1.0.0
tqdm 4.65.0

OpenSSL 23.2.0
PIL 10.0.0
anyio NA
arrow 1.2.3
asttokens NA
attr 23.1.0
attrs 23.1.0
babel 2.12.1
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
botocore 1.31.28
brotli NA
certifi 2023.05.07
cffi 1.15.1
charset_normalizer 3.2.0
cloudpickle 2.2.1
colorama 0.4.4
comm 0.1.3
coxnet NA
cryptography 41.0.2
cvcompute NA
cvelnet NA
cvfishnet NA
cvglmnet NA
cvglmnetCoef NA
cvglmnetPredict NA
cvlognet NA
cvmrelnet NA
cvmultnet NA
cycler 0.10.0
cython_runtime NA
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
elnet NA
executing 1.2.0
fastjsonschema NA
fishnet NA
fqdn NA
fsspec 2023.6.0
glmnet NA
glmnetCoef NA
glmnetControl NA
glmnetPredict NA
glmnetSet NA
glmnet_python NA
google NA
hypergeom_ufunc NA
idna 3.4
ipfn NA
ipykernel 6.24.0
ipython_genutils 0.2.0
ipywidgets 8.0.7
isoduration NA
jedi 0.18.2
jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.1
json5 NA
jsonpointer 2.4
jsonschema 4.18.4
jsonschema_specifications NA
jupyter_events 0.6.3
jupyter_server 2.7.0
jupyterlab_server 2.23.0
kiwisolver 1.4.4
loadGlmLib NA
lognet NA
markupsafe 2.1.3
matplotlib_inline 0.1.6
mpl_toolkits NA
mrelnet NA
nbformat 5.9.1
nbinom_ufunc NA
ncf_ufunc NA
overrides NA
packaging 21.3
parso 0.8.3
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.9.1
plotly 5.15.0
prometheus_client NA
prompt_toolkit 3.0.39
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 12.0.1
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.15.1
pyparsing 3.0.9
pythonjsonlogger NA
pytz 2023.3
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rpds NA
s3fs 0.4.2
scipy 1.9.1
send2trash NA
six 1.16.0
sklearn 1.3.0
sniffio 1.3.0
socks 1.7.1
stack_data 0.6.2
statsmodels 0.14.0
tenacity NA
threadpoolctl 3.2.0
tornado 6.3.2
traitlets 5.9.0
typing_extensions NA
uri_template NA
urllib3 1.26.14
wcwidth 0.2.6
webcolors 1.13
websocket 1.6.1
wtmean NA
yaml 6.0
zmq 25.1.0

IPython 8.14.0
jupyter_client 8.3.0
jupyter_core 5.3.1
jupyterlab 4.0.3
notebook 6.5.4

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Linux-5.10.209-198.812.amzn2.x86_64-x86_64-with-glibc2.26

Session information updated at 2024-03-05 04:21

Reproducible example

Please provide us with (any that apply):

Code: code we can run to reproduce the issue (in terminal or python notebook)
sample = Sample.from_frame(sample_df2[:50])
target = Sample.from_frame(target_df2[:500])
sample_with_target = sample.set_target(target)
adjusted_ads_weight = sample_with_target.adjust(method = "rake",variables = variables_subset2)
sample_df2 and target_df2 are dataframes with two numerical columns.
Reference: If the issue is in a tutorial, please provide the link to it, and the exact place in which the code fails.

talgalili · 2024-03-05T14:43:57Z

Hey @geniusjenny

Thanks for the bug report!

Could you please try running the code from the rake tutorial:
https://import-balance.org/docs/tutorials/quickstart_rake/
and see whether the error reproduces with it?

What would help me is a fully self-contained reproducible example that I could run in my env to reproduce the error - that would allow me to more easily iterate to get a solution.

Thanks upfront!

geniusjenny · 2024-03-05T16:25:58Z

Thanks for the replies!
The sample code runs smoothly with no error.

talgalili · 2024-03-05T17:34:19Z

Thanks for checking @geniusjenny
Any way you could play around and try to find a way to reproduce the issue?
I suggest you look at the output of
sample.df.info()
and check the data types; the hint may be there.

Once you could find a way to reproduce the issue, I'd be able to work on it.
WDYT?

geniusjenny · 2024-03-05T22:09:20Z

Hi talgalili, I tried to reproduce the issue but couldn't. I tried using two numerical features ['income', 'happiness'], similar to what I have in my dataset, and the code runs smoothly.
I attached the sample data here for you to reproduce the issue. Sorry that I couldn't be more helpful.

Thank you so much.
sample_test2.csv
target_test2.csv
code:

import pandas as pd
from balance import Sample

s2 = pd.read_csv('sample_test2.csv', index_col=0)
t2 = pd.read_csv('target_test2.csv', index_col=0)
sample = Sample.from_frame(s2)
target = Sample.from_frame(t2)
sample_with_target = sample.set_target(target)
adjusted_ads_weight1 = sample_with_target.adjust(method = "rake")

talgalili · 2024-03-06T09:12:43Z

Thanks @geniusjenny

Just to double check, could you please paste the full output you get from running the above code?
And please also include the output of:
sample.df.info()
target.df.info()

Thanks!

geniusjenny · 2024-03-06T15:42:46Z

Sure!
Full output:

df.info:

talgalili · 2024-03-06T15:56:42Z

Thanks! Could you please try bucketing the variables and running it again? I think rake should be defined on categorical variables, not numeric ones. (How to correct this with a default is a good question, but I'd like to double-check that this is indeed the issue.)
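
A minimal sketch of what bucketing could look like with pandas, assuming made-up column names and values (`income` is only an example and is not taken from the attached CSV files). The bin edges are computed once from the pooled data and reused for both frames, so that sample and target share the same bucket labels. Whether quantile edges are the right choice depends on the data.

```python
import pandas as pd

# Made-up numeric data standing in for the real sample and target frames.
sample_df = pd.DataFrame({"income": [12.0, 35.5, 48.0, 77.0]})
target_df = pd.DataFrame({"income": [10.0, 22.0, 41.0, 55.0, 63.0, 90.0]})

# Compute the bin edges once, from both frames together, so that the same
# intervals are used on each side.
pooled = pd.concat([sample_df["income"], target_df["income"]])
edges = pooled.quantile([0, 0.25, 0.5, 0.75, 1]).unique()

for df in (sample_df, target_df):
    # include_lowest=True keeps the minimum value inside the first bucket.
    buckets = pd.cut(df["income"], bins=edges, include_lowest=True)
    # Store the interval labels as plain strings.
    df["income_bucket"] = buckets.astype(str)

print(sample_df["income_bucket"].value_counts())
print(target_df["income_bucket"].value_counts())
```

The bucketed column can then be used in place of the numeric one when building the sample and target.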

geniusjenny · 2024-03-06T16:51:54Z

Hi talgalili,
I just tried binning the numerical variables into categorical variables, but the code still returns the same error, while method='cbps' and method = 'ipw' run smoothly.

Here are the code and df.info:

ERROR:

talgalili · 2024-03-06T17:17:33Z

Thanks @geniusjenny
Interesting!
Could you please change the type of the bucketed variables from 'categorical' to 'object', and let me know if this resolves the error you get?
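
For reference, a small sketch of that dtype change in pandas, using a made-up column name (`income_bucket`) and toy values rather than the real data:

```python
import pandas as pd

# Toy frame with a categorical column, standing in for a bucketed variable.
df = pd.DataFrame({"income_bucket": pd.Categorical(["low", "high", "low"])})
print(df.dtypes)  # income_bucket is 'category' here

# Convert the categorical column to plain Python objects (strings).
df["income_bucket"] = df["income_bucket"].astype(object)
print(df.dtypes)  # income_bucket is now 'object'
```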

geniusjenny · 2024-03-06T17:36:22Z

I also tried that. Still getting the same error.

geniusjenny · 2024-03-06T17:45:45Z

I think I may have found the issue.
Some of the bins that appear in the sample never appear in the target, which causes this error.
Once I add the sample to the target, the error disappears.
I suggest the code take this edge case into consideration as well!

t2=pd.concat([s2,t2])
t2.reset_index(inplace=True)
t2['id']=t2.index.astype('str')
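
For anyone who hits the same error, a quick pre-check along these lines may help confirm whether this is the cause before calling adjust. It is only a sketch: `levels_missing_from_target` is a hypothetical helper, not part of balance, and it assumes the raking variables are already categorical (or bucketed) columns. It relies only on column lookup by name, so it should work on DataFrames such as s2 and t2 as well as on the plain dicts used here for illustration.

```python
def levels_missing_from_target(sample, target, variables):
    """Return {variable: levels seen in the sample but never in the target}.

    Hypothetical helper (not part of balance). `sample` and `target` can be
    pandas DataFrames or any mapping from column name to an iterable of values.
    """
    missing = {}
    for var in variables:
        only_in_sample = set(sample[var]) - set(target[var])
        if only_in_sample:
            missing[var] = sorted(only_in_sample, key=str)
    return missing


# Toy data (made up for illustration): the "50+" bin exists only in the sample.
sample_cols = {"age_bin": ["18-29", "30-49", "50+"], "region": ["N", "S", "N"]}
target_cols = {"age_bin": ["18-29", "30-49", "30-49"], "region": ["N", "S", "S"]}

problems = levels_missing_from_target(sample_cols, target_cols, ["age_bin", "region"])
print(problems)  # {'age_bin': ['50+']}
```

An empty result means every level in the sample also appears in the target for the listed variables.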

talgalili · 2024-03-06T17:54:56Z

Great catch - thanks a bunch @geniusjenny !

O.k., I'll leave this issue open - and we'll get to add a proper exception in the future.

Thanks again.

geniusjenny · 2024-03-08T19:04:37Z

Thank you!

EmanueleCeglia · 2024-05-10T14:28:28Z

I'm jumping into this issue because I have the same problem.
In my case I have no missing data in the target. I am trying to use marginal distributions with rake.
If there is no weight column in the "sample" dataframe and in "target_df_from_marginals", one is created automatically with values equal to 1.
I then tried creating the "weight" column myself for both "target_df_from_marginals" and the dataframe used to create "sample", using 1.0 instead of 1 so that the dtype is float, and this time the error message is:
AttributeError: 'numpy.float64' object has no attribute 'loc'
Do you have any suggestions? @talgalili

talgalili · 2024-05-10T14:52:17Z

Hey @EmanueleCeglia ,
Do you want to share the code you used?
My guess is that you need to add the weight column to the DataFrame of your data before using Sample.from_frame so it will inherit from pandas the relevant methods.

EmanueleCeglia · 2024-05-10T15:22:31Z

df_sorted is a dataframe with two columns, ctrysize and ctrysect (sorted in alphabetical order). This is the df whose weights I need to calibrate.
ctrysize combines 12 EU countries with the firm size class (from 1 to 4) within each country.
ctrysect combines 12 EU countries with the firm sector (from A to D) within each country.

For each of these combinations I have the real EU totals, and I want to use them as margins for the calibration.
In the picture below you can see how I used the totals to create the dictionary "a_dict_with_marginal_distributions"

then

Error

Hope it's clear enough.
In any case I can provide additional details.
Thanks @talgalili

talgalili · 2024-05-10T17:41:40Z

Hi @EmanueleCeglia

Could you please open a new issue for this discussion? (This seems like a separate issue.)
If you run this tutorial, does it work? https://import-balance.org/docs/tutorials/quickstart_rake/
Notice that you have a huge number of tiny buckets. Regardless of this bug, are you sure you have values for each of them in your sample?

(please let's continue this discussion in the new bug you'll open - thanks)

EmanueleCeglia · 2024-05-10T21:16:05Z

Hi @talgalili, yes, the tutorial works perfectly.

I am going to open a new issue so we can continue there

geniusjenny added the bug Something isn't working label Mar 5, 2024

geniusjenny changed the title ~~[BUG]~~ [BUG] method = 'rake' return AttributeError Mar 5, 2024
