[BUG] bind_tweets(): 'Column id doesn't exist.' with empty data_.json
#304
Comments
@TimBMK Thanks for reporting the bug. I can reproduce this.
require(academictwitteR)
#> Loading required package: academictwitteR
users <- c("303234771", "2821282972", "84803032", "154096311", "2615232002", "37776042", "2282315483", "405599246", "1060861584938057728", "85161049")
tempdir <- academictwitteR:::.gen_random_dir()
get_user_timeline(x = users,
                  start_tweets = "2017-04-01T00:00:00Z",
                  end_tweets = "2017-06-01T00:00:00Z",
                  n = 3200,
                  data_path = tempdir,
                  bind_tweets = FALSE,
                  verbose = FALSE)
#> data frame with 0 columns and 0 rows
list.files(tempdir)
#> [1] "data_.json" "data_848204306566320128.json"
#> [3] "data_848950153520218113.json" "query"
#> [5] "users_.json" "users_848204306566320128.json"
#> [7] "users_848950153520218113.json"
data <- bind_tweets(data_path = tempdir, output_format = "tidy")
#> Error in `stop_subscript()`:
#> ! Can't rename columns that don't exist.
#> ✖ Column `id` doesn't exist.
data_raw <- bind_tweets(data_path = tempdir, output_format = "raw")
Created on 2022-03-10 by the reprex package (v2.0.1)
There are actually two issues here:
@TimBMK I will keep this issue focused on the second issue only, and I will open another issue for the first one.
Hello, This worked for me: but when trying to convert to csv with: I get this error:
@psalmuel19 this is unrelated to the issue mentioned above, as it is clearly caused by write.csv() rather than the bind_tweets() function. I suspect the nested lists in the raw data format are causing the problem. Try unnesting batch_four or use output_format = "tidy" when binding the tweets. If the issue persists, please open a separate issue.
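To illustrate the unnesting suggestion, here is a minimal base R sketch. It assumes the problem is a plain list column, which write.csv() generally cannot write. The data frame `tweets` and the helper `flatten_list_cols()` are hypothetical stand-ins, not part of academictwitteR, and the real structure of batch_four may differ.

```r
# Hypothetical stand-in for one tibble from the raw output: a data frame
# with an ordinary column and a list column.
tweets <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
tweets$hashtags <- list(c("rstats", "twitter"), character(0))

# Hypothetical helper: collapse every list column into one string per row
# so that the result contains only atomic columns.
flatten_list_cols <- function(df, sep = ";") {
  is_list <- vapply(df, is.list, logical(1))
  df[is_list] <- lapply(df[is_list], function(col) {
    vapply(col, function(x) paste(unlist(x), collapse = sep), character(1))
  })
  df
}

flat <- flatten_list_cols(tweets)

# With only atomic columns left, write.csv() can serialise the data frame.
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

This only handles plain list columns. Columns that are themselves data frames would need separate treatment, for example with jsonlite::flatten().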
@TimBMK While searching for a solution, I came across the output_format = "raw" code. It worked in binding but I now can't convert to csv. Any suggestions please?
As mentioned in the original post, the easiest way to get the tidy format to work is to go into the folder with the data and manually delete the empty "data_.json" files. With those files gone, the error about the non-existent id column no longer comes up. The raw format does not output a dataframe, but a list of tibbles (a type of dataframe) of different lengths containing different information (this is what the API originally returns). If you are set on using the raw format, you will have to decide which information you want to export to .csv. If you look at the structure of the raw data object (batch_four in your case), it is relatively self-explanatory what you get in each of the tibbles. An easy way to do this yourself is with
Please confirm the following
something went wrong. Status code: 400.
Describe the bug
As soon as there is a .json file without an ID ("data_.json") in the data path of bind_tweets, the function fails with an error if set to the "tidy" format. Generating the "raw" format, however, is not an issue. The following error occurs:
The data_.json is usually an empty file, but it seems to get generated whenever native academictwitteR functions do not return any Twitter data (empty pages). The last three times I used get_user_timeline(), I ended up with these empty files. Deleting the data_.json file fixes the error. Furthermore, I believe the problem only started occurring after I updated academictwitteR to 0.3.1. I don't think it occurred under 0.2.1.
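For reference, the manual workaround of deleting data_.json could be scripted roughly like this. The sketch builds a scratch directory with empty placeholder files instead of touching real data; `data_path` and the file names are only illustrative, and it deliberately leaves users_.json alone because only data_.json is known to trigger the error.

```r
# Scratch directory with placeholder files, so no real data is touched.
data_path <- tempfile("tweets_")
dir.create(data_path)
file.create(file.path(data_path, c("data_.json",
                                   "data_848204306566320128.json",
                                   "users_.json",
                                   "users_848204306566320128.json")))

# Match only the file named exactly "data_.json" (no ID after the underscore).
no_id <- list.files(data_path, pattern = "^data_\\.json$", full.names = TRUE)

# Delete it; file.remove() returns TRUE for each file it removed.
removed <- file.remove(no_id)

remaining <- list.files(data_path)
```

On real data it may be safer to move the files elsewhere with file.rename() rather than delete them.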
Expected Behavior
I would suggest some sort of failsafe that automatically skips .json files without the ID, as they seem to be empty anyways.
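One way such a failsafe might look is sketched below. This is not how bind_tweets() is implemented; `list_id_files()` is a hypothetical helper that assumes valid files are named `data_<digits>.json`, as in the directory listing from the reprex.

```r
# Hypothetical helper: return only the files whose name carries a numeric ID,
# e.g. "data_848204306566320128.json", and skip "data_.json".
list_id_files <- function(data_path, prefix = "data") {
  pattern <- paste0("^", prefix, "_[0-9]+\\.json$")
  list.files(data_path, pattern = pattern, full.names = TRUE)
}

# Demo on a scratch directory that mimics the listing from the reprex.
demo_path <- tempfile("tweets_")
dir.create(demo_path)
file.create(file.path(demo_path, c("data_.json",
                                   "data_848204306566320128.json",
                                   "data_848950153520218113.json",
                                   "users_.json")))

# Only the two data files with an ID are returned.
data_files <- list_id_files(demo_path)
basename(data_files)
```

A binding function could then read only the files returned by this helper, so files without an ID would never reach the step that expects an id column.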
Steps To Reproduce
Environment
Anything else?
Possibly related to #218