Not possible to create a post request with more than 268 strings for create_embeddings()? #48

Closed
atantos opened this issue Aug 18, 2023 · 2 comments

atantos commented Aug 18, 2023

Hi there.

Is there a time limit, or a limit on the number of strings, for POST requests? In R I can send all 1000 strings from the overview column in a single request, but here I cannot send more than 268 right now. Is this an issue with the package, or am I missing something?

Thanks!

using CSV, DataFrames, Downloads, OpenAI

# Download the TidyTuesday horror movies dataset and load it as a DataFrame
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

# Request embeddings for the first 268 overview texts
r = create_embeddings(
    ENV["OPENAI_API_KEY"],
    horror_movies.overview[1:268],
    "text-embedding-ada-002"
)

Here is the error message I get:

{
  "error": {
    "message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

This happens even though horror_movies.overview is a string vector.
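
One way to check that assumption is to look at the column's element type and count any `missing` entries, since CSV.jl gives a column with empty cells an element type that allows `missing` rather than a plain string type. The sketch below uses a small hypothetical vector `overview` standing in for `horror_movies.overview`, so it runs without downloading anything. Whether missing values are actually behind the error is not confirmed in this thread.

```julia
# Hypothetical stand-in for horror_movies.overview (not the real data)
overview = Union{Missing, String}["A haunted house.", missing, "Zombies attack."]

# Element type of the vector: a plain string vector would report String
col_type = eltype(overview)

# Positions of entries that are missing rather than strings
missing_idx = findall(ismissing, overview)

println("element type: ", col_type)
println("missing at indices: ", missing_idx)
```

Running the same two checks on `horror_movies.overview[1:268]` would show whether that slice contains anything other than strings.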

UPDATE I: I tried different vector sizes, and there seems to be no hard upper bound on the size of the string vector. I just managed to get embeddings for 700 strings with horror_movies.overview[1:700]. Is there something that we as users should know, or is it simply luck related to the traffic limits their server imposes?
UPDATE II: However, in R the following code, written by Julia Silge, works every single time for all 1000 overview texts:

library(tidyverse)

set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)

library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)

resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)

embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")
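
One difference visible between the two snippets is that the R code drops rows where `overview` is `NA` before building the request, while the Julia call passes the column slice as-is. A rough Julia counterpart of that filtering step might look like the sketch below. The vector `overview` is a hypothetical stand-in for `horror_movies.overview`, and the final `create_embeddings` call is left commented out because it needs an API key and network access. It simply mirrors the call used earlier in this issue and has not been run here.

```julia
# Hypothetical stand-in for horror_movies.overview (not the real data)
overview = Union{Missing, String}["A haunted house.", missing, "Zombies attack."]

# Drop missing entries and collect into a plain Vector{String},
# analogous to filter(!is.na(overview)) in the R code
clean = String[s for s in skipmissing(overview)]

println(length(overview), " entries, ", length(clean), " after dropping missing")

# Same call as in the original post, but on the cleaned vector (untested here):
# r = create_embeddings(ENV["OPENAI_API_KEY"], clean, "text-embedding-ada-002")
```

Note that the R code also keeps only English-language rows and samples 1000 of them, so the two requests do not contain the same texts even after this step.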

atantos closed this as completed Aug 18, 2023
atantos reopened this Aug 18, 2023

algunion (Contributor) commented Aug 19, 2023

Please check out the answer here.

cpfiffer (Collaborator) commented Dec 1, 2023

Closing, as this is resolved in the Discourse post linked above.

cpfiffer closed this as completed Dec 1, 2023