Not possible to create a post request with more than 268 strings for create_embeddings()? #48

atantos · 2023-08-18T13:19:57Z

Hi there.

Is there a time or string limit on POST requests? In R I can send all 1000 strings from the overview column in a single request, but with this package I cannot do more than 268 right now. Is it an issue with the package, or am I missing something?

Thanks!

using CSV, DataFrames, Downloads, OpenAI
horror_movies = CSV.read(Downloads.download("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv"), DataFrame);

r = create_embeddings(
        ENV["OPENAI_API_KEY"],
        horror_movies.overview[1:268],
        "text-embedding-ada-002"
    )

Here is the error message I get:

{
  "error": {
    "message": "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

This happens even though horror_movies.overview is a string vector.

UPDATE I: I tried different vector sizes, and there seems to be no hard upper bound on the size of the string vector. I just managed to get embeddings for 700 strings with horror_movies.overview[1:700]. Is there something we as users should know, or is it simply luck that depends on the traffic limits their server imposes?
UPDATE II: However, in R the following code, written by Julia Silge, works every single time for all 1000 overview texts:

library(tidyverse)

set.seed(123)
horror_movies <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-11-01/horror_movies.csv') %>%
  filter(!is.na(overview), original_language == "en") %>%
  slice_sample(n = 1000)

library(httr)
embeddings_url <- "https://api.openai.com/v1/embeddings"
auth <- add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY")))
body <- list(model = "text-embedding-ada-002", input = horror_movies$overview)

resp <- POST(
  embeddings_url,
  auth,
  body = body,
  encode = "json"
)

embeddings <- content(resp, as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("data", "embedding")
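One visible difference between the two snippets is that the R version drops rows with a missing overview (`filter(!is.na(overview))`) before sending the request, while the Julia call passes the raw column slice. Whether that explains the `'$.input' is invalid` error is not confirmed in this thread, so the following is only a minimal sketch of doing the same cleanup in Julia, plus optional batching. The helper names `clean_inputs` and `batch_inputs` and the sample strings are hypothetical and not part of OpenAI.jl.

```julia
# Hypothetical helpers for illustration; they are not part of OpenAI.jl.

# Drop `missing` entries and blank strings, returning a plain Vector{String}.
function clean_inputs(texts)
    kept = String[]
    for t in skipmissing(texts)
        isempty(strip(t)) || push!(kept, String(t))
    end
    return kept
end

# Split the inputs into chunks of at most `batch_size` strings.
function batch_inputs(texts::Vector{String}, batch_size::Integer)
    return [collect(chunk) for chunk in Iterators.partition(texts, batch_size)]
end

# Made-up sample data standing in for horror_movies.overview.
overview = ["A haunted house.", missing, "", "Zombies attack.", "A cursed tape."]
inputs = clean_inputs(overview)
batches = batch_inputs(inputs, 2)

# Each batch could then be sent with the same call as in the post, for example:
# r = create_embeddings(ENV["OPENAI_API_KEY"], batches[1], "text-embedding-ada-002")
```

With the real data, `clean_inputs(horror_movies.overview)` would take the place of `horror_movies.overview[1:268]` in the original call. Batching is not needed to reproduce the R request, which sends everything at once; it is included only as a way to keep individual requests small.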

algunion · 2023-08-19T23:49:47Z

Please check out the answer here.

cpfiffer · 2023-12-01T23:37:23Z

Closing as this is resolved in the discourse post linked above.

atantos closed this as completed Aug 18, 2023

atantos reopened this Aug 18, 2023

cpfiffer closed this as completed Dec 1, 2023
