
Add Hugging face client #42

Open: wants to merge 21 commits into main


philschmid

What does this PR do?

This PR adds a dedicated Hugging Face client, which allows llmperf users to benchmark Hugging Face models served with Text Generation Inference (TGI), whether through the Inference API, Inference Endpoints, or locally/at any URL.
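To make the "any URL" part concrete, here is a minimal sketch of the request a client could send to a TGI server. It is not the code in this PR: `build_tgi_request` is a hypothetical helper, and it assumes `HUGGINGFACE_API_BASE` points at a TGI server (such as the local container below) that exposes TGI's `/generate` and `/generate_stream` routes. It only assembles the URL, headers, and JSON body; it does not send anything.

```python
import json
import os
from typing import Dict, Optional, Tuple


def build_tgi_request(
    api_base: str,
    prompt: str,
    max_new_tokens: int,
    token: Optional[str] = None,
    stream: bool = True,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a TGI text-generation call.

    Hypothetical helper for illustration: it builds the request but does not send it.
    """
    # TGI serves /generate (single JSON response) and /generate_stream (server-sent events).
    route = "generate_stream" if stream else "generate"
    url = f"{api_base.rstrip('/')}/{route}"
    headers = {"Content-Type": "application/json"}
    if token:
        # Protected hosted deployments expect a bearer token; a local container usually does not.
        headers["Authorization"] = f"Bearer {token}"
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return url, headers, json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Falls back to the local TGI port used in the example below (assumption).
    url, headers, body = build_tgi_request(
        os.environ.get("HUGGINGFACE_API_BASE", "http://localhost:8080"),
        "What is deep learning?",
        max_new_tokens=150,
    )
    print(url)
```

Keeping the base URL as the only deployment-specific input is what lets the same client target a local container or a remote endpoint; only the token changes between them.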

Below is a simple example:

Run TGI:

docker run --gpus all -ti -p 8080:80 -e MODEL_ID=HuggingFaceH4/zephyr-7b-beta ghcr.io/huggingface/text-generation-inference:latest

Run the benchmark:

export HUGGINGFACE_API_BASE="http://localhost:8080"
export MODEL_ID="HuggingFaceH4/zephyr-7b-beta"

python token_benchmark_ray.py \
--model $MODEL_ID \
--mean-input-tokens 550 \
--stddev-input-tokens 150 \
--mean-output-tokens 150 \
--stddev-output-tokens 10 \
--max-num-completed-requests 2 \
--timeout 600 \
--num-concurrent-requests 1 \
--results-dir "result_outputs" \
--llm-api huggingface \
--additional-sampling-params '{}'
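For context on what a streaming benchmark measures, the sketch below derives time to first token and inter-token latency from the server-sent events of TGI's `/generate_stream` route. It is an illustration, not this PR's implementation: `metrics_from_sse` and `StreamMetrics` are hypothetical names, and it assumes each event arrives as a `data:` line holding a JSON object with a `token` field (with `text` and `special`), recorded together with its arrival time.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class StreamMetrics:
    ttft_s: Optional[float]  # time to first token, in seconds
    num_tokens: int  # number of token events received
    inter_token_latencies_s: List[float]  # gaps between consecutive token events
    generated_text: str  # concatenated non-special token text


def metrics_from_sse(start_time: float, events: Iterable[Tuple[float, str]]) -> StreamMetrics:
    """Compute latency metrics from (arrival_time, raw_line) pairs of a TGI SSE stream."""
    ttft: Optional[float] = None
    last_time: Optional[float] = None
    gaps: List[float] = []
    pieces: List[str] = []
    count = 0
    for arrival, line in events:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        event = json.loads(line[len("data:"):])
        token = event.get("token")
        if token is None:
            continue
        count += 1
        if last_time is None:
            ttft = arrival - start_time
        else:
            gaps.append(arrival - last_time)
        last_time = arrival
        if not token.get("special", False):
            pieces.append(token.get("text", ""))
    return StreamMetrics(ttft, count, gaps, "".join(pieces))
```

Timestamps are passed in rather than read from a clock so the computation can be checked offline; a real client would record `time.monotonic()` as each line arrives.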

@philschmid
Author

cc @waleedkadous

@slyt

slyt commented Jul 8, 2024

@philschmid The README mentions HUGGINGFACE_API_KEY, but I couldn't get your fork to benchmark Llama 3 on a text-generation-inference server without specifying HUGGINGFACE_API_TOKEN. Is there a difference between HUGGINGFACE_API_TOKEN and HUGGINGFACE_API_KEY? Should all references use one or the other?

If HUGGINGFACE_API_TOKEN is not set, you get the error below when trying to benchmark meta-llama/Meta-Llama-3-70B-Instruct. The tokenizer can't be downloaded without the token because the Llama 3 repo is gated behind a license-acknowledgment page:

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct.
401 Client Error. (Request ID: Root=1-668c4b2e-082a7cbe6986c4514589204c;528c624d-4cfa-42f0-bd0f-d3f2e1431fbf)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-70B-Instruct is restricted. You must be authenticated to access it.
  0%|                                                               | 0/2 [00:06<?, ?it/s]
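One way to sidestep the naming mismatch is to accept either variable. The sketch below is a hypothetical helper, not code from the fork: `resolve_hf_token` and the precedence order are assumptions, and `HF_TOKEN` is included only because it is the variable `huggingface_hub` reads on its own.

```python
import os
from typing import Mapping, Optional

# Assumed precedence: the thread does not settle which variable name is canonical.
TOKEN_ENV_VARS = ("HUGGINGFACE_API_TOKEN", "HUGGINGFACE_API_KEY", "HF_TOKEN")


def resolve_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty Hugging Face token among the candidate variables, else None."""
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

The resolved value could then be passed as `token=` to `AutoTokenizer.from_pretrained` so gated tokenizers such as Llama 3's can be downloaded, and reused as the bearer token for hosted endpoints.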
