
Sagemaker hugging face deployment issue: #38

Open
Al-aminI opened this issue Nov 19, 2023 · 1 comment

Comments

@Al-aminI

Hi, good afternoon. I deployed the deepseek-ai/deepseek-coder-7b-instruct model on SageMaker with the same configuration as your Hugging Face demo: top_p 0.9 and top_k 50. I assume the temperature is 0.6; if not, please tell me which value you use. do_sample is set to false. The endpoint runs fine, but the same prompt that gives correct, accurate results on your demo gives noticeably less accurate results on my deployment. Is there any tweak you made that you can share with me? I need your help, thanks. @chester please respond to this.

Also, could it be that there are both "deepseek-ai/deepseek-coder-7b-instruct" and "deepseek-ai/deepseek-coder-7b-chat"?

And what is the stop token? Even when I use "stop": ["<|EOT|>"], it keeps generating until max_new_tokens is exhausted.
Here is how I am deploying to SageMaker:
```python
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'deepseek-ai/deepseek-coder-6.7b-instruct',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.1.0"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
    "parameters": {
        "do_sample": False,
        "top_p": 0.90,
        "top_k": 50,
        "temperature": 0.35,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.0,
        "stop": ["<|EOT|>"]
    }
})
```
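One likely cause of both symptoms (worse answers than the demo, and `<|EOT|>` never being emitted) is that the raw text in `"inputs"` is not wrapped in the instruct chat template, so the model never sees the format it was fine-tuned on. Also note that with `do_sample: False` the endpoint decodes greedily and top_p/top_k/temperature are ignored, which by itself makes output differ from a sampling demo. Below is a minimal sketch of building a TGI payload with a prompt format based on the deepseek-coder-instruct model card; the system text and `### Instruction:`/`### Response:` markers are assumptions to verify against the tokenizer's chat template for the exact model revision you deploy.

```python
# Hedged sketch: wrap the user message in an assumed deepseek-coder-instruct
# prompt format before sending it to the TGI endpoint.

SYSTEM = (
    "You are an AI programming assistant, utilizing the DeepSeek Coder model, "
    "developed by DeepSeek Company, and you only answer questions related to "
    "computer science."
)

def build_payload(user_message: str, max_new_tokens: int = 1024) -> dict:
    """Build a TGI request payload using the assumed instruct template."""
    prompt = (
        f"{SYSTEM}\n"
        f"### Instruction:\n{user_message}\n"
        f"### Response:\n"
    )
    return {
        "inputs": prompt,
        "parameters": {
            # With do_sample=False, decoding is greedy and the sampling
            # parameters are ignored; enable sampling to mirror demo settings.
            "do_sample": True,
            "top_p": 0.9,
            "top_k": 50,
            "temperature": 0.6,
            "max_new_tokens": max_new_tokens,
            # The model should emit <|EOT|> after a templated prompt,
            # letting this stop sequence actually trigger.
            "stop": ["<|EOT|>"],
        },
    }

payload = build_payload("Write a Python function that reverses a string.")
# predictor.predict(payload)  # send to the deployed endpoint
```

If the demo still behaves differently after this, comparing the tokenizer's `apply_chat_template` output against the hand-built prompt is a quick way to spot a formatting mismatch.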

@existeundelta

+1
