
Why is it so much slower than the original llava1.5 model? #1313

@ganliqiang

Description


When I use the original llava1.5-13b model, inference on one picture costs only about 200 ms, but with this tool it costs about 900 ms. However, llama_print_timings reports a total time of only 15.84 ms, so which part accounts for the missing time (900 ms − 15 ms)?
llama_print_timings: load time = 1589.62 ms
llama_print_timings: sample time = 0.18 ms / 1 runs ( 0.18 ms per token, 5714.29 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 12.74 ms / 1 runs ( 12.74 ms per token, 78.48 tokens per second)
llama_print_timings: total time = 15.84 ms / 2 tokens
{'id': 'chatcmpl-e809df61-adf5-4968-aa61-6ff03ed37472', 'object': 'chat.completion', 'created': 1711705631, 'model': 'ggml_llava-v1.5-13b/ggml-model-q4_k.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': ' D'}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 744, 'completion_tokens': 1, 'total_tokens': 745}}
D
infer llava predict time is:0.9298880100250244
vqa_time: 0.9299123287200928
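One thing worth noting: llama_print_timings only covers llama.cpp's own load/sample/eval stages; the image-encoding (CLIP) step and Python-side overhead are not necessarily included in that total, which may account for much of the gap. A minimal sketch for localizing the cost by wall-clocking each stage from Python (the helper name `timed` is hypothetical, not part of llama-cpp-python):

```python
import time

def timed(label, fn, *args, **kwargs):
    # Hypothetical helper: wall-clock a single call so per-stage cost
    # (e.g. image encoding vs. chat completion) can be compared against
    # the totals reported by llama_print_timings.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result, elapsed
```

With llama-cpp-python, one could wrap the slow call, e.g. `timed("chat_completion", llm.create_chat_completion, messages=messages)`, and compare the printed wall-clock time against the 15.84 ms total above; a large difference would point to work done outside llama.cpp's timed eval loop (image preprocessing/encoding, prompt tokenization, Python overhead).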

Labels: question (Further information is requested)