
Why is it so much slower than the original llava1.5 model? #1313

@ganliqiang

Description


When I use the original llava1.5-13b model, inference on one picture costs only about 200 ms, but with this tool it costs about 900 ms. However, llama_print_timings reports a total time of only 15.84 ms, so which part accounts for the missing time (900 ms − 15 ms)?
llama_print_timings: load time = 1589.62 ms
llama_print_timings: sample time = 0.18 ms / 1 runs ( 0.18 ms per token, 5714.29 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 12.74 ms / 1 runs ( 12.74 ms per token, 78.48 tokens per second)
llama_print_timings: total time = 15.84 ms / 2 tokens
{'id': 'chatcmpl-e809df61-adf5-4968-aa61-6ff03ed37472', 'object': 'chat.completion', 'created': 1711705631, 'model': 'ggml_llava-v1.5-13b/ggml-model-q4_k.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': ' D'}, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 744, 'completion_tokens': 1, 'total_tokens': 745}}
D
infer llava predict time is:0.9298880100250244
vqa_time: 0.9299123287200928
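One thing worth noting: llama_print_timings only covers llama.cpp's own load/sample/eval stages; the image-encoding (CLIP) step and Python-side overhead are not necessarily included in that total, which may account for much of the gap. A minimal sketch for localizing the cost by wall-clocking each stage from Python (the helper name `timed` is hypothetical, not part of llama-cpp-python):

```python
import time

def timed(label, fn, *args, **kwargs):
    # Hypothetical helper: wall-clock a single call so per-stage cost
    # (e.g. image encoding vs. chat completion) can be compared against
    # the totals reported by llama_print_timings.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result, elapsed
```

With llama-cpp-python, one could wrap the slow call, e.g. `timed("chat_completion", llm.create_chat_completion, messages=messages)`, and compare the printed wall-clock time against the 15.84 ms total above; a large difference would point to work done outside llama.cpp's timed eval loop (image preprocessing/encoding, prompt tokenization, Python overhead).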

Labels: question (Further information is requested)