
Dflash issue on V100 build #259

@merbanan

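DFlash fails on a Tesla V100-PCIE-32GB (compute capability 7.0) build. With scripts/run.py, prefill reports last_tok=0, step 0 ends with next=-1, and only a single token is produced. With scripts/bench_llm.py, autoregressive decoding runs at 30 tok/s while DFlash reports 0.00 tok/s.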
uv run --directory dflash python scripts/run.py --prompt "def fibonacci(n):"
[run] prompt 14 tokens, streaming up to 256 tokens, max_ctx=512
[cfg] seq_verify=0 fast_rollback=1 ddtree=1 budget=22 temp=1.00 chain_seed=1 fa_window=2048 draft_swa=0 draft_ctx_max=4096 draft_feature_mirror=0 peer_access=0 target_gpu=0 draft_gpu=0
[loader] eos_id=248046 eos_chat_id=-1
[target] target loaded: layers [0,64) output=1, 850 tensors on GPU 14.99 GiB, tok_embd 682 MiB CPU-only (q4_K)
[draft]  loaded
[prompt] 14 tokens
[prefill] token-seg ubatch=256
[prefill] 14 tokens in 0.10 s, last_tok=0
[migrate] 242.89 ms
[dbg sib step 0] N=23 accept=1 walked_sib=0
  walk: 0
[step 0] committed=14 last_tok=0 tree_N=23 accept=1 next=-1

[timing] per-step averages over 0 steps (ms):
  draft_build    0.27
  draft_copyfeat 0.07
  draft_set      0.02
  draft_compute  10.16
  draft_bridge   0.01
  draft_logits   9.47
  snapshot_ssm   0.00
  verify_build   1.68
  verify_set     0.25
  verify_compute 93.74
  verify_logits  0.00
  accept         0.04
  restore_ssm    0.00
  replay_build   0.00
  replay_set     0.00
  replay_compute 0.00
  replay_logits  0.00
  mirror_sync    0.00
  ----- sum     115.71

[dflash] generated 0 tokens in 0.116 s  ->  0.00 tok/s
[dflash] 0 draft steps, accepted=0/0 (0.0% per step), avg commit/step=0.00
[dflash] output tail: 248045 846 198 727 73111 1393 1590 248046 198 248045 74455 198 248068 198 0 
!ggml_cuda_init: found 1 CUDA devices (Total VRAM: 32501 MiB):
  Device 0: Tesla V100-PCIE-32GB, compute capability 7.0, VMM: yes, VRAM: 32501 MiB

[run] generated 1 tokens
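The [timing] block above shows where the single step's time goes. The following is a minimal sketch that parses that block and ranks the stages. It assumes the block is pasted verbatim as a string, one "<stage> <ms>" pair per line, ending with a "----- sum" line. parse_timing is a hypothetical helper written for this report, not part of dflash.

```python
import re

# Hypothetical helper (not part of dflash): parses the "[timing]" block printed
# by scripts/run.py. Assumes one "<stage> <milliseconds>" pair per line and a
# closing "----- sum <total>" line, as in the log above.
STAGE_RE = re.compile(r"^\s*([a-z_]+)\s+([0-9]+\.[0-9]+)\s*$")
SUM_RE = re.compile(r"^\s*-+\s*sum\s+([0-9]+\.[0-9]+)\s*$")


def parse_timing(log: str) -> tuple[dict[str, float], float]:
    """Return ({stage: ms}, reported_sum_ms) from a dflash timing block."""
    stages: dict[str, float] = {}
    total = float("nan")
    in_block = False
    for line in log.splitlines():
        if line.startswith("[timing]"):
            in_block = True
            continue
        if not in_block:
            continue
        m = SUM_RE.match(line)
        if m:
            total = float(m.group(1))
            break  # the sum line closes the block
        m = STAGE_RE.match(line)
        if m:
            stages[m.group(1)] = float(m.group(2))
    return stages, total


timing_log = """[timing] per-step averages over 0 steps (ms):
  draft_build    0.27
  draft_copyfeat 0.07
  draft_set      0.02
  draft_compute  10.16
  draft_bridge   0.01
  draft_logits   9.47
  snapshot_ssm   0.00
  verify_build   1.68
  verify_set     0.25
  verify_compute 93.74
  verify_logits  0.00
  accept         0.04
  restore_ssm    0.00
  replay_build   0.00
  replay_set     0.00
  replay_compute 0.00
  replay_logits  0.00
  mirror_sync    0.00
  ----- sum     115.71
"""

stages, total = parse_timing(timing_log)
# Sanity check: the listed stages add up to the reported sum (within rounding).
assert abs(sum(stages.values()) - total) < 0.01
# Print the three most expensive stages with their share of the step.
for name, ms in sorted(stages.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{name:15s} {ms:7.2f} ms  {100 * ms / total:5.1f}%")
```

On this log, verify_compute accounts for roughly 81% of the 115.71 ms step, with draft_compute and draft_logits making up most of the rest.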


uv run --directory dflash python scripts/bench_llm.py
[bench] target    = /media/per/work/tmp/lucebox-hub/dflash/models/Qwen3.6-27B-Q4_K_M.gguf
[bench] draft     = /media/per/work/tmp/lucebox-hub/dflash/models/draft/dflash-draft-3.6-q8_0.gguf
[bench] ar bin    = /media/per/work/tmp/lucebox-hub/dflash/build/test_generate
[bench] df bin    = /media/per/work/tmp/lucebox-hub/dflash/build/test_dflash
[bench] tokenizer = Qwen/Qwen3.5-27B
[bench] budget    = 22

[bench] ==== HumanEval (n=10, n_gen=256) ====
  [01/10] n_tok=  92  AR= 30.00  DFlash=   0.00  AL= 0.00
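To make this regression easy to check across all prompts, each per-prompt row of the benchmark can be parsed and DFlash throughput compared against AR. The following is a minimal sketch that assumes the row format shown above. parse_bench_row and speedup are hypothetical helpers, not part of bench_llm.py, and the AL column is kept as printed without interpreting it.

```python
import re

# Hypothetical helper (not part of bench_llm.py): parses one per-prompt row such as
# "[01/10] n_tok=  92  AR= 30.00  DFlash=   0.00  AL= 0.00".
ROW_RE = re.compile(
    r"\[(\d+)/(\d+)\]\s+n_tok=\s*(\d+)\s+AR=\s*([0-9.]+)\s+"
    r"DFlash=\s*([0-9.]+)\s+AL=\s*([0-9.]+)"
)


def parse_bench_row(line: str) -> dict:
    m = ROW_RE.search(line)
    if m is None:
        raise ValueError(f"not a bench row: {line!r}")
    idx, total, n_tok, ar, df, al = m.groups()
    return {
        "index": int(idx),
        "total": int(total),
        "n_tok": int(n_tok),
        "ar_tok_s": float(ar),      # autoregressive throughput (tok/s)
        "dflash_tok_s": float(df),  # DFlash throughput (tok/s)
        "al": float(al),            # AL column, as printed
    }


def speedup(row: dict) -> float:
    """DFlash throughput relative to AR; 0.0 means DFlash produced nothing."""
    if row["ar_tok_s"] <= 0:
        return float("nan")
    return row["dflash_tok_s"] / row["ar_tok_s"]


row = parse_bench_row("  [01/10] n_tok=  92  AR= 30.00  DFlash=   0.00  AL= 0.00")
print(row["ar_tok_s"], row["dflash_tok_s"], speedup(row))  # 30.0 0.0 0.0
```

A speedup of 0.0 with a nonzero AR rate matches the run.py output above: the draft/verify loop stops after step 0 instead of falling back to slower decoding.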

Labels: bug, v100