CuBLAS gives the same output every time #265

Open
runarheggset opened this issue Oct 16, 2023 · 2 comments
@runarheggset

Running the following code with CuBLAS returns the same output every time it's run. Running without CuBLAS returns a different generation, as expected.

package main

import (
	"log"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	model := "../models/airoboros-l2-13b-3.1.Q4_K_M.gguf"

	l, err := llama.New(model)
	if err != nil {
		panic(err)
	}
	defer l.Free()

	opts := []llama.PredictOption{
		llama.SetTokens(500),
		llama.SetThreads(20),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTemperature(0.7),
		llama.SetPenalty(1.15),
	}

	prompt := "Hello"

	text, err := l.Predict(prompt, opts...)
	if err != nil {
		panic(err)
	}

	log.Print(text)
}

Output with CuBLAS:
, I'm interested in 10000 W 127th St, Palos Park, IL 60465. Please send me more information about this property.
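For what it's worth, a minimal way to show the behaviour within a single process is to call Predict more than once with the same options and compare the outputs. The sketch below just reuses the options from the report; it is not part of the original code.

package main

import (
	"log"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	l, err := llama.New("../models/airoboros-l2-13b-3.1.Q4_K_M.gguf")
	if err != nil {
		panic(err)
	}
	defer l.Free()

	opts := []llama.PredictOption{
		llama.SetTokens(500),
		llama.SetThreads(20),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTemperature(0.7),
		llama.SetPenalty(1.15),
	}

	// With temperature 0.7 and top-k/top-p sampling enabled, repeated runs
	// should normally produce different text; with the CuBLAS build the
	// outputs reportedly come back identical.
	for i := 0; i < 3; i++ {
		text, err := l.Predict("Hello", opts...)
		if err != nil {
			panic(err)
		}
		log.Printf("run %d: %s", i, text)
	}
}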

@retme7

retme7 commented Nov 29, 2023

I have the same issue. llama.cpp works fine, but go-llama.cpp with CuBLAS does not.

@deep-pipeline

Getting the same generation output every time (presumably even across different prompts, though you aren't absolutely clear on that) suggests that the prompt passed in via go-llama is being dropped and replaced with some placeholder, and that the temperature is probably being set to zero.

On the positive side, the fact that any response comes out at all suggests the execution path is hooked up.

You just need to find where (and why), in the code path specific to cuBLAS, there is a static prompt template with temperature zero. That template prompt will relate to the repeated output you keep seeing.

Sorry, I'm not involved in maintaining the project; I was just reading through the issue backlog to get a feel for where the project is at and thought you might find the observation helpful. I did notice that Metal execution had an issue (since addressed) that involved pulling over a ggml-metal file. If I were you, I'd first make sure that all the code I have locally exactly matches the latest go-llama code base, then have a quick look around for the cuBLAS equivalent in the current go-llama code base and see if there is anything with temp=0 or a template prompt. After that, I'd work out where in the go-llama execution path things fork depending on whether cuBLAS is used, and follow the cuBLAS path to the point where things are handed over to llama.cpp code; the problem will be somewhere in there. Good luck, and remember: ChatGPT or Claude are your code explorer friends.
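One calling-side check along those lines (a rough sketch, not a confirmed fix): pass an explicit, different seed on every run alongside the temperature. This assumes the binding exposes llama.SetSeed as a PredictOption; if your checkout doesn't have it, the same idea applies to whatever seed option is available. If the CuBLAS build still returns identical text with varying seeds, the sampling parameters are being overridden somewhere below the Go binding rather than in the calling code.

package main

import (
	"log"
	"time"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	l, err := llama.New("../models/airoboros-l2-13b-3.1.Q4_K_M.gguf")
	if err != nil {
		panic(err)
	}
	defer l.Free()

	text, err := l.Predict("Hello",
		llama.SetSeed(int(time.Now().UnixNano())), // assumed option; varies the seed per run
		llama.SetTemperature(0.7),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTokens(500),
	)
	if err != nil {
		panic(err)
	}
	log.Print(text)
}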
