CuBLAS gives the same output every time #265

Open
runarheggset opened this issue Oct 16, 2023 · 2 comments
@runarheggset

Running the following code with CuBLAS returns the same output every time it's run. Running without CuBLAS returns a different generation, as expected.

package main

import (
	"log"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	model := "../models/airoboros-l2-13b-3.1.Q4_K_M.gguf"

	l, err := llama.New(model)
	if err != nil {
		panic(err)
	}
	defer l.Free()

	opts := []llama.PredictOption{
		llama.SetTokens(500),
		llama.SetThreads(20),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTemperature(0.7),
		llama.SetPenalty(1.15),
	}

	prompt := "Hello"

	text, err := l.Predict(prompt, opts...)
	if err != nil {
		panic(err)
	}

	log.Print(text)
}

Output with CuBLAS:
, I'm interested in 10000 W 127th St, Palos Park, IL 60465. Please send me more information about this property.
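For what it's worth, a minimal way to show the behaviour within a single process is to call Predict more than once with the same options and compare the outputs. The sketch below just reuses the options from the report; it is not part of the original code.

package main

import (
	"log"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	l, err := llama.New("../models/airoboros-l2-13b-3.1.Q4_K_M.gguf")
	if err != nil {
		panic(err)
	}
	defer l.Free()

	opts := []llama.PredictOption{
		llama.SetTokens(500),
		llama.SetThreads(20),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTemperature(0.7),
		llama.SetPenalty(1.15),
	}

	// With temperature 0.7 and top-k/top-p sampling enabled, repeated runs
	// should normally produce different text; with the CuBLAS build the
	// outputs reportedly come back identical.
	for i := 0; i < 3; i++ {
		text, err := l.Predict("Hello", opts...)
		if err != nil {
			panic(err)
		}
		log.Printf("run %d: %s", i, text)
	}
}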

@retme7

retme7 commented Nov 29, 2023

I have the same issue. llama.cpp works fine, but go-llama.cpp with CuBLAS does not.

@deep-pipeline

Getting the same generation output every time (presumably even across different prompts, though you aren't absolutely clear on that) suggests that the prompt passed in via go-llama is being dropped and replaced with some placeholder, and that the temperature is probably being set to zero.

On the positive side, the fact that any response comes out at all suggests the execution path is hooked up.

You just need to find where (and why), in the code path specific to cuBLAS, there is a static prompt template with temperature zero. That template prompt will relate to the repeated output you keep seeing.

Sorry, I'm not involved in maintaining the project; I was just reading through the issue backlog to get a feel for where the project is at and thought you might find the observation helpful. I did notice that Metal execution had an issue (since addressed) that involved pulling over a ggml-metal file. If I were you, I'd first make sure that all the code I have locally exactly matches the latest go-llama code base, then have a quick look around for the cuBLAS equivalent in the current go-llama code base and see if there is anything with temp=0 or a template prompt. After that, I'd work out where in the go-llama execution path things fork depending on whether cuBLAS is used, and follow the cuBLAS path to the point where things are handed over to llama.cpp code; the problem will be somewhere in there. Good luck, and remember: ChatGPT or Claude are your code explorer friends.
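One calling-side check along those lines (a rough sketch, not a confirmed fix): pass an explicit, different seed on every run alongside the temperature. This assumes the binding exposes llama.SetSeed as a PredictOption; if your checkout doesn't have it, the same idea applies to whatever seed option is available. If the CuBLAS build still returns identical text with varying seeds, the sampling parameters are being overridden somewhere below the Go binding rather than in the calling code.

package main

import (
	"log"
	"time"

	"github.com/go-skynet/go-llama.cpp"
)

func main() {
	l, err := llama.New("../models/airoboros-l2-13b-3.1.Q4_K_M.gguf")
	if err != nil {
		panic(err)
	}
	defer l.Free()

	text, err := l.Predict("Hello",
		llama.SetSeed(int(time.Now().UnixNano())), // assumed option; varies the seed per run
		llama.SetTemperature(0.7),
		llama.SetTopK(20),
		llama.SetTopP(0.9),
		llama.SetTokens(500),
	)
	if err != nil {
		panic(err)
	}
	log.Print(text)
}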
