"Hi guys! I need some help. I'm running an old 4-core Athlon with an RTX 3060 12GB. Standard LLMs wouldn't start on my hardware, so I had Google's AI help me build a custom version of llama.cpp. It works perfectly now. I also managed to run Acestep 1.5 on my machine.
Currently I have your 4B model quantized to Q8_0 GGUF (about 4 GB), and it runs smoothly on my custom llama.cpp build. Could you please help me 'bridge' my llama setup with Acestep 1.5? The idea is to use the 4B model as the 'brain' that turns a rough idea into a detailed caption and lyrics, and the Turbo model as the musician that generates the actual audio. Roughly, I'm picturing glue code like the sketch below.
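To make the question concrete, here is the kind of bridge script I'm imagining (Python, just a sketch). The ask_brain half talks to llama.cpp's llama-server through its OpenAI-compatible /v1/chat/completions endpoint, which I already have running; the generate_music half is a deliberate placeholder, because Acestep's programmatic entry point is exactly the part I don't know. All the names here (LLAMA_URL, ask_brain, generate_music) are mine, not from either project.

```python
import json
import urllib.request

# Assumes llama-server is serving my Q8_0 GGUF with its OpenAI-compatible API,
# e.g.: llama-server -m model-q8_0.gguf --port 8080
LLAMA_URL = "http://localhost:8080/v1/chat/completions"

def ask_brain(theme: str) -> dict:
    """Ask the 4B 'brain' to expand a rough theme into a caption + lyrics."""
    payload = {
        # llama-server serves a single model, so the model name is just a label
        "model": "local",
        "messages": [
            {"role": "system",
             "content": "You write prompts for a music generator. Reply with "
                        "JSON only, with keys 'caption' (genre, mood, tempo, "
                        "instruments) and 'lyrics'."},
            {"role": "user", "content": theme},
        ],
        "temperature": 0.7,
    }
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Small models don't always emit clean JSON; a real script should
    # validate or retry here.
    return json.loads(body["choices"][0]["message"]["content"])

def generate_music(caption: str, lyrics: str) -> None:
    """Placeholder for the Acestep 1.5 Turbo call -- this is exactly the
    part I'm asking how to wire up."""
    raise NotImplementedError("How do I invoke Acestep from here?")

if __name__ == "__main__":
    plan = ask_brain("a melancholic synthwave track about an old computer")
    generate_music(plan["caption"], plan["lyrics"])
```

Any pointers on the right way to hook up the Acestep side would be great. Thanks in advance!"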