Releases: sammcj/gollama

Release v1.26.0

03 Aug 02:27
c4266a2

1.26.0 (2024-08-03)

Features

What's Changed

Full Changelog: v1.24.0...v1.26.0

Release v1.24.0

03 Aug 00:43
2afded8

1.24.0 (2024-08-03)

Features

What's Changed

Full Changelog: v1.22.0...v1.24.0

Release v1.22.0

02 Aug 00:19

1.22.0 (2024-08-01)

Features

Full Changelog: v1.21.1...v1.22.0

Release v1.21.1

01 Aug 22:58

New feature: vRAM estimator!

  • Calculate vRAM usage for a given model configuration
  • Determine maximum context length for a given vRAM constraint
  • Find the best quantisation setting for a given vRAM and context constraint
  • Support for different k/v cache quantisation options (fp16, q8_0, q4_0)
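
As a rough illustration of why the k/v cache type matters, the sketch below estimates the cache size for one sequence. It is not taken from gollama's source: the function and variable names are hypothetical, and the formula is the standard 2 x layers x KV heads x head dimension x context, scaled by bits per element. The q8_0 and q4_0 figures assume GGML's 32-element blocks with one fp16 scale per block (8.5 and 4.5 bits per element).

```go
package main

import "fmt"

// kvBitsPerElement maps each k/v cache type to its storage cost per cached value.
// Assumption: q8_0 and q4_0 follow GGML's block layout (32 values + one fp16 scale).
var kvBitsPerElement = map[string]float64{
	"fp16": 16.0,
	"q8_0": 8.5,
	"q4_0": 4.5,
}

// kvCacheBytes estimates the combined K and V cache size for a single sequence.
// Hypothetical helper for illustration only.
func kvCacheBytes(layers, kvHeads, headDim, context int, cacheType string) (float64, error) {
	bits, ok := kvBitsPerElement[cacheType]
	if !ok {
		return 0, fmt.Errorf("unknown k/v cache type %q", cacheType)
	}
	elements := 2 * layers * kvHeads * headDim * context // 2 = keys + values
	return float64(elements) * bits / 8, nil
}

func main() {
	// Llama-3-8B shape: 32 layers, 8 KV heads, head dimension 128.
	for _, t := range []string{"fp16", "q8_0", "q4_0"} {
		b, err := kvCacheBytes(32, 8, 128, 2048, t)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %.0f MiB\n", t, b/(1024*1024))
	}
	// Prints:
	// fp16: 256 MiB
	// q8_0: 136 MiB
	// q4_0: 72 MiB
}
```

These figures cover the cache only; model weights and activations come on top, so they will not match the estimator's totals.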

To estimate vRAM usage:

gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --quant q4_k_m --context 2048 --kvcache q4_0 # For GGUF models
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --quant 5.0 --context 2048 --kvcache q4_0 # For exl2 models
# Estimated VRAM usage: 5.35 GB

To calculate maximum context for a given memory constraint:

gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --quant q4_k_m --memory 6 --kvcache q8_0 # For GGUF models
gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --bpw 5.0 --memory 6 --kvcache q8_0 # For exl2 models
# Maximum context for 6.00 GB of memory: 5069

To find the best quantisation (BPW) for a given memory constraint:

gollama --vram --model NousResearch/Hermes-2-Theta-Llama-3-8B --memory 6 --quanttype gguf
# Best BPW for 6.00 GB of memory: IQ3_S

The vRAM estimator works by:

  1. Fetching the model configuration from Hugging Face (if not cached locally)
  2. Calculating the memory requirements for model parameters, activations, and KV cache
  3. Adjusting calculations based on the specified quantisation settings
  4. Performing binary and linear searches to optimize for context length or quantisation settings

1.21.1 (2024-08-01)

What's Changed

Full Changelog: v1.20.4...v1.21.1

Release v1.20.4

21 Jul 21:40
85d01a4

1.20.4 (2024-07-21)

Documentation

  • contributor: contributors readme action update (#80) (7a3356b)

What's Changed

  • chore(renovate): pin Update actions/setup-go digest to 0a12ed9 by @renovate in #71
  • fix: index out of range error in quantColour function by @anrgct in #79
  • docs(contributor): contributors readme action update by @github-actions in #80
  • chore(deps): bump deps by @sammcj in #81

New Contributors

Full Changelog: v1.20.2...v1.20.4

Release v1.20.2

14 Jul 07:28
74a3bef

1.20.2 (2024-07-14)

Bug Fixes

  • tagging: hopefully fix tagging in actions vs makefile (#74) (74a3bef)

What's Changed

  • fix(tagging): hopefully fix tagging in actions vs makefile by @sammcj in #74

Full Changelog: 1.20.1...v1.20.2

Release 1.20.1

14 Jul 07:13
f3f5a4f

What's Changed

  • chore(renovate): pin Update actions/upload-artifact digest to 0b2256b by @renovate in #70
  • feat: pull model updates by @sammcj in #69
  • feat: pull existing or new model by @sammcj in #72
  • chore(deps): bump deps by @sammcj in #73

Full Changelog: 1.18.2...1.20.1

Release 1.18.2

05 Jul 07:14

What's Changed

Full Changelog: 1.16.0...1.18.2

Release 1.17.0

04 Jul 03:32
174673d

1.17.0 (2024-07-04)

Features

BREAKING

  • Update model (u) has been replaced by Edit model (e)

What's Changed

  • feat: add edit model cli and tui by @sammcj in #64

Full Changelog: 1.16.0...1.17.0

Release 1.16.0

03 Jul 22:11
234511f

1.16.0 (2024-07-03)

Features

What's Changed

Full Changelog: 1.15.0...1.16.0