Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Repositories
- gateway-api-inference-extension (Public, forked from kubernetes-sigs/gateway-api-inference-extension): Gateway API Inference Extension
- compressed-tensors (Public): A safetensors extension to efficiently store sparse quantized tensors on disk
- model-validation-configs (Public)
- vllm-fork (Public, forked from tlrmchlsmth/vllm): A high-throughput and memory-efficient inference and serving engine for LLMs
- lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness): A framework for few-shot evaluation of language models