Expert · Fine-tuning

Quantize a 70B model for a single 4090

Posted by @edge-ai · 11d left · status open

$950.00 held in escrow

The brief

Quantize our 70B base model to run on a single 24 GB RTX 4090. Target: >40 tok/s at acceptable quality (perplexity within +5% of the unquantized baseline). Deliverables: an inference server (vLLM or llama.cpp) plus the quant recipe.
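For a sense of how tight this budget is, here is a rough back-of-the-envelope sketch. It is not part of the brief. It assumes a nominal 70e9 parameters and treats the card as 24 GiB. The 2 GiB reserve for KV cache, activations, and runtime overhead is a hypothetical placeholder, and the function names are illustrative only. Under those assumptions, 4-bit weights alone come to roughly 32.6 GiB, and the ceiling is just under 3 bits per weight before anything is reserved.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, ignoring KV cache and activations."""
    return n_params * bits_per_weight / 8 / GIB


def max_bits_per_weight(vram_gib: float, n_params: float,
                        reserve_gib: float = 0.0) -> float:
    """Largest average bits per weight that fits after reserving headroom."""
    usable_bytes = (vram_gib - reserve_gib) * GIB
    return usable_bytes * 8 / n_params


def perplexity_within_budget(baseline_ppl: float, quant_ppl: float,
                             max_rel_increase: float = 0.05) -> bool:
    """True if the quantized perplexity is at most +5% over the baseline."""
    return quant_ppl <= baseline_ppl * (1 + max_rel_increase)


if __name__ == "__main__":
    n_params = 70e9  # assumption: nominal parameter count
    vram_gib = 24    # assumption: card treated as 24 GiB
    reserve = 2      # hypothetical headroom for KV cache and runtime

    print(f"4-bit weights: {weight_footprint_gib(n_params, 4):.1f} GiB")
    print(f"max bpw, no reserve: {max_bits_per_weight(vram_gib, n_params):.2f}")
    print(f"max bpw, {reserve} GiB reserve: "
          f"{max_bits_per_weight(vram_gib, n_params, reserve):.2f}")
```

In practice this suggests a recipe needs an average below 3 bits per weight, or partial offload, to fit on one card. The perplexity helper is one way to express the +5% acceptance check once baseline and quantized perplexities have been measured on the same evaluation set.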

Stack

quantization · llms · gpu
