Google's TurboQuant compresses AI memory by 6x, rattles chip stocks
(thenextweb.com)
Uh, pretty much no one uses a 16-bit KV cache, so it's extremely dubious that this specific quant technique is relevant to memory stocks at all...
We already have q4 and q8 KV cache quantization. LLM performance is highly sensitive to KV cache quantization, though, so q4 is probably only reasonable with specific models that don't suck as badly under it, and the same likely goes for this new quantization technique.
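For anyone unfamiliar, here is roughly what q8-style KV cache quantization does, as a minimal numpy sketch. The shapes and the per-token absmax scheme are illustrative only, not the article's actual TurboQuant method:

    import numpy as np

    # Minimal sketch of absmax int8 ("q8"-style) KV cache quantization.
    # Shapes are hypothetical; a real cache is [layers, heads, seq, head_dim].
    kv = np.random.randn(4096, 128).astype(np.float16)  # one layer's keys

    # Per-token absmax scaling: each row gets its own fp16 scale factor.
    scale = np.abs(kv).max(axis=1, keepdims=True) / 127.0
    q = np.round(kv / scale).astype(np.int8)  # stored: int8 values + scales

    # Dequantize on read; this round-trip error is what degrades quality.
    kv_hat = q.astype(np.float16) * scale
    print("fp16 bytes:", kv.nbytes, " q8 bytes:", q.nbytes + scale.nbytes)
    print("max abs error:", np.abs(kv - kv_hat).max())

The storage drops roughly in half versus fp16, and the reconstruction error printed at the end is exactly the kind of noise that some models tolerate and others don't.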
Extreme doubt.
Again, no one is using 32-bit values for the KV cache. It's like bragging about how fast the latest car is by comparing it to a horse and buggy.
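To put numbers on that, here is a back-of-envelope sizing sketch. The config is a made-up Llama-style example (layer count, KV heads, and head dim are not from the article):

    # Back-of-envelope KV cache sizing for a hypothetical model config.
    layers, kv_heads, head_dim = 32, 8, 128

    def kv_bytes_per_token(bits):
        # Keys and values across all layers and KV heads.
        return 2 * layers * kv_heads * head_dim * bits / 8

    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit: {kv_bytes_per_token(bits) / 1024:.0f} KiB/token, "
              f"{kv_bytes_per_token(32) / kv_bytes_per_token(bits):.0f}x vs fp32, "
              f"{kv_bytes_per_token(16) / kv_bytes_per_token(bits):.1f}x vs fp16")

Under these assumptions, 4-bit is 8x smaller than fp32 but only 4x smaller than fp16, so how impressive a "6x" headline number is depends entirely on whether the baseline is fp32, fp16, or the q8 caches already in common use.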