Seems there's not a lot of talk about relatively unknown finetunes these days, so I'll start posting more!
OpenBuddy's been on my radar, but this one is very interesting: QwQ 32B, post-trained on OpenBuddy's dataset, apparently with QAT applied (though it's kinda unclear) and context-extended. Observations:
- Quantized with exllamav2, it seems to show lower distortion levels than normal QwQ. It works conspicuously well at 4.0bpw and 3.5bpw (see the loading sketch after this list).
- Seems good at long context. Have not tested 200K, but it's quite excellent in the 64K range.
- Works fine in English.
- The chat template is funky. It seems to mix up the `<think>` and `<|think|>` tags in particular (why don't they just use ChatML?), and needs some wrangling with your own template; see the template sketch after this list.
- Seems smart, can't say if it's better or worse than QwQ yet, other than it doesn't seem to "suffer" below 3.75bpw like QwQ does.
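For reference, this is roughly how I load exl2 quants for long-context testing with exllamav2's Python API. Just a minimal sketch: the model path, bpw, and sampler settings are placeholders, not my exact setup.

```python
# Minimal sketch: load an exl2 quant at extended context with exllamav2's Python API.
# The model path and settings below are placeholders, not the exact config I ran.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/openbuddy-qwq-32b-exl2-4.0bpw"  # hypothetical quant dir
config.prepare()
config.max_seq_len = 65536  # testing in the 64K range

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # KV cache allocated as layers load
model.load_autosplit(cache)                # split across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("Give me a one-line summary of QwQ.", settings, 200))
```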
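On the template wrangling: I just override the shipped template with plain ChatML that opens a `<think>` block for the assistant turn. Rough sketch with transformers; the template string is my own approximation of a workaround, not OpenBuddy's official template, and the model path is hypothetical.

```python
# Rough sketch: override the chat template with plain ChatML plus an opening <think>
# block. My own approximation, not OpenBuddy's shipped template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/models/openbuddy-qwq-32b")  # hypothetical path

tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n<think>\n' }}{% endif %}"
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the plot of Hamlet in one paragraph."}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```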
Also, I reposted this from /r/LocalLLaMA, as I feel the community generally should do going forward. Given its spirit, it seems like we should be on Lemmy instead?
Tinygrad is (so far) software only, ostensibly a sort of lightweight PyTorch replacement.
Tinygrad is (so far) not really used for much, not even research or tinkering.
Between that and the lead dev's YouTube antics, it kinda seems like hot air to me.