[-] mindbleach@sh.itjust.works 1 points 2 days ago

The supervised fine-tuning phase employed Low-Rank Adaptation (LoRA) to efficiently adapt the base DeepSeek-R1-Distill-Qwen-7B model for extraction tasks

So this is bolted on top of a model that cost six figures.
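
For context, LoRA freezes the base model's weights and trains small low-rank adapter matrices alongside them, which is why the "bolting on" is cheap relative to the original training run. A minimal sketch using Hugging Face's peft library; the model id comes from the quote above, while the rank, alpha, and target modules are illustrative assumptions, not the paper's actual settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Base model named in the quoted paper; its weights stay frozen.
base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

lora_cfg = LoraConfig(
    r=16,                                 # adapter rank (assumed, not from the paper)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections, a common default (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # adapters are typically well under 1% of the 7B weights
```

Only the adapter weights receive gradients during fine-tuning, which is exactly the sense in which this sits on top of a model someone else paid to train.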

[-] Dionysus@leminal.space 1 points 2 days ago

And DeepSeek is based on Llama, so more than six figures.

I'm not aware of any large-parameter LLM that isn't based on one that was absurdly expensive.

[-] mindbleach@sh.itjust.works 1 points 2 days ago

DeepSeek was trained from scratch. Only some variants used other LLMs.

This is a megaphone made from string, a squirrel, and a megaphone.
