
Test-time training (TTT) significantly enhances language models' abstract reasoning, improving accuracy by up to 6x on the Abstraction and Reasoning Corpus (ARC). Key factors for successful TTT include initial fine-tuning on similar tasks, an auxiliary task format with augmentations, and per-instance training. Applying TTT to an 8B-parameter model boosts accuracy to 53% on ARC's public validation set, nearly 25% better than previous public, purely neural approaches. Ensembling with recent program-generation methods achieves 61.9% accuracy, matching the average human score. This suggests that, in addition to explicit symbolic search, test-time training on few-shot examples can significantly improve abstract reasoning in neural language models.
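The per-instance training the abstract mentions works roughly like this: each ARC task ships with a handful of demonstration input/output pairs, and the model is briefly fine-tuned on tasks built from those pairs before it answers. A minimal sketch of the leave-one-out task construction (the function name `leave_one_out_tasks` and the data layout are illustrative assumptions, not the paper's actual code):

```python
def leave_one_out_tasks(demos):
    """Build per-instance training tasks from one ARC task's demo pairs.

    Each of the k demonstrations takes a turn as the held-out target,
    with the remaining k-1 pairs serving as in-context examples. The
    resulting tasks are what a copy of the model would be fine-tuned on
    at test time, before predicting the real test output.
    """
    tasks = []
    for i, held_out in enumerate(demos):
        context = demos[:i] + demos[i + 1:]  # all demos except the held-out one
        tasks.append({"train": context, "test": held_out})
    return tasks


# Toy usage: a task with three demonstration pairs yields three
# leave-one-out fine-tuning tasks.
demos = [("grid_a_in", "grid_a_out"),
         ("grid_b_in", "grid_b_out"),
         ("grid_c_in", "grid_c_out")]
for t in leave_one_out_tasks(demos):
    print(t["test"][0], "held out,", len(t["train"]), "context pairs")
```

In the paper's pipeline these constructed tasks (plus geometric augmentations) are used for a few gradient steps on a per-instance adapter, which is then discarded after the prediction is made.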

AtmosphericRiversCuomo@hexbear.net 3 points 20 hours ago

The training datasets don't have the answers because the benchmark is diverse enough. That's why other models struggled to perform as well as humans until they applied the approach outlined in the paper. This is the benchmark: https://liusida.github.io/ARC/

this post was submitted on 15 Nov 2024
11 points (92.3% liked)
