VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
VibeThinker, a new 3B-parameter AI model, reportedly outperforms Opus 4.5 on reasoning. It uses a novel approach that combines supervised fine-tuning (SFT) with Group Relative Policy Optimization (GRPO). The work, presented from a practitioner's perspective, reached the front page of Hacker News, and the research paper is available on arXiv.
Sources: 1 independent outlet