Researchers Are Getting Really Creative Training LLMs [Token Order Prediction]




Meta’s 2024 paper explores Multi-Token Prediction (MTP), where an LLM predicts several future tokens at once to improve its efficiency and “foresight”. Later in 2024, DeepSeek proposed using it as an “auxiliary objective” (AO), a cool concept for improving model performance during training. But MTP may not be the ideal AO yet. In this video, I’ll introduce Token Order Prediction (TOP), a softer auxiliary objective that improves model performance by having the model learn the relative order of upcoming tokens.
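To make "learning the relative order of upcoming tokens" a bit more concrete, here is a rough sketch of how such targets could be built. It is not taken from the paper: the names `top_targets` and `listwise_loss`, the `window` parameter, the scoring rule (sooner means higher, 0 for tokens that don't show up within the window), and the ListNet-style ranking loss are all assumptions made for illustration, and the paper's exact scoring and masking may differ.

```python
import math

def top_targets(tokens, vocab_size, window):
    """Hypothetical TOP-style targets: for each position t, score every
    vocab id by how soon it next appears after t. A token that appears at
    distance 1 gets `window`, at distance `window` gets 1, otherwise 0."""
    n = len(tokens)
    next_pos = [None] * vocab_size  # nearest index > t where each id occurs
    targets = [[0] * vocab_size for _ in range(n)]
    for t in range(n - 1, -1, -1):  # walk backwards so next_pos is "the future"
        for v in range(vocab_size):
            p = next_pos[v]
            if p is not None and p - t <= window:
                targets[t][v] = window - (p - t) + 1
        next_pos[tokens[t]] = t
    return targets

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(scores, target):
    """ListNet-style loss (assumed here): cross-entropy between the softmax
    of the target scores and the softmax of the model's scores."""
    p = softmax([float(x) for x in target])
    q = softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Toy sequence over a 3-token vocabulary, looking 2 tokens ahead.
targets = top_targets([2, 0, 1, 0], vocab_size=3, window=2)
```

The point of the sketch is the "softer" part: instead of asking an extra head to name the exact token at t+2 or t+3, the model only has to rank tokens by how soon they show up, so one output vector per position covers the whole window.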


Multi-token Prediction
[Paper]

Token Order Prediction
[Paper]

DeepSeek V3
[Paper]


This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N’ Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord]
[Twitter]
[Patreon]
[Business Inquiries] bycloud@smoothmedia.co
[Profile & Banner Art]
[Video Editor] @Booga04
[Ko-fi]
