Check out HubSpot’s Free ChatGPT Bundle!
In this video, I will be covering the latest and hottest paper, Differential Transformer. I'll also be covering some basics of self-attention, grouped-query attention, and multi-head latent attention.
Check out my newsletter:
Attention Is All You Need
[Paper]
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
[Paper]
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[Paper]
Differential Transformer
[Paper]
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
[Paper]
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, Ben Shaener, Chris LeDoux, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Marcelo Ferreira, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz, jiye, Anushka
[Discord]
[Twitter]
[Patreon]
[Music] massobeats – glimmer
[Profile & Banner Art]
[Video Editor] @Askejm