Meta's Llama 4 is a beast (includes 10 million token context)

You can learn more about the Llama 4 release here:
Try it yourself here:

Meta recently released its most powerful open-source AI models yet: Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth (currently in preview).

Llama 4 Scout has 17B active parameters spread across 16 experts and a 10 million token context window. Meta reports that it outperforms models like Gemma 3 and Gemini 2.0 Flash-Lite, all while fitting on a single NVIDIA H100 GPU (with Int4 quantization).
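
To see what the single-GPU claim implies, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The 109B total parameter count for Scout and the 80 GB H100 capacity are taken from public spec sheets rather than from this post, so treat them as assumptions. The helper name weight_memory_gb is made up for illustration, and the estimate ignores the KV cache and activations, which grow with context length.

```python
# Rough estimate of weight memory only; KV cache and activations are ignored.
# Assumed figures (not stated in this post): Scout has ~109B total parameters,
# and an H100 has 80 GB of memory.
SCOUT_TOTAL_PARAMS = 109e9
H100_MEMORY_GB = 80


def weight_memory_gb(total_params, bits_per_param):
    """Return the memory needed to store the weights, in gigabytes (1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def fits_on_one_gpu(total_params, bits_per_param, gpu_memory_gb=H100_MEMORY_GB):
    """Check whether the weights alone fit in a single GPU's memory."""
    return weight_memory_gb(total_params, bits_per_param) <= gpu_memory_gb


for label, bits in (("bf16", 16), ("int8", 8), ("int4", 4)):
    gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
    verdict = "fits" if fits_on_one_gpu(SCOUT_TOTAL_PARAMS, bits) else "does not fit"
    print(f"{label}: {gb:.1f} GB of weights, {verdict} on one {H100_MEMORY_GB} GB GPU")
```

Under these assumptions, only the 4-bit weights leave headroom on a single card, which is why the quantization detail matters when reading the single-GPU claim.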

Llama 4 Maverick, also with 17B active parameters but using 128 experts, beats GPT-4o and Gemini 2.0 Flash across key benchmarks, according to Meta's reported results. Meta positions it as state of the art in reasoning, coding, and vision, while activating far fewer parameters per token than larger competing models.

Both Scout and Maverick were distilled from Llama 4 Behemoth, Meta’s teacher model with 288B active parameters, which Meta says already outperforms GPT-4.5 and Claude Sonnet 3.7 on several STEM benchmarks.

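These models use a “mixture of experts” architecture: a small router picks only a subset of the model’s experts for each token, so each token uses a fraction of the total parameters. That is what makes the models faster and cheaper to run than a dense model of the same total size.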
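
The routing idea can be sketched in a few lines of plain Python. This is a generic top-k mixture-of-experts toy, not Meta's implementation. The experts here are simple scaling functions, and the names make_expert, route_token, and moe_layer, along with the router weights, are all made up for illustration.

```python
import math


def softmax(logits):
    """Turn router logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=1):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def make_expert(scale):
    """Hypothetical toy expert: scales every element of the token vector."""
    return lambda token: [scale * x for x in token]


def moe_layer(token, experts, router_weights, top_k=1):
    """Run only the selected experts and mix their outputs by router weight."""
    # One router logit per expert: dot product of that expert's row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routes = route_token(logits, top_k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_output = experts[index](token)  # experts not selected are never called
        output = [o + weight * v for o, v in zip(output, expert_output)]
    return output, [index for index, _ in routes]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_layer([0.5, 2.0], experts, router_weights, top_k=1)
print(f"experts used: {used} of {len(experts)}, output: {output}")
```

With four experts and top_k=1, three quarters of the expert parameters sit idle for any given token. The same principle is how a model can have 17B active parameters while its total parameter count is much larger.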

You can try Scout and Maverick directly on Meta’s site or download them from Hugging Face if you’re a developer.