| domain | llama.com |
| summary | This document compares the performance of the Llama 4 Maverick, Llama 4 Scout, and Llama 4 Behemoth models against Gemini 2.0 Flash and DeepSeek v3.1 across various benchmarks. Key metrics include inference cost, blended cost per 1M tokens (3:1 input:output ratio), and output token counts. The models are evaluated on benchmarks such as MMLU Pro, GPQA Diamond, MathVista, Image Reasoning (MMMU), ChartQA, DocVQA, LiveCodeBench, and Long Context tasks. The Llama 4 models, Gemini 2.0 Flash, and DeepSeek v3.1 all have a 128K context window. The data is based on 0-shot evaluations at a temperature of 0, without majority voting or parallel test-time compute. |
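The 3:1 blended cost convention mentioned in the summary combines input and output token prices on the assumption of three input tokens for every output token. A minimal sketch of that arithmetic, assuming hypothetical per-million-token prices (the record does not give the actual figures):

    def blended_cost_per_1m(input_price: float, output_price: float,
                            input_ratio: int = 3, output_ratio: int = 1) -> float:
        """Blend input/output prices per 1M tokens at the given ratio (default 3:1)."""
        total = input_ratio + output_ratio
        return (input_ratio * input_price + output_ratio * output_price) / total

    # Hypothetical prices in $/1M tokens, for illustration only.
    print(blended_cost_per_1m(0.20, 0.60))  # (3*0.20 + 1*0.60) / 4 -> 0.30

Under this convention, a model's headline "cost per 1M tokens" is a weighted average rather than the raw input or output price alone.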
| title | Industry Leading, Open-Source AI | Llama |
| description | Discover Llama 4's class-leading AI models, Scout and Maverick. Experience top performance, multimodality, low costs, and unparalleled efficiency. |
| keywords | llama, models, context, model, cost, image, maverick, build, more, reasoning, scout, results, program, leading, intelligence, efficiency, window |
| upstreams |
|
| downstreams |
|
| nslookup | A 31.13.83.8 |
| created | 2025-11-09 |
| updated | 2026-01-29 |
| summarized | 2026-01-30 |