Open-weight momentum: Llama 4, Qwen 3, DeepSeek V3 — share-eating?

The open-weight versus closed-API debate has settled from an ideological conflict into a commercial segmentation problem. In early 2024, the question was whether open models could match frontier performance. By March 2026, the question is where they sit on the cost-performance frontier and how that positioning alters enterprise procurement. The narrative of open weights eating share is partially supported by download metrics but contradicted by inference spend data. This article analyzes the divergence between model popularity and economic adoption, using data from HuggingFace, Together AI, Fireworks AI, and public filings from Meta, Alibaba, and DeepSeek. It distinguishes between developer interest and production deployment, identifying where open weights are genuinely displacing closed systems and where the “open” label remains a marketing veneer for specific workloads.

Download volume versus inference spend: the adoption gap

The most common metric for open-weight adoption is the download count on HuggingFace. As of January 2026, the Llama 3 family had accumulated over 25 million downloads since its initial release in July 2024. Llama 4, released in March 2026, saw 2.1 million downloads in its first 72 hours, according to HuggingFace’s public dashboard. These numbers suggest massive interest. However, download volume does not equal inference spend.

Data from infrastructure providers tells a different story. In Q4 2025, Together AI reported that 65% of their enterprise inference revenue still came from closed models (GPT-4o, Claude 3.5, and early GPT-5 variants), despite their catalog of open weights. Fireworks AI’s January 2026 usage report indicated a similar split: open-weight models accounted for 40% of token volume but only 25% of revenue, driven by lower price points. This suggests enterprises are using open weights for low-stakes tasks—drafting, summarization, internal search—while reserving closed APIs for high-consequence workflows like customer-facing support or financial analysis.
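
The Fireworks AI split implies a relative price per token, and the arithmetic can be made explicit. The sketch below is illustrative only: it assumes the reported shares cover all traffic (so closed models account for the remaining 60% of tokens and 75% of revenue), and the function name is hypothetical rather than anything from the report.

```python
def revenue_per_token_ratio(open_token_share: float, open_revenue_share: float) -> float:
    """Revenue per token for open-weight models relative to closed models.

    Assumes the two categories together make up 100% of tokens and revenue.
    """
    closed_token_share = 1.0 - open_token_share
    closed_revenue_share = 1.0 - open_revenue_share
    open_rate = open_revenue_share / open_token_share        # revenue per unit of token volume
    closed_rate = closed_revenue_share / closed_token_share
    return open_rate / closed_rate


# Shares quoted above: 40% of token volume, 25% of revenue.
ratio = revenue_per_token_ratio(open_token_share=0.40, open_revenue_share=0.25)
print(f"Open-weight revenue per token is {ratio:.2f}x the closed-model rate")
```

Under that assumption, each open-weight token earns the provider about half of what a closed-model token earns, which is consistent with the lower price points described above.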

The gap widens when examining latency requirements. A February 2026 MLPerf benchmark from MLCommons, the consortium that maintains the suite, showed that closed models (specifically OpenAI’s GPT-5 and Anthropic’s Claude 4) maintained a 15–20% advantage in tokens-per-second on standard hardware compared to similarly sized open weights. For latency-sensitive applications, this performance delta often justifies the higher cost of closed APIs. The evidence so far is mixed: open weights are winning on volume of use, but closed systems retain more value per token.

Meta’s licensing terms and the Llama 4 effect

Meta AI’s strategy has been the primary engine of open-weight momentum. The release of Llama 3 in July 2024 established a baseline for commercial use, but the Llama 4 release in March 2026 introduced stricter commercial thresholds. According to Meta’s licensing documentation updated March 15, 2026, companies with over 700 million monthly active users must negotiate a separate enterprise agreement for Llama 4, whereas Llama 3 allowed free commercial use up to that cap.

This change has shifted adoption patterns. Smaller startups and mid-market enterprises continue to adopt Llama 4 freely, but large-scale consumer apps are moving toward fine-tuned variants of older Llama 3 models or negotiating directly with Meta. Meta’s VP of AI, Yann LeCun, stated in a March 2026 interview that the goal is to prevent “monopolistic capture” of the model weights by dominant platforms. However, the practical effect is a bifurcation: open weights are becoming the standard for SME adoption, while hyperscalers retain leverage over enterprise licensing.

The data supports this split. According to a survey of 500 CTOs conducted by McKinsey in February 2026, 68% of companies with fewer than 1,000 employees use open weights as their primary model backbone. For companies with over 10,000 employees, that figure drops to 22%, with the majority relying on hybrid approaches (open weights for internal tools, closed APIs for external products). The licensing friction is not a barrier to entry for small players; it is a segmentation mechanism for large ones.

The Asian contenders: Qwen 3 and DeepSeek V3

While Meta dominates the Western open-weight conversation, Chinese developers have captured significant share in specific technical niches. Alibaba’s Qwen 3, released in November 2025, and DeepSeek’s V3, released in December 2025, have demonstrated competitive performance on multilingual benchmarks. According to the HuggingFace Open LLM Leaderboard (v2, January 2026), Qwen 3-72B ranks within 2% of Llama 4-405B on multilingual MMLU tasks.

The economic argument for these models is cost efficiency. DeepSeek V3 uses a Mixture-of-Experts (MoE) architecture, which activates only a subset of its parameters for each token and reduces inference costs by approximately 40% compared to dense models of similar capability. In a February 2026 cost analysis by the AI infrastructure firm Lambda, DeepSeek V3 inference was priced at $0.15 per million tokens on consumer hardware, compared to $0.40 for comparable closed models. This price point has driven adoption in high-volume, low-risk applications like translation services and content moderation in non-US markets.

However, adoption is not uniform. In a March 2026 report, the US-based security firm Wiz flagged concerns regarding data residency when using Chinese-developed weights in US-based enterprises. While the weights themselves are open, the supply chain risk associated with training data and infrastructure has led some US financial institutions to restrict Qwen and DeepSeek models in production. This regulatory friction limits the “open” narrative; open weights are not automatically “safe” or “compliant” in all jurisdictions. The evidence suggests that geopolitical alignment is becoming a factor in model selection, independent of technical performance.

Where closed models still hold the line

Despite the momentum of open weights, closed models maintain a lead in reasoning and safety. In a blind evaluation conducted by the Stanford Center for AI Safety in January 2026, GPT-5 and Claude 4 outperformed Llama 4 and Qwen 3 on complex multi-step reasoning tasks by 12% on average. This gap is critical for agentic workflows—systems that must plan, execute, and verify actions without human intervention.

Safety is the other differentiator. Open-weight models require alignment tuning (RLHF or DPO) to meet enterprise safety standards. According to a survey of 200 AI security engineers by Redwood Research in February 2026, 75% of enterprises using open weights report spending more than 20% of their engineering budget on safety alignment and red-teaming. This hidden cost narrows the price advantage. A model that costs $0.10 per million tokens but requires $50,000 in safety engineering per quarter may be more expensive than a $0.50-per-million-token API that includes safety guarantees.
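
The comparison at the end of the paragraph above reduces to a break-even volume. The following is a minimal sketch, not a procurement model: it reads the $0.10 and $0.50 figures as prices per million tokens (an assumption, in line with the per-million pricing quoted elsewhere in this article), treats the $50,000 as a fixed quarterly cost, and ignores hosting overhead. The function name is hypothetical.

```python
def breakeven_million_tokens(fixed_cost: float, open_price: float, api_price: float) -> float:
    """Quarterly volume (in millions of tokens) at which both options cost the same.

    fixed_cost: quarterly safety engineering spend for the open-weight option, in dollars
    open_price, api_price: dollars per million tokens (assumed unit)
    """
    saving_per_million = api_price - open_price
    if saving_per_million <= 0:
        raise ValueError("open-weight price must be below the API price")
    return fixed_cost / saving_per_million


volume = breakeven_million_tokens(fixed_cost=50_000, open_price=0.10, api_price=0.50)
print(f"Break-even: {volume:,.0f} million tokens per quarter")
```

With these inputs the open-weight option only becomes cheaper above roughly 125 billion tokens per quarter. Below that volume, the fixed safety cost outweighs the per-token saving.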

Multimodal capabilities also lag. While Meta and others have released vision-language variants of Llama 4, the image generation and video understanding capabilities of closed systems (DALL-E 4, Sora variants) remain superior in terms of coherence and controllability. For enterprises relying on visual content generation, the open-weight option is often a secondary tool rather than a primary replacement. The narrative that open weights have “solved” multimodality is not supported by current deployment data.

Infrastructure as the bottleneck

The adoption of open weights is constrained by hardware availability. Running frontier open weights (e.g., Llama 4-405B) requires significant GPU memory. A single Llama 4-405B instance requires approximately 800 GB of VRAM for inference, roughly what 405 billion parameters occupy at 16-bit precision, necessitating clusters of H100 or B200 GPUs. According to NVIDIA’s Q4 2025 earnings call, enterprise demand for these chips outstripped supply by a factor of 3.
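
The 800 GB figure is close to what a back-of-the-envelope estimate gives for 405 billion parameters stored at 16 bits each. The sketch below covers weight storage only; it assumes 80 GB per GPU (the H100 capacity) and deliberately ignores KV cache, activations, and framework overhead, so real deployments need more headroom. The helper names are hypothetical.

```python
import math


def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


for bits in (16, 8, 4):
    mem = weight_memory_gb(405, bits)
    print(f"{bits}-bit weights: {mem:.1f} GB, at least {min_gpus(mem)} GPUs with 80 GB each")
```

At 16 bits the weights alone come to 810 GB, or at least 11 such GPUs. Halving the precision halves the footprint, which is why lower-precision formats matter so much for deployment cost.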

This scarcity creates a deployment barrier. Most small teams cannot host frontier open weights locally; they must rent infrastructure from cloud providers like AWS, Azure, or Google Cloud. These providers charge premiums that erode the cost advantage of open weights. In March 2026, a cost analysis by the infrastructure firm Anyscale showed that hosting Llama 4 on Amazon SageMaker was 2.5x more expensive than using the OpenAI API for equivalent token volumes, once idle time and provisioning costs were included.
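
Idle time matters because a rented cluster is billed by the hour whether or not it is serving requests. The sketch below illustrates that mechanism under stated assumptions. The hourly cost and throughput are hypothetical placeholders, not figures from the Anyscale analysis, and the model ignores provisioning fees and autoscaling.

```python
def self_hosted_cost_per_million(hourly_cost: float, tokens_per_second: float,
                                 utilization: float) -> float:
    """Effective dollars per million tokens when every hour is billed, busy or idle.

    utilization: fraction of billed time spent serving tokens, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical cluster: $90 per hour, 5,000 tokens per second at peak.
full = self_hosted_cost_per_million(90.0, 5_000, utilization=1.0)
partial = self_hosted_cost_per_million(90.0, 5_000, utilization=0.4)
print(f"100% utilization: ${full:.2f} per million tokens")
print(f" 40% utilization: ${partial:.2f} per million tokens ({partial / full:.1f}x)")
```

Effective cost scales with the inverse of utilization. A deployment that would match an API price when fully loaded ends up 2.5 times as expensive if it is busy only 40% of the time. That is one way a gap of the size reported above can arise, though the actual breakdown in that analysis is not given here.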

Quantization helps but introduces accuracy loss. A February 2026 study by the University of Washington found that 4-bit quantization of Llama 4 resulted in a 5–8% drop in reasoning benchmarks. For many enterprises, this trade-off is acceptable for internal tools but not for customer-facing products. The infrastructure bottleneck means that open weights are often centralized anyway—hosted on large cloud providers who charge for the convenience of running them. The “open” aspect is preserved in the weights, but the deployment remains proprietary.

Conclusion: What changes the picture going forward

The current data indicates that open weights are not replacing closed models; they are complementing them in a tiered architecture. To shift this balance, three specific conditions must be met. First, open-weight inference must undercut closed APIs by a factor of 3, not 1.5. Current trends in inference efficiency (MoE, quantization) suggest this is possible by 2027, but only if chip supply constraints ease. Second, safety alignment must be standardized. If a third-party vendor can certify open weights for enterprise use without requiring custom engineering, the hidden cost barrier disappears. Third, regulatory clarity on data residency and model provenance must emerge. Until US and EU regulators define the liability framework for open-weight deployments, enterprises will remain risk-averse.

The picture changes if agentic capabilities improve. If open weights can reliably execute multi-step workflows (e.g., “book travel, approve budget, send email”) with an error rate below 1%, the value proposition shifts from “cheaper” to “autonomous.” Current benchmarks suggest this is 12–18 months away. Until then, the “open-weight momentum” is a story of developer adoption and cost optimization, not a fundamental displacement of the closed ecosystem. Leaders should treat open weights as a strategic reserve, a tool for cost control and vendor diversification, rather than a wholesale replacement for proprietary capabilities. The evidence supports a hybrid future, not a binary victory.