February 8, 2026 | Pricing | Source: Together AI Blog

Together AI Cuts Inference Costs 50% as Open-Source Competition Intensifies

Together AI has slashed inference pricing by 50% across all hosted models, citing both competitive pressure from DeepSeek's ultra-low-cost API and ongoing optimization gains in its own inference infrastructure.

The price cuts apply to all models on the platform, including Llama 4, Mistral Large 3, Qwen 2.5, and dozens of community fine-tuned variants. Llama 4 Maverick 400B inference now costs just $0.50 per million input tokens, down from $1.00, making it one of the most affordable hosted inference options for a frontier-competitive model.
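
To put the new rate in concrete terms, here is a minimal cost sketch in Python. The $0.50-per-million input rate comes from the announcement; the output rate and token counts are illustrative assumptions, since the article does not quote them.

```python
# Estimate per-request inference cost at Together AI's new Llama 4 Maverick rate.
# Input rate ($0.50/M tokens) is from the announcement; the output rate and
# token counts below are illustrative assumptions, not published figures.

INPUT_RATE_PER_M = 0.50   # USD per 1M input tokens (announced)
OUTPUT_RATE_PER_M = 1.50  # USD per 1M output tokens (assumed for illustration)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# Example: a RAG-style request with a large prompt and a short answer.
print(f"${request_cost(8_000, 500):.6f} per request")  # $0.004750
# At 1M such requests per month: ~$4,750, half the pre-cut bill.
```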

Together AI attributes half of the cost reduction to competitive dynamics — particularly DeepSeek's disruptive pricing — and half to genuine infrastructure improvements including better GPU utilization, optimized batching, and a new speculative decoding implementation that increases throughput by 40%.
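
Speculative decoding is the notable technical piece here: a small draft model proposes several tokens cheaply, and the large target model verifies them in a single batched forward pass, accepting the longest agreeing prefix. Below is a minimal greedy-acceptance sketch of the general technique; Together AI has not published its implementation, and the toy callables stand in for real models.

```python
from typing import Callable, List

# Toy speculative decoding with greedy acceptance. `draft_next` and
# `target_next` stand in for a small draft model and the large target model;
# real systems verify all k draft tokens in ONE batched target forward pass,
# which is where the throughput win (reportedly ~40% here) comes from.

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    k: int = 4,
    max_new: int = 32,
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal = []
        ctx = list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target model verifies the proposals; accept the longest prefix
        #    it agrees with (done in one batched pass in practice).
        accepted = 0
        for i, t in enumerate(proposal):
            if target_next(tokens + proposal[:i]) == t:
                accepted += 1
            else:
                break
        tokens.extend(proposal[:accepted])
        # 3) On disagreement (or full acceptance), take one target token, so
        #    output always matches what the target alone would have produced.
        tokens.append(target_next(tokens))
    return tokens
```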

The company also introduced a new Turbo tier with lower latency guarantees for production applications, and expanded its fine-tuning service to support RLHF and DPO training methods in addition to standard supervised fine-tuning.
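
Of the newly supported methods, DPO (Direct Preference Optimization) is the simplest to illustrate: it trains directly on chosen/rejected response pairs against a frozen reference model, with no separate reward model. The sketch below implements the standard published DPO loss (Rafailov et al.), not Together AI's internal training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi(y_chosen | x) under the policy
    policy_rejected_logps: torch.Tensor,  # log pi(y_rejected | x) under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # strength of the KL-style penalty
) -> torch.Tensor:
    # How much more the policy prefers each response than the reference does.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Standard DPO objective: -log sigmoid(beta * preference margin).
    margin = chosen_rewards - rejected_rewards
    return -F.logsigmoid(beta * margin).mean()
```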

CEO Vipul Ved Prakash stated that the AI inference market is entering a commoditization phase, where providers must differentiate on developer experience, reliability, and ecosystem rather than raw model access.

Together AI now hosts over 150 models with a focus on rapid availability of new open-source releases. The company reported that it processes over 2 billion tokens per day and has seen API traffic grow 5x in the past year.

The pricing pressure is being felt across the industry, with Replicate, Anyscale, and Fireworks AI all announcing similar reductions in recent weeks. Analysts predict that hosted inference costs will continue to fall 30-40% annually as hardware improves and competition intensifies.
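
Compounded over a few years, a 30-40% annual decline is steep. A rough projection from today's $0.50 rate follows; the decline rates are the analysts' figures, while the three-year horizon is illustrative.

```python
# Project hosted-inference price per 1M input tokens under the predicted
# 30-40% annual decline. Starting price is today's $0.50; horizon is illustrative.
price = 0.50
for year in range(1, 4):
    low, high = price * (0.6 ** year), price * (0.7 ** year)
    print(f"Year {year}: ${low:.3f} - ${high:.3f} per 1M input tokens")
# Year 3: roughly $0.108 - $0.172, i.e. a further ~3-5x reduction.
```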

More News

March 13, 2026 | Product Update

NVIDIA Launches NIM Microservices for Enterprise AI Deployment

NVIDIA has launched NIM (NVIDIA Inference Microservices), a suite of containerized AI model serving packages that reduce enterprise AI deployment time from weeks to hours with optimized inference performance.
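
NIM containers expose an OpenAI-compatible HTTP API, so existing client code can be pointed at a locally running microservice. A minimal sketch, assuming a NIM container is already serving on localhost:8000; the model identifier shown is illustrative and should be replaced with whatever the container's /v1/models endpoint reports.

```python
from openai import OpenAI

# NIM microservices serve an OpenAI-compatible API; point the client at the
# locally running container instead of a hosted endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # illustrative model id
    messages=[{"role": "user", "content": "Summarize NIM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```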

March 13, 2026 | Industry

AI Agents Market Reaches $15 Billion as Enterprise Adoption Surges

The global market for AI agents — autonomous AI systems that can plan, execute, and iterate on complex multi-step tasks — has reached $15 billion in annual spending, according to a new report from McKinsey. This represents a 200% increase from 2025, driven by enterprise adoption of agentic AI for customer service, software development, data analysis, and business process automation.

The report identifies three tiers of AI agent adoption: basic agents that handle single-step tasks like email responses and appointment scheduling (adopted by 65% of enterprises), intermediate agents that manage multi-step workflows like report generation and data pipeline management (35% adoption), and advanced agents that autonomously execute complex processes like code deployment and financial analysis (8% adoption). The largest spending categories are customer service agents ($4.2B), coding agents ($3.8B), and data analysis agents ($2.5B).

McKinsey projects the market will reach $45 billion by 2028 as agent reliability improves and enterprises become more comfortable delegating complex decisions to AI. Key enabling platforms include OpenAI's Agents SDK, Anthropic's Claude computer-use capabilities, and LangChain's agent framework. The report warns that agent governance and monitoring remain underdeveloped, with most enterprises lacking adequate oversight mechanisms for autonomous AI actions.

March 12, 2026 | Product Update

Microsoft 365 Copilot Gets Custom AI Agents and Actions

Microsoft has updated 365 Copilot with custom AI agent creation, allowing organizations to build agents that automate complex workflows spanning Word, Excel, Outlook, Teams, and SharePoint without code.

March 12, 2026 | Analysis

GPT-5.2's Agentic Mode Transforms Enterprise Workflows

OpenAI's GPT-5.2 introduced a fundamentally new approach to agentic task completion that is already transforming enterprise workflows. The model can now maintain coherent plans across 50+ sequential tool calls with parallel execution, reducing latency in complex automation pipelines by up to 60%. Early enterprise adopters report that GPT-5.2's agentic mode handles tasks like multi-step data analysis, cross-platform content publishing, and automated code review workflows that previously required custom orchestration code.

The key innovation is what OpenAI calls deliberative alignment — a training approach that lets the model dynamically allocate compute to harder sub-tasks while breezing through simpler ones. This means a single agentic session can handle both quick lookups and deep reasoning without manual configuration.

Several Fortune 500 companies have reported 40-70% time savings on analyst workflows by deploying GPT-5.2 agents through the API. However, reliability remains a concern — OpenAI acknowledges a 3-5% failure rate on chains exceeding 30 steps, and enterprise deployments require human-in-the-loop checkpoints for critical decisions.
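
That failure rate is why checkpoints matter on long chains. Below is a framework-agnostic sketch of the human-in-the-loop pattern described above; the names and the critical-action flag are hypothetical illustrations, not OpenAI's Agents API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical sketch of a checkpointed agent loop: every proposed tool call
# runs automatically unless it is flagged critical, in which case a human
# must approve before execution. Names here are illustrative, not an SDK.

@dataclass
class ToolCall:
    name: str
    args: dict
    critical: bool  # e.g. deploys code, moves money, deletes data

def run_agent(
    plan_next_call: Callable[[List[str]], Optional[ToolCall]],  # model proposes next step
    execute: Callable[[ToolCall], str],                         # tool runtime
    approve: Callable[[ToolCall], bool],                        # human reviewer
    max_steps: int = 50,
) -> List[str]:
    transcript: List[str] = []
    for _ in range(max_steps):
        call = plan_next_call(transcript)
        if call is None:  # model signals the task is complete
            break
        if call.critical and not approve(call):
            transcript.append(f"SKIPPED (human rejected): {call.name}")
            continue
        transcript.append(f"{call.name} -> {execute(call)}")
    return transcript
```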