Running AI models is turning into a memory game

When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs — but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars’ worth of new data centers, the price of DRAM chips has jumped roughly 7x in the last year.

At the same time, there’s a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. The companies that master it will be able to serve the same queries with fewer tokens, which can be the difference between folding and staying in business.

Semiconductor analyst Dan O’Laughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. They’re both semiconductor guys, so the focus is more on the chips than the broader architecture, but the implications for AI software are significant too.

I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt-caching documentation:

The tell is if we go to Anthropic’s prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was launching — just “use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to pre-buy. You’ve got 5-minute tiers, which are very common across the industry, or 1-hour tiers — and nothing above. That’s a really important tell. Then of course you’ve got all sorts of arbitrage opportunities around the pricing for cache reads based on how many cache writes you’ve pre-purchased.

The question here is how long Claude holds your prompt in cached memory: you can pay for a 5-minute window, or pay more for an hour-long window. It’s much cheaper to draw on data that’s still in the cache, so if you manage it right, you can save an awful lot. There is a catch, though: every new bit of data you add to the query may bump something else out of the cache window.
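
In practice, this comes down to tagging the stable part of your prompt so the API caches it. Here’s a minimal sketch using Anthropic’s Python SDK and its documented cache_control field; the model id and file path are placeholders, and per Anthropic’s docs a ttl option selects the 1-hour tier:

```python
# A minimal sketch of Anthropic prompt caching with the official Python SDK.
# Marking a large, stable context block with cache_control lets follow-up
# calls that reuse the identical prefix read it from cache at a fraction of
# the cost of reprocessing it. Model id and file path are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("project_context.txt") as f:  # any large, stable context
    big_context = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are a code-review assistant."},
        {
            "type": "text",
            "text": big_context,
            # The default cache window is 5 minutes; per Anthropic's docs,
            # {"type": "ephemeral", "ttl": "1h"} selects the 1-hour tier.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarize the open TODOs."}],
)

# usage.cache_creation_input_tokens vs. usage.cache_read_input_tokens tells
# you whether this call wrote the cache (first call) or hit it (follow-ups).
print(response.usage)
```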

This is complex stuff, but the upshot is simple enough: managing memory well is going to be a huge part of running AI models going forward. Companies that do it well are going to rise to the top.

And there is plenty of progress to be made in this new field. Back in October, I covered a startup called TensorMesh that was working on one layer of the stack, known as cache optimization.

Opportunities exist in other parts of the stack too. Lower down, there’s the question of how data centers are using the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, although it’s pretty deep in the hardware weeds.) Higher up, end users are figuring out how to structure their model swarms to take advantage of the shared cache.

As companies get better at memory orchestration, they’ll use fewer tokens and inference will get cheaper. Meanwhile, models are getting more efficient at processing each token, pushing the cost down still further. As server costs drop, a lot of applications that don’t seem viable now will start to edge into profitability.
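
To put rough numbers on that, here’s a back-of-the-envelope sketch using Anthropic’s published prompt-caching multipliers (5-minute cache writes bill at 1.25x the base input rate, cache reads at 0.1x); the base rate itself is a placeholder, not a quoted price:

```python
# Back-of-the-envelope cost of a 100K-token context reused across 20 queries.
# Multipliers follow Anthropic's published prompt-caching pricing (5-minute
# cache writes bill at 1.25x base input, cache reads at 0.1x); the base rate
# is a placeholder, not a quoted price.
BASE_RATE = 3.00 / 1_000_000  # hypothetical $ per input token
CONTEXT_TOKENS = 100_000
QUERIES = 20

uncached = QUERIES * CONTEXT_TOKENS * BASE_RATE
cached = (CONTEXT_TOKENS * BASE_RATE * 1.25                    # one cache write
          + (QUERIES - 1) * CONTEXT_TOKENS * BASE_RATE * 0.1)  # 19 cache reads

print(f"uncached: ${uncached:.2f}")  # $6.00
print(f"cached:   ${cached:.2f}")    # roughly $0.95, assuming every query
                                     # lands inside the cache window
```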

Cohere launches a family of open multilingual models

Enterprise AI company Cohere launched a new family of multilingual models on the sidelines of the ongoing India AI Summit. The models, dubbed Tiny Aya, are open-weight — meaning their trained weights are publicly available for anyone to use and modify — support over 70 languages, and can run on everyday devices like laptops without requiring an internet connection.

The models, launched by the company’s research arm Cohere Labs, support South Asian languages such as Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi.

The base model contains 3.35 billion parameters — a measure of its size and complexity. Cohere has also launched TinyAya-Global, a version fine-tuned to better follow user commands, for apps that require broad language support. Regional variants round out the family: TinyAya-Earth for African languages; TinyAya-Fire for South Asian languages; and TinyAya-Water for Asia Pacific, West Asia, and Europe.

“This approach allows each model to develop stronger linguistic grounding and cultural nuance, creating systems that feel more natural and reliable for the communities they are meant to serve. At the same time, all Tiny Aya models retain broad multilingual coverage, making them flexible starting points for further adaptation and research,” the company said in a statement.

Cohere noted that these models, which were trained on a single cluster of 64 H100 GPUs (a type of high-powered chip from Nvidia) using relatively modest computing resources, are ideal for researchers and developers building apps for audiences in their native languages. The models are capable of running directly on devices, so developers can use them to power offline translation. The company said it built its underlying software to suit on-device usage, requiring less computing power than most comparable models.

In linguistically diverse countries like India, this kind of offline-friendly capability can open up a diverse set of applications and use cases without the need for constant internet access.

The models are available on HuggingFace, the popular platform for sharing and testing AI models, and the Cohere Platform. Developers can download them on HuggingFace, Kaggle, and Ollama for local deployment. The company is also releasing training and evaluation datasets on HuggingFace and plans to release a technical report detailing its training methodology.
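
For developers, local deployment should look like any other open-weight release. Here’s a minimal sketch using the Hugging Face transformers library; the repo id below is hypothetical, so check Cohere Labs’ actual model pages for the published names:

```python
# A minimal sketch of loading an open-weight Tiny Aya model locally with the
# Hugging Face transformers library. The repo id is hypothetical; check
# Cohere Labs' model pages on HuggingFace for the published names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CohereLabs/tiny-aya-global"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Translate to Hindi: Where is the nearest train station?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```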

The startup’s CEO, Aidan Gomez, said last year that the company plans to go public “soon.” According to CNBC, the company ended 2025 on a high note, posting $240 million in annual recurring revenue, with 50% growth quarter-over-quarter throughout the year.

As AI jitters rattle IT stocks, Infosys partners with Anthropic to build ‘enterprise-grade’ AI agents

Indian IT giant Infosys said on Tuesday it has partnered with Anthropic to develop enterprise-grade AI agents, as automation driven by large language models reshapes the global IT services industry.

Under the partnership, Infosys plans to integrate Anthropic’s Claude models into its Topaz AI platform to build so-called “agentic” systems. The companies claim these agents will be able to autonomously handle complex enterprise workflows across industries such as banking, telecoms, and manufacturing. The tie-up was announced at India’s AI Impact Summit in New Delhi this week, which will see top executives from AI companies and Big Tech alike in attendance.

The deal comes amid fears that AI tools, especially those built by major AI labs like Anthropic and OpenAI, will disrupt India’s heavily staffed, $280 billion IT services industry, raising questions about the future of labor-intensive outsourcing business models. Earlier this month, shares of Indian IT companies went into freefall after Anthropic launched a suite of enterprise AI tools that claimed to automate tasks across legal, sales, marketing, and research roles.

The partnership would give Infosys, one of the world’s largest IT services businesses, access to Anthropic’s Claude models and developer tools for building AI agents tailored for large enterprises. Infosys said it would use Anthropic’s Claude Code to help write, test and debug code, and said it is already deploying the tool internally to build expertise that will be applied to client work.

Infosys also detailed how AI is contributing to its business: AI-related services generated revenue of ₹25 billion (around $275 million), or 5.5% of the company’s total revenue of ₹454.8 billion (about $5 billion) in the December quarter. Rival Tata Consultancy Services previously said its AI services generate about $1.8 billion annually, or around 6% of revenue.

For Anthropic, the partnership offers a route into heavily regulated enterprise sectors where deploying AI systems at scale requires industry expertise and governance capabilities.

“There’s a big gap between an AI model that works in a demo and one that works in a regulated industry,” said Anthropic co-founder and CEO Dario Amodei. Infosys’ experience in sectors such as financial services, telecoms, and manufacturing helps bridge that gap, he said.

Anthropic this week also opened its first India office in Bengaluru, as it seeks to expand further into the country, which has grown into the company’s second-largest market. Anthropic said India now accounts for about 6% of global Claude usage, second only to the U.S., and much of that activity is concentrated in programming.

Infosys did not disclose the timeline for deploying Claude-powered AI agents or the financial terms of the deal.

The partnership is similar to other moves by Indian IT services firms. HCLTech and OpenAI partnered last year to help enterprises deploy AI tools at scale.

Airbnb expands its “Reserve Now, Pay Later” feature globally

Airbnb said on Tuesday that it is launching its “Reserve Now, Pay Later” feature — which lets users secure bookings without immediate payment — globally. This allows users to cancel their bookings without losing money upfront if their plans change.

The company launched the feature in the U.S. last year for domestic travel. Airbnb said that properties with a “flexible” or “moderate” cancellation policy are eligible. With this option, users get charged closer to their check-in date rather than at the time of booking. The feature mirrors the “buy now, pay later” payment plans that have become popular in e-commerce, making expensive travel more accessible by spreading out costs. The company noted that since launch, the feature has seen 70% adoption among eligible bookings.

During its earnings call for Q4 2025, Airbnb said that the feature helped grow nights booked in the quarter.

“Reserve Now, Pay Later saw significant adoption among eligible guests in Q4. It’s also led to longer booking lead times and a mix shift towards larger entire homes, especially those with four or more bedrooms, contributing to the increase in average daily rate,” Ellie Mertz, CFO of Airbnb, said during the call.

Mertz noted that Airbnb’s overall cancellation rate rose from 16% to 17% for the quarter, and was higher among customers who used the pay-later option. However, she said this was “not hugely material relative to the broader cancellations on the platform.”

Last year, the company surveyed U.S. travelers along with Focaldata, a London-based market research and polling company. Of those surveyed, 60% said that a flexible payment option is important when booking a holiday, and 55% said they would use one.

The company has been experimenting with pay-later products for years now. Back in 2018, Airbnb launched a product that allowed users to book a property by paying 20% or 50% of the total charges upfront, with the rest due later. In 2023, the company partnered with fintech firm Klarna to let users pay for their stays in four installments over six weeks.
