Tech
Running AI models is turning into a memory game
When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs — but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars worth of new data centers, the price for DRAM chips has jumped roughly 7x in the last year.
At the same time, there’s a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time. The companies that master it will be able to make the same queries with fewer tokens, which can be the difference between folding and staying in business.
Semiconductor analyst Dan O’Laughlin has an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. They’re both semiconductor guys, so the focus is more on the chips than the broader architecture, but the implications for AI software are significant too.
I was particularly struck by this passage, in which Bercovici looks at the growing complexity of Anthropic’s prompt-caching documentation:
The tell is if we go to Anthropic’s prompt caching pricing page. It started off as a very simple page six or seven months ago, especially as Claude Code was launching — just “use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to pre-buy. You’ve got 5-minute tiers, which are very common across the industry, or 1-hour tiers — and nothing above. That’s a really important tell. Then of course you’ve got all sorts of arbitrage opportunities around the pricing for cache reads based on how many cache writes you’ve pre-purchased.
The question here is how long Claude holds your prompt in cached memory: you can pay for a 5-minute window, or pay more for an hour-long window. It’s much cheaper to draw on data that’s still in the cache, so if you manage it well, you can save an awful lot. There’s a catch, though: every new bit of data you add to the query may bump something else out of the cache window.
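To make the trade-off concrete, here’s a minimal sketch of how a stable prompt prefix gets marked for caching in Anthropic’s Messages API, plus the back-of-the-envelope math on when caching pays off. The request shape follows the API’s `cache_control` convention, but the model id and the price multipliers are illustrative assumptions, not Anthropic’s actual pricing.

```python
# Sketch of the caching economics described above. A content block can
# carry a `cache_control` marker telling the API to cache everything up
# to that point. The model id and price multipliers are assumptions for
# illustration, not quoted pricing.

CACHE_WRITE_MULT = 1.25  # assumed premium to write the 5-minute cache
CACHE_READ_MULT = 0.10   # assumed discount when the prefix is a cache hit


def build_request(system_prompt: str, question: str) -> dict:
    """Build a Messages API body whose stable system prompt is cached."""
    return {
        "model": "claude-sonnet-4-6",  # hypothetical model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # "ephemeral" is the short-lived (5-minute) tier
                # discussed above; only this stable prefix is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


def cached_cost_ratio(reads: int) -> float:
    """Cost of one cache write plus `reads` cache hits, as a fraction
    of paying full price for the prefix on every request."""
    cached = CACHE_WRITE_MULT + CACHE_READ_MULT * reads
    uncached = 1 + reads
    return cached / uncached


request = build_request("You are a contract-review assistant.", "Summarize clause 4.")
print(request["system"][0]["cache_control"])  # {'type': 'ephemeral'}
print(round(cached_cost_ratio(9), 3))         # 0.215
```

The arithmetic shows why the tiers matter: under these assumed multipliers, the write premium only pays for itself if enough requests reuse the prefix before its window expires, and nine cache hits cut the prefix cost to roughly a fifth of the uncached price.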
This is complex stuff, but the upshot is simple enough: memory management is going to be a huge part of running AI models going forward, and the companies that do it well are going to rise to the top.
And there is plenty of progress to be made in this new field. Back in October, I covered a startup called TensorMesh that was working on one layer of the stack: cache optimization.
Techcrunch event
Boston, MA
|
June 23, 2026
Opportunities exist elsewhere in the stack, too. Lower down, there’s the question of how data centers use the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, although it’s pretty deep in the hardware weeds.) Higher up, end users are figuring out how to structure their model swarms to take advantage of the shared cache.
As companies get better at memory orchestration, they’ll use fewer tokens and inference will get cheaper. Meanwhile, models are getting more efficient at processing each token, pushing the cost down still further. As server costs drop, a lot of applications that don’t seem viable now will start to edge into profitability.
Snapchat launches creator subscriptions in the US
Social network Snapchat announced today it’s launching creator subscriptions in alpha with select creators in the U.S. starting on February 23. The company noted that users will be able to buy subscriptions to creators, including Jeremiah Brown, Harry Jowsey, and Skai Jackson. This will allow users to unlock exclusive content while creating monetization opportunities for creators.
Creators can set their own monthly subscription prices within the app, while Snap will recommend different tiers to them. Subscribing will unlock subscriber-only content, priority replies to a creator’s public Stories, and ad-free viewing of that creator’s Stories.
Snap noted that this is a new way for creators to earn money beyond its existing programs.
“Expanding on existing monetization offerings like the Unified Monetization Program and the Snap Star Collab Studio, Creator Subscriptions introduce a premium layer of connection directly into how Snapchatters already engage with creators across Stories, Chat, and replies,” the company said in the blog post.
Snapchat reached 946 million daily active users, according to the company’s Q4 2025 results. The platform noted during its earnings that the number of U.S.-based users posting to Spotlight grew over 47% year-over-year. The company also spun out its hardware business into a new entity called Specs last month.
The company added that it plans to expand the program to Snap Stars in Canada, the U.K., and France in the coming weeks.
Rival company Meta also allows creators to offer subscriptions on platforms like Instagram and Facebook, which gives users access to exclusive content and badges.
Mistral AI buys Koyeb in first acquisition to back its cloud ambitions
Mistral AI, the French company last valued at $13.8 billion, has made its first acquisition. The OpenAI competitor has agreed to buy Koyeb, a Paris-based startup that simplifies AI app deployment at scale and manages the infrastructure behind it.
Mistral has been primarily known for developing large language models (LLMs), but this deal confirms its ambitions to position itself as a full-stack player. In June 2025, it announced Mistral Compute, an AI cloud infrastructure offering that it now hopes Koyeb will accelerate.
Founded in 2020 by three former employees of French cloud provider Scaleway, Koyeb aimed to help developers process data without worrying about server infrastructure — a concept known as serverless. This approach gained relevance as AI grew more demanding, also inspiring the recent launch of Koyeb Sandboxes, which provide isolated environments to deploy AI agents.
Before the acquisition, Koyeb’s platform already helped users deploy models from Mistral and others. In a blog post, Koyeb said its platform will continue operating. But according to a press release from Mistral, Koyeb’s team and technology will now also help Mistral deploy models directly on clients’ own hardware (on premises), optimize its use of GPUs, and scale AI inference — the process of running a trained AI model to generate responses.
As part of the deal, Koyeb’s 13 employees and its three co-founders, Yann Léger, Edouard Bonlieu, and Bastien Chatelard, are set to join Mistral’s engineering team, overseen by CTO and co-founder Timothée Lacroix. Under his leadership, Koyeb expects its platform to become a “core component” of Mistral Compute over the coming months.
“Koyeb’s product and expertise will accelerate our development on the Compute front, and contribute to building a true AI cloud,” Lacroix wrote in a statement. Mistral has been ramping up its cloud ambitions. Just a few days ago, the company announced a $1.4 billion investment in data centers in Sweden amid growing demand for alternatives to U.S. infrastructure.
Koyeb had raised $8.6 million to date, including a $1.6 million pre-seed round in 2020, followed in 2023 by a $7 million seed round led by Paris-based VC firm Serena, whose principal Floriane de Maupeou celebrated the acquisition. For the firm, this combination will play a key role “in building the foundations of sovereign AI infrastructure in Europe,” she told TechCrunch.
In part thanks to these geopolitical tailwinds, but also due to its focus on helping enterprises unlock value from AI, Mistral recently passed the milestone of $400 million in annual recurring revenue. Koyeb, too, will be focused on enterprise clients going forward, and new users will no longer be able to sign up for its Starter tier.
Mistral didn’t disclose financial terms of the deal, and it is unknown whether other acquisitions are in the works. But speaking at Stockholm’s Techarena conference last week, CEO Arthur Mensch said Mistral is hiring for infrastructure and other roles, pitching the company to prospective employees as an organization that is “headquartered in Europe, that is doing frontier research in Europe.”
Anthropic releases Sonnet 4.6
Anthropic has released a new version of its midsized Sonnet model, keeping pace with the company’s four-month update cycle. In a post announcing the new model, Anthropic emphasized improvements in coding, instruction-following, and computer use.
Sonnet 4.6 will be the default model for Free and Pro plan users.
The beta release of Sonnet 4.6 will include a context window of 1 million tokens, twice the size of the largest window previously available for Sonnet. Anthropic described the new context window as “enough to hold entire codebases, lengthy contracts, or dozens of research papers in a single request.”
The release comes just two weeks after the launch of Opus 4.6, with an updated Haiku model likely to follow in the coming weeks.
The launch comes with a new set of record benchmark scores, including OSWorld for computer use and SWE-bench for software engineering. But perhaps the most impressive is its 60.4% score on ARC-AGI-2, a benchmark meant to measure skills that come easily to humans but remain hard for AI. The score puts Sonnet 4.6 above most comparable models, although it still trails models like Opus 4.6, Gemini 3 Deep Think, and a refined version of GPT-5.2.
