OpenAI shenanigans in Cerebras IPO. They have a deal to pay a minimum of $10.53/m tokens (at 100%, 24/7 capacity utilization), which is more like $40-$50/m in the real world. They charge $14/m (lemmy.ca)

submitted 4 days ago by humanspiral@lemmy.ca to c/technology@lemmy.ml

Cerebras had a good IPO today. Their technology is very fast, but in terms of throughput, Nvidia NVL72 is 7x cheaper. FYI, Huawei's current and last generations are even cheaper than Nvidia per token. One advantage Cerebras has, though, is that production does not depend on HBM3/4 RAM. A big disadvantage is that their software is difficult, and they don't offer any models newer than about 1 year old, as they are slow to implement bleeding-edge optimizations. Long contexts are also relatively extra slow on Cerebras.

Their 2nd customer is OpenAI. Under a $20B, 3-year lease for up to 750 MW of compute (equal to 6 years of 250 MW blocks), the most optimistic cost per token possible for OpenAI is $10.53/m, at 100% utilization. A realistically optimistic real-world figure is 20% of capacity, which works out to over $50/m. OpenAI is initially using the cluster to run Codex-Spark 5.3, for which they charge customers $14/m tokens. OpenAI also has the privilege of paying for all OPEX. Power alone, at just 7c/kWh, adds 50c/m tokens in the ideal case.
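The utilization arithmetic above can be sketched in a few lines of Python. This is only an illustration under one assumption: the lease is a fixed cost, so cost per token scales as 1/utilization. The dollar figures are the ones quoted in this post (as first posted) and are not independently verified; the function names are made up for the example.

```python
# Figures quoted in the post, taken as given (not independently verified).
MIN_COST_PER_M = 10.53      # USD per million tokens at 100% utilization
PRICE_PER_M = 14.00         # USD per million tokens charged to customers
POWER_ADDER_PER_M = 0.50    # USD per million tokens for electricity, ideal case
POWER_PRICE_PER_KWH = 0.07  # USD per kWh


def cost_per_m(utilization: float, min_cost: float = MIN_COST_PER_M) -> float:
    """Lease cost per million tokens when only `utilization` of capacity is used.

    Assumes a fixed lease payment, so fewer tokens means a higher cost each.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return min_cost / utilization


def breakeven_utilization(price: float = PRICE_PER_M,
                          min_cost: float = MIN_COST_PER_M) -> float:
    """Utilization at which lease cost per token equals the customer price."""
    return min_cost / price


def implied_kwh_per_m(adder: float = POWER_ADDER_PER_M,
                      kwh_price: float = POWER_PRICE_PER_KWH) -> float:
    """Energy per million tokens implied by the quoted power cost figures."""
    return adder / kwh_price


for u in (1.0, 0.5, 0.2):
    print(f"{u:.0%} utilization: ${cost_per_m(u):.2f}/m tokens")
print(f"break-even utilization: {breakeven_utilization():.1%}")
print(f"implied energy: {implied_kwh_per_m():.2f} kWh per million tokens")
```

On these inputs, 20% utilization gives about $52.65/m, which is the "over $50/m" figure, and the lease cost alone would need roughly 75% utilization to match the $14/m price.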

Their first customer was G42, a group owned by the UAE monarchy. Even if the UAE has permission to buy Nvidia, Cerebras has quicker delivery, and the UAE helped with/controls the software. Apparently, Arabic has advantages on the chip, but they are still planning Nvidia-dominated expansions, with Patriot air defense systems guarding them.

[-] humanspiral@lemmy.ca 1 points 3 days ago* (last edited 3 days ago)

I made a math mistake. The theoretical minimum cost to OpenAI is $3.15/m tokens ($3.30/m with electricity), as Cerebras has fixed context windows per user, and Codex Spark allows 3.33 concurrent users per node. That is still a $16.50/m optimistic cost (at 20% of theoretical capacity) for $14/m revenue.
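A rough check of the corrected numbers, as a sketch only. It assumes the correction amounts to dividing the single-stream cost by the number of concurrent users per node, and that the lease is a fixed cost spread over whatever tokens are actually served. All inputs are figures from this thread; the function names are hypothetical.

```python
# Figures from the thread, taken as given (not independently verified).
SINGLE_STREAM_MIN_PER_M = 10.53  # USD per million tokens, original estimate
CONCURRENT_USERS = 3.33          # concurrent users per node for Codex Spark
CORRECTED_MIN_PER_M = 3.30       # USD per million tokens, incl. electricity
PRICE_PER_M = 14.00              # USD per million tokens charged to customers


def per_user_min_cost(single_stream_cost: float, concurrent_users: float) -> float:
    """Minimum cost per million tokens once a node serves several users at once."""
    return single_stream_cost / concurrent_users


def margin_per_m(utilization: float,
                 min_cost: float = CORRECTED_MIN_PER_M,
                 price: float = PRICE_PER_M) -> float:
    """Revenue minus lease-and-power cost per million tokens at a given utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return price - min_cost / utilization


corrected = per_user_min_cost(SINGLE_STREAM_MIN_PER_M, CONCURRENT_USERS)
print(f"corrected minimum before power: ${corrected:.2f}/m tokens")
for u in (1.0, 0.5, 0.2):
    print(f"{u:.0%} utilization: margin ${margin_per_m(u):+.2f}/m tokens")
print(f"break-even utilization: {CORRECTED_MIN_PER_M / PRICE_PER_M:.1%}")
```

The division comes out near $3.16/m, close to the $3.15/m quoted here. At 20% utilization the margin is about -$2.50/m, and break-even sits a little under 24% utilization.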

I guess there is a market for very fast response tasks. OpenAI does have a routing system that charges a high cost per token, but gets most of the work done by their smaller/cheaper models behind the scenes.

But this turns out not to be ultra stupid if OpenAI has the internal training/improvement token workload to completely saturate the datacenter for its own use. Cerebras does have a training advantage over Nvidia. The immaturity of its software stack only affects cutting-edge inference techniques.

this post was submitted on 14 May 2026