DeepSeek-V3/R1 Inference System Overview

shihab 4 hours ago

Some very illuminating stats from the article:

“Over the past 24 hours (UTC+8 02/27/2025 12:00 PM to 02/28/2025 12:00 PM), the combined peak node occupancy for V3 and R1 inference services reached 278, with an average occupancy of 226.75 nodes (each node contains 8 H800 GPUs). Assuming the leasing cost of one H800 GPU is $2 per hour, the total daily cost amounts to $87,072…

If all tokens were billed at DeepSeek-R1’s pricing (*), the total daily revenue would be $562,027, with a cost profit margin of 545%. However, our actual revenue is substantially lower for the following reasons…”
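The daily cost figure follows from the average occupancy, not the peak. The sketch below reproduces the arithmetic using only numbers from the quote. The variable names are illustrative, and the $2/GPU-hour rate is the article's own stated assumption.

```python
# Figures quoted from the article.
avg_nodes = 226.75        # average node occupancy over the 24-hour window
gpus_per_node = 8         # H800 GPUs per node
usd_per_gpu_hour = 2.0    # assumed lease price per H800 GPU-hour (article's assumption)
hours = 24

gpu_hours = avg_nodes * gpus_per_node * hours   # 226.75 * 8 * 24
daily_cost = gpu_hours * usd_per_gpu_hour

print(f"GPU-hours: {gpu_hours:,.0f}")      # GPU-hours: 43,536
print(f"Daily cost: ${daily_cost:,.0f}")   # Daily cost: $87,072
```

The peak of 278 nodes does not enter the cost. Using it instead would give 278 * 8 * 24 * $2 = $106,752.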

blackeyeblitzar 2 hours ago

How can profit margin be more than 100%? What do they mean by “cost profit margin”?
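The quoted 545% matches profit divided by cost (a markup), not profit divided by revenue. That reading is inferred from the numbers rather than stated in the quote. A margin on cost is unbounded, while a margin on revenue stays below 100% whenever cost is positive. A minimal check using the two dollar figures from the article:

```python
daily_revenue = 562_027   # hypothetical revenue if all tokens were billed at R1 pricing
daily_cost = 87_072       # daily GPU lease cost from the article

profit = daily_revenue - daily_cost
cost_profit_margin = profit / daily_cost    # profit relative to cost (markup)
revenue_margin = profit / daily_revenue     # conventional margin, always < 100% here

print(f"Cost profit margin: {cost_profit_margin:.0%}")  # Cost profit margin: 545%
print(f"Margin on revenue:  {revenue_margin:.1%}")      # Margin on revenue:  84.5%
```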

kevmo314 5 hours ago

Interesting that they chose to split prefilling into its own independent service; I hadn't heard of that technique before. I found this paper that examines why that could be beneficial: https://arxiv.org/abs/2401.09670v1
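Below is a toy, single-process sketch of prefill/decode disaggregation. It makes no claims about DeepSeek's actual system. All class and function names are hypothetical, the "KV cache" is a plain list, and the next-token rule is a placeholder rather than a model. It only shows the shape of the idea. Prefill handles each prompt in one pass and hands the resulting cache to a separate decode stage. The decode stage then batches many in-flight requests and advances each by one token per step.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    prompt_tokens: list
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # toy stand-in for real KV tensors
    output: list = field(default_factory=list)


class PrefillWorker:
    """Hypothetical prefill stage: processes the whole prompt at once and builds the cache."""

    def run(self, req):
        req.kv_cache = list(req.prompt_tokens)             # toy "cache" = prompt tokens
        req.output.append(sum(req.prompt_tokens) % 100)    # toy first generated token
        return req


class DecodeWorker:
    """Hypothetical decode stage: one token per step for every request in the batch."""

    def step(self, batch):
        finished = []
        for req in batch:
            if len(req.output) < req.max_new_tokens:
                nxt = (req.output[-1] + len(req.kv_cache)) % 100  # placeholder next-token rule
                req.kv_cache.append(req.output[-1])               # cache grows by one entry
                req.output.append(nxt)
            if len(req.output) >= req.max_new_tokens:
                finished.append(req)
        return finished


def serve(requests, decode_batch_size=4):
    prefill, decode = PrefillWorker(), DecodeWorker()
    # Prefilled requests are "handed off" (cache included) to the decode side.
    handoff = deque(prefill.run(r) for r in requests)
    active, done = [], []
    while handoff or active:
        # Refill the decode batch from the handoff queue as slots free up.
        while handoff and len(active) < decode_batch_size:
            active.append(handoff.popleft())
        finished = decode.step(active)
        done.extend(finished)
        finished_ids = {r.rid for r in finished}
        active = [r for r in active if r.rid not in finished_ids]
    return done


if __name__ == "__main__":
    reqs = [Request(i, [1, 2, 3], max_new_tokens=3) for i in range(5)]
    for r in serve(reqs):
        print(r.rid, r.output)
```

In a real deployment the two stages would run on separate GPU pools, and the cache transfer would cross the network. Splitting them lets the compute-heavy prefill and the memory-bandwidth-bound decode be batched and scaled independently.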