5 comments

  • breput 2 hours ago
    &gt; We&#x27;ll assume a 32B dense model, as they&#x27;ve have gotten quite good for production use and a B200 can comfortably serve them. This could be a Gemma, Qwen, DeepSeek, whatever.<p>That seems like a very consequential point to include halfway through the post. They aren&#x27;t wrong that Qwen 3.6 26B or Gemma 4 31B are quite good, depending on the use case, but if we&#x27;re doing napkin math, I&#x27;d want some more headroom in the assumptions.<p>They really ought to have Qwen parameterize their post&#x27;s calculations and add sliders so a reader could play around with the values.<p>Edit: And since they specifically mentioned DeepSeek (or whatever), as far as I know, none of their current generation of models is a dense model, and even the smallest of the mixture of experts (MoE) models is 284B parameters (13B activated). That will completely incinerate their napkin.
    • martinald 1 hour ago
      Yes, 32B dense is a weird one to choose.<p>But in reality, 32B dense is very similar* to 32B activated on MoE in terms of inference costs. And I highly suspect e.g. Opus is around that level of active params.<p>A 284ba13b model at scale is almost certainly cheaper to serve than a 32B dense model.<p>*Because you can shard the model across multiple GPUs at scale, though in reality you lose some efficiency to GPU coordination and expert routing.
      • breput 55 minutes ago
        That&#x27;s good information. I couldn&#x27;t even begin to run DeepSeek Flash on my system, but if you&#x27;re assuming multiple GPUs, that is going to affect the napkin math.
  • smalltorch 4 hours ago
    &gt; This largely depends on whether you own or rent your hardware. At $40,000 per B200, your lifetime cost per user is 40_000&#x2F;num_users. In the 100% duty cycle case (worst for cost), that&#x27;s 6k$ per user. Realistically, serving 300 users per GPU you&#x27;ll spend a lifetime cost of about $133 per user, plus the <i>datacenter&#x2F;upkeep bill</i>. If you rent the GPU, the cost is more straightforward. At an hourly rate of $4, your hourly cost per user is 4&#x2F;num_users. For num_users=300 you get an hourly rate of about $0.013 per user, or $9.36 per month.<p>This leads me to believe you can buy a GPU but leave it at a data center?<p>Do people do this? I don&#x27;t understand. Or are you equating the upkeep bill to electricity on premises?
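A minimal sketch of the own-versus-rent arithmetic in the quoted passage, for readers who want to vary `num_users`. The $40,000 purchase price and the $4 hourly rate (implied by `4/num_users`) come from the quote; the 720-hour month and the function names are assumptions made for illustration.

```python
# Inputs taken from the quoted passage.
GPU_PRICE_USD = 40_000        # purchase price of one B200
RENT_USD_PER_HOUR = 4.0       # hourly rental rate implied by "4/num_users"

# Assumption: a 30-day month, i.e. 720 hours.
HOURS_PER_MONTH = 24 * 30


def owned_lifetime_cost_per_user(num_users: int) -> float:
    """Upfront GPU price split across users (excludes datacenter/upkeep)."""
    return GPU_PRICE_USD / num_users


def rented_hourly_cost_per_user(num_users: int) -> float:
    """Hourly rental rate split across users."""
    return RENT_USD_PER_HOUR / num_users


def rented_monthly_cost_per_user(num_users: int) -> float:
    """Monthly rental cost per user, assuming the GPU is rented around the clock."""
    return rented_hourly_cost_per_user(num_users) * HOURS_PER_MONTH


if __name__ == "__main__":
    n = 300
    print(f"owned, lifetime: ${owned_lifetime_cost_per_user(n):.2f} per user")
    print(f"rented, hourly:  ${rented_hourly_cost_per_user(n):.4f} per user")
    print(f"rented, monthly: ${rented_monthly_cost_per_user(n):.2f} per user")
```

With these inputs the unrounded monthly figure is $9.60 per user; the quoted $9.36 is what you get by rounding the hourly rate to $0.013 before multiplying by 720.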
    • __s 3 hours ago
      You can, people do. <a href="https:&#x2F;&#x2F;www.linkedin.com&#x2F;posts&#x2F;activity-7409593739138060288-OFmi" rel="nofollow">https:&#x2F;&#x2F;www.linkedin.com&#x2F;posts&#x2F;activity-7409593739138060288-...</a>
      • smalltorch 3 hours ago
        So what&#x27;s the cost separating them from placing this box on their own premises?<p>Network throughput?
  • JBAnderson5 2 hours ago
    &gt; Realistically, serving 300 users per GPU you&#x27;ll spend a lifetime cost of about $133 per user, plus the datacenter&#x2F;upkeep bill.<p>What is the operational cost and when does it become more expensive than the upfront capex?<p>The B200 tops out at 1000W and idles around 140W. It averages around 600W. <a href="https:&#x2F;&#x2F;www.lightly.ai&#x2F;blog&#x2F;nvidia-b200-vs-h100">https:&#x2F;&#x2F;www.lightly.ai&#x2F;blog&#x2F;nvidia-b200-vs-h100</a> The U.S. average electricity cost was $0.14 per kWh in March. <a href="https:&#x2F;&#x2F;www.eia.gov&#x2F;electricity&#x2F;monthly&#x2F;epm_table_grapher.php?t=epmt_5_6_a" rel="nofollow">https:&#x2F;&#x2F;www.eia.gov&#x2F;electricity&#x2F;monthly&#x2F;epm_table_grapher.ph...</a><p>600&#x2F;1000 kW * $0.14 per kWh = $0.084 per hour. $2.01 per day. $60.30 per month. With 300 users, $0.20 per user per month. Seems fairly cheap for the electricity.<p>Does anyone know how to estimate colo&#x2F;data center rent costs? Where did I screw up my estimates?
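The electricity estimate above can be reproduced with a short script, which also gives a rough answer to the capex question. This is a sketch under the comment's own figures (600 W average draw, $0.14 per kWh, 300 users, $40,000 per GPU); the 30-day month and the variable names are assumptions, and it covers GPU electricity only, not cooling, maintenance, or colo rent.

```python
# Figures quoted in the comment above.
AVG_POWER_W = 600          # average B200 power draw
PRICE_PER_KWH = 0.14       # U.S. average electricity price, USD
USERS_PER_GPU = 300
GPU_PRICE_USD = 40_000     # upfront capex from the quoted post

# Assumptions made for this sketch.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def electricity_cost_per_hour(avg_power_w: float, price_per_kwh: float) -> float:
    """Convert watts to kilowatts, then price one hour of use."""
    return avg_power_w / 1000 * price_per_kwh


hourly = electricity_cost_per_hour(AVG_POWER_W, PRICE_PER_KWH)
daily = hourly * HOURS_PER_DAY
monthly = daily * DAYS_PER_MONTH
per_user_monthly = monthly / USERS_PER_GPU

# Months of GPU electricity needed to add up to the purchase price.
months_to_match_capex = GPU_PRICE_USD / monthly

print(f"per hour:  ${hourly:.3f}")
print(f"per day:   ${daily:.2f}")
print(f"per month: ${monthly:.2f}")
print(f"per user per month: ${per_user_monthly:.2f}")
print(f"months until electricity equals capex: {months_to_match_capex:.0f}")
```

Unrounded, the monthly figure is $60.48 rather than $60.30 (the comment rounds the daily cost down to $2.01 first), and GPU electricity alone would take roughly 661 months, about 55 years, to match the $40,000 purchase price under these assumptions.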
    • BadBadJellyBean 2 hours ago
      I wonder what the power costs are when you put jet turbines in front of your DC to power it.
      • martinald 1 hour ago
        In general, less for fuel cost alone. But you obviously need to buy the turbines.
  • BadBadJellyBean 2 hours ago
    I&#x27;d like to see a bit of the running costs inside the napkin math. Power, cooling, maintenance, rent, etc. are probably significant factors as well.
  • stevenaenns 1 hour ago
    &gt; 2B = 562 =&gt; B = 331<p>what kind of math is this? why isn&#x27;t it B = 562 &#x2F; 2 = 281?