8 comments

  • uberduper44 minutes ago
    There's a few dimensions you can look at for GPU load. Probably the easiest indirect metric to watch is power usage.

    But if you really care about this, you should actually profile your application. Nsight Systems makes this pretty simple to do. Dunno how many people actually care about having a TUI.
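For anyone who wants to try the power-as-proxy idea, here is a minimal sketch, not taken from the tool under discussion. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu=index,power.draw,power.limit` CSV output. The helper names `parse_power_csv` and `sample_power` are made up for illustration.

```python
import subprocess

# Real nvidia-smi flags; the surrounding helpers are hypothetical.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_csv(text):
    """Turn 'index, draw, limit' CSV rows into dicts with a draw/limit fraction.

    Rows whose values are not numeric (for example "[N/A]" on GPUs that do
    not report power) are skipped rather than guessed at.
    """
    rows = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue
        try:
            index = int(fields[0])
            draw_w = float(fields[1])
            limit_w = float(fields[2])
        except ValueError:
            continue
        if limit_w <= 0:
            continue
        rows.append(
            {
                "index": index,
                "draw_w": draw_w,
                "limit_w": limit_w,
                "fraction": draw_w / limit_w,
            }
        )
    return rows


def sample_power():
    """Run nvidia-smi once and return the parsed per-GPU power readings."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power_csv(out)


# On a machine with an NVIDIA driver installed:
#     for gpu in sample_power():
#         print(gpu["index"], round(gpu["fraction"], 2))
```

A high draw/limit fraction only says the board is busy; it does not say whether the time went to compute or to memory traffic.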
    • ManyaGhobadi0 minutes ago
      Power is useful as a second-order metric and can help catch drastic underutilization, but it has similar problems to SM Active (DCGM): it tends to overestimate utilization and doesn't distinguish between useful compute and memory traffic. It's very possible for a memory-bound workload to draw high power while underutilizing compute. Our goal was to separate these bottlenecks out so there's more visibility into where to optimize.

      On nsys, agreed it's great, but we wanted something that could run continuously rather than an offline analysis tool. We think there's room for both to be useful.
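To make the compute-versus-memory distinction concrete, here is a rough roofline-style sketch. It is not the tool's actual method: the function names, the 0.6 threshold, and the peak figures in the usage note are assumptions for illustration only.

```python
def utilization_fractions(flops_per_s, bytes_per_s, peak_flops_per_s, peak_bytes_per_s):
    """Return (compute_fraction, memory_fraction) of the hardware peaks."""
    if peak_flops_per_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("peak values must be positive")
    return flops_per_s / peak_flops_per_s, bytes_per_s / peak_bytes_per_s


def classify_bottleneck(flops_per_s, bytes_per_s, peak_flops_per_s, peak_bytes_per_s,
                        threshold=0.6):
    """Label a workload sample as compute-bound, memory-bound, or underutilized.

    threshold is a hypothetical cutoff: a resource counts as saturated once
    the achieved rate reaches that fraction of its peak.
    """
    compute_frac, memory_frac = utilization_fractions(
        flops_per_s, bytes_per_s, peak_flops_per_s, peak_bytes_per_s
    )
    if compute_frac >= threshold and compute_frac >= memory_frac:
        return "compute-bound"
    if memory_frac >= threshold:
        return "memory-bound"
    return "underutilized"
```

With illustrative peaks of 312e12 FLOP/s and 2.0e12 B/s, a sample achieving 50e12 FLOP/s and 1.8e12 B/s is labeled memory-bound: the memory system is near its limit, so power draw can be high even though compute sits at about 16% of peak.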
  • latchkey5 minutes ago
    You mention rocm-smi in your blog post, but you don't actually support AMD GPUs?
  • jhgg50 minutes ago
    We just track power utilization.
  • xtimecrystal1 hour ago
    One small suggestion: add more GPU stats to your tool.

    At the moment (v0.1.3) it *is* more helpful for compute visualization, but the lack of memory usage/process/temperature/fan speed/etc. tracking prevents it from becoming a full drop-in replacement for `nvidia-smi` for me.
    • ManyaGhobadi32 minutes ago
      We agree! We are planning a "process" or "advanced" view with temp/power usage and per-process breakdowns. Would a separate full-page view or fitting everything onto one view be more useful for your workflows? Just thinking about fitting everything in because *it is a lot*.
  • nawi1 hour ago
    Hi, many thanks. Can this run on NVIDIA Jetson and Orin, or is it just for server GPUs?
    • ManyaGhobadi26 minutes ago
      Currently just server GPUs, but theoretically it should be easy to link against the ARM64 CUDA libraries for Jetson/Orin. The only challenge would be checking whether those platforms support all the metrics we're sampling, though anything Ampere or newer should have reasonable support.