4 comments

  • stefan_ 0 minutes ago
    More slop again. The way to get more throughput is to bump the batch size, not to try to "multithread" job submits to the NPU as if it's a CPU.
  • robinduckett 1 hour ago
    Is there something special about yolov8 over later models (9-12)? It seems most of the research and working examples default to v8 despite it being 3 years old. Or just because it is what fits on this hardware?
    • snovv_crash 54 minutes ago
      Newer versions aren't open source, or at least have murky licensing.
      • robinduckett 43 minutes ago
        Ahh that’ll do it. A shame really, the later models seem to be fairly good just from my idle testing as an enthusiast.
    • alebal123bal 48 minutes ago
      [flagged]
  • alebal123bal 3 hours ago
    I built this while trying to understand how much of the RK3588S vision pipeline could be kept off the CPU.

    The main trick is not the YOLO model itself, but the pipeline structure: MIPI capture through the ISP, resize/color conversion through RGA, and YOLOv8n inference through all 3 NPU cores with one RKNN context per core. With a 3-thread inference pool the pipeline goes from ~31 FPS to the OS08A10 camera’s 46 FPS ceiling.

    The memory footprint is also small: roughly 137–152 MB RSS for one 1080p stream, using a fixed preallocated buffer pool rather than per-frame allocations. Two streams are roughly 276–304 MB RSS.

    The repo also has a multi-process side of the pipeline: detections are published over Unix-domain sockets to tracking, temporal features, a presence FSM, and an optional Qwen2.5-0.5B summary step. For the LLM step, the camera pipeline can temporarily blackout/resume so RKLLM gets the whole NPU.

    I split the work into three repos:

    - runtime dual-stream YOLOv8n RK3588S pipeline: https://github.com/alebal123bal/khadas_yolov8n_multithread

    - train/export/INT8 RKNN conversion for YOLOv8/YOLOv5: https://github.com/alebal123bal/RKNN_TRAIN_YOLO

    - Qwen on RK3588S, via RKLLM/NPU or llama.cpp/CPU: https://github.com/alebal123bal/RKLLM_LLAMA_QWEN

    The demo class is UAV/drone, but this is meant as a general edge-inference pipeline example, not an operational/surveillance/defense system.
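
    For readers who want to picture the "one context per core, 3-thread inference pool" structure described above, here is a minimal sketch. It is not the repo's code and does not call the RKNN API: `FakeContext`, `worker`, and `run_pipeline` are hypothetical names, and the "inference" is a stub that only records which worker handled each frame. The assumption is that each worker thread owns exactly one context for its whole lifetime and that results are put back into capture order before being handed downstream.

```python
import queue
import threading

NUM_CORES = 3  # the post describes using all 3 NPU cores


class FakeContext:
    """Hypothetical stand-in for a per-core inference context (not the RKNN API)."""

    def __init__(self, core_id):
        self.core_id = core_id

    def infer(self, frame):
        # A real context would run the model here; the stub just tags the frame.
        return {"frame": frame, "core": self.core_id}


def worker(core_id, jobs, results):
    ctx = FakeContext(core_id)  # one context per worker, never shared
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more frames for this worker
            break
        seq, frame = item
        results.put((seq, ctx.infer(frame)))


def run_pipeline(frames):
    frames = list(frames)
    jobs = queue.Queue(maxsize=NUM_CORES * 2)  # bounded, so capture cannot run far ahead
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(core_id, jobs, results))
        for core_id in range(NUM_CORES)
    ]
    for t in threads:
        t.start()
    for seq, frame in enumerate(frames):
        jobs.put((seq, frame))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Workers finish out of order, so restore capture order by sequence number.
    done = sorted((results.get() for _ in frames), key=lambda r: r[0])
    return [detection for _, detection in done]
```

    Tagging each frame with a sequence number is one simple way to keep downstream consumers (tracking, for instance) fed in capture order even though the three workers complete independently.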
  • ancientmoth 1 hour ago
    [dead]