I built this to find out how much of the RK3588S vision pipeline could be kept off the CPU.<p>The main trick is not the YOLO model itself but the pipeline structure: MIPI capture through the ISP, resize and color conversion on the RGA, and YOLOv8n inference spread across all three NPU cores, with one RKNN context per core. With a 3-thread inference pool, throughput goes from ~31 FPS to the OS08A10 camera’s 46 FPS ceiling.<p>The memory footprint is also small: roughly 137–152 MB RSS for one 1080p stream, using a fixed, preallocated buffer pool rather than per-frame allocations. Two streams take roughly 276–304 MB RSS.<p>The repo also has a multi-process side: detections are published over Unix-domain sockets to tracking, temporal features, a presence FSM, and an optional Qwen2.5-0.5B summary step. For the LLM step, the camera pipeline can temporarily black out and then resume, so that RKLLM gets the whole NPU.<p>I split the work into three repos:<p>- runtime dual-stream YOLOv8n RK3588S pipeline:
<a href="https://github.com/alebal123bal/khadas_yolov8n_multithread" rel="nofollow">https://github.com/alebal123bal/khadas_yolov8n_multithread</a><p>- train/export/INT8 RKNN conversion for YOLOv8/YOLOv5:
<a href="https://github.com/alebal123bal/RKNN_TRAIN_YOLO" rel="nofollow">https://github.com/alebal123bal/RKNN_TRAIN_YOLO</a><p>- Qwen on RK3588S, via RKLLM/NPU or llama.cpp/CPU:
<a href="https://github.com/alebal123bal/RKLLM_LLAMA_QWEN" rel="nofollow">https://github.com/alebal123bal/RKLLM_LLAMA_QWEN</a><p>The demo class is UAV/drone, but this is meant as a general edge-inference pipeline example, not an operational/surveillance/defense system.
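<p>For readers who want the shape of the pool idea without opening the repos, here is a minimal Python sketch of a fixed buffer pool feeding three inference workers, one context per worker. It is an illustration under assumptions, not the code from the repos: `FakeContext`, `POOL_SIZE`, `FRAME_BYTES`, and `run` are hypothetical names, and the "inference" is a placeholder checksum rather than a call into RKNN.

```python
import queue
import threading

NUM_CORES = 3                 # three NPU cores, as described in the post
POOL_SIZE = 8                 # hypothetical number of preallocated buffers
FRAME_BYTES = 640 * 640 * 3   # hypothetical model-input size in bytes


class FakeContext:
    """Stand-in for a per-core inference context (NOT the RKNN API)."""

    def __init__(self, core_id):
        self.core_id = core_id

    def infer(self, buf):
        # Placeholder for real inference: a cheap checksum of the buffer head.
        return sum(buf[:16])


def worker(ctx, work_q, free_q, results, lock):
    """Each worker owns exactly one context for its whole lifetime."""
    while True:
        item = work_q.get()
        if item is None:          # sentinel: shut down
            break
        frame_id, buf = item
        out = ctx.infer(buf)
        with lock:
            results[frame_id] = (ctx.core_id, out)
        free_q.put(buf)           # hand the buffer back to the pool


def run(num_frames):
    # Allocate every buffer once, up front; nothing is allocated per frame.
    free_q = queue.Queue()
    for _ in range(POOL_SIZE):
        free_q.put(bytearray(FRAME_BYTES))

    work_q = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=worker,
            args=(FakeContext(i), work_q, free_q, results, lock),
        )
        for i in range(NUM_CORES)
    ]
    for t in threads:
        t.start()

    for frame_id in range(num_frames):
        buf = free_q.get()        # blocks if all buffers are in flight
        buf[0] = frame_id % 256   # stand-in for capture + resize into buf
        work_q.put((frame_id, buf))

    for _ in threads:
        work_q.put(None)
    for t in threads:
        t.join()
    return results, free_q.qsize()


if __name__ == "__main__":
    results, free_buffers = run(20)
    print(len(results), "frames processed,", free_buffers, "buffers back in pool")
```

<p>The producer blocks when every buffer is in flight, so memory stays bounded by the pool size regardless of frame rate. Results arrive in completion order, which is why the sketch keys them by frame id.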