2 comments
Discussion on Reddit:
<a href="https://www.reddit.com/r/LocalLLaMA/comments/1rewis9/removed_by_moderator/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/1rewis9/removed...</a>
This is so freaking awesome. I am working on a project trying to run 10 models on two GPUs, and loading/offloading is the only solution I have in mind.<p>Will try getting this deployed.<p>Are the advertised cold start timings measured with no other model loaded on the GPUs?