Also see Fast GPU bounding boxes on tree-structured scenes[1] (unpublished paper) and notes toward a blog post[2]. This is a highly tuned GPU implementation of parentheses matching. It's actually used in Vello (the classic version in which we offload basically all the work to the GPU, not the newer CPU-GPU hybrid version in which tracking the blend stack is done on the CPU).<p>Earlier versions of the work were featured on HN [3][4], but this is much more sophisticated. (plus a few more zero-comment submissions)<p>The basic idea (bicyclic semigroup and binary search) is the same as the submission. I think earliest attribution is to Bar-On and Vishkin[5] from 1985. Another implementation of this idea is in pareas[6], an experimental GPU-accelerated compiler.<p>I believe this work is publishable and would love to work with a student to resubmit it. Especially if you're a student or prof in Sydney, please reach out.<p>[1]: <a href="https://arxiv.org/abs/2205.11659" rel="nofollow">https://arxiv.org/abs/2205.11659</a><p>[2]: <a href="https://github.com/raphlinus/raphlinus.github.io/issues/66" rel="nofollow">https://github.com/raphlinus/raphlinus.github.io/issues/66</a><p>[3]: <a href="https://news.ycombinator.com/item?id=24385095">https://news.ycombinator.com/item?id=24385095</a><p>[4]: <a href="https://news.ycombinator.com/item?id=27164009">https://news.ycombinator.com/item?id=27164009</a><p>[5]: <a href="https://dl.acm.org/doi/10.1145/3318.3478" rel="nofollow">https://dl.acm.org/doi/10.1145/3318.3478</a><p>[6]: <a href="https://github.com/Snektron/pareas" rel="nofollow">https://github.com/Snektron/pareas</a>