How could this lend insight into why the Fast Fourier Transform approximates self-attention?<p>> <i>Because self-attention can be replaced with an FFT at a modest loss in accuracy and a substantial reduction in kWh [1], I suspect that the Quantum Fourier Transform could also be substituted for attention in LLMs.</i><p>[1] "FNet: Mixing Tokens with Fourier Transforms" (2021) <a href="https://arxiv.org/abs/2105.03824" rel="nofollow">https://arxiv.org/abs/2105.03824</a>
"Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs"
<a href="https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-19/" rel="nofollow">https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-ba...</a><p>"Why formalize mathematics – more than catching errors" (2025) <a href="https://news.ycombinator.com/item?id=45695541">https://news.ycombinator.com/item?id=45695541</a><p>Can the <i>Quantum</i> Fourier Transform (QFT) and its inverse (the IQFT) also be substituted for self-attention in LLMs, and do Lean formalisms provide any insight into how or why?
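The FFT-for-attention substitution above can be sketched in a few lines of NumPy. This is a hedged illustration, not code from the FNet paper: the function name `fnet_mixing` and the toy shapes are my own. It shows FNet's parameter-free mixing layer (a 2D Fourier transform over sequence and hidden dimensions, real part kept), and the mathematical point behind the QFT question: the n-qubit QFT is, as a matrix, just the unitary (normalized) DFT on 2**n amplitudes, so substituting QFT for attention reduces, on paper, to the classical DFT substitution.

```python
import numpy as np

def fnet_mixing(x):
    """Parameter-free token mixing in the style of FNet
    (arXiv:2105.03824): a 2D Fourier transform over the sequence
    and hidden dimensions, keeping only the real part.
    No attention weights are learned."""
    return np.fft.fft2(x).real

# Shape is preserved, so the layer is a drop-in replacement for
# a self-attention sublayer of the same width.
x = np.random.randn(8, 16)  # (seq_len, d_model), toy sizes
assert fnet_mixing(x).shape == x.shape

# The n-qubit Quantum Fourier Transform, written as a matrix, is the
# normalized DFT on N = 2**n amplitudes (with the opposite sign
# convention to np.fft.fft, i.e. it matches the inverse transform).
N = 8  # 3 qubits
omega = np.exp(2j * np.pi / N)
qft = np.array([[omega ** (j * k) for k in range(N)]
                for j in range(N)]) / np.sqrt(N)
assert np.allclose(qft @ qft.conj().T, np.eye(N))        # unitary
v = np.random.randn(N)
assert np.allclose(qft @ v, np.sqrt(N) * np.fft.ifft(v))  # it's a DFT
```

Whether amplitude-encoded QFT mixing could be read out efficiently in a real LLM pipeline is a separate question the sketch does not address.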
I guess the next step would be adding support for quantized arithmetic.