High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

13 points by jchandra 2 days ago

2 comments

vivahir215 2 days ago
Interesting approach. Curious about the latency tradeoff: OLS + SVD are much heavier than Top-K. Have you benchmarked end-to-end inference latency?
- jchandra 2 days ago
  [dead]
jchandra 2 days ago
[dead]