Papers

KV Cache Recycling to Expand Usable Context Capacity in Low Parameter LLMs
Accepted for publication in IJRSI, 2026

This paper investigates whether attention key-value (KV) states computed for one prompt on a small LLM can be reused to accelerate inference on a new, similar prompt, effectively expanding its usable context capacity through token recycling.
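
To illustrate the general idea, here is a minimal sketch of reusing a KV cache across prompts that share a common prefix, written with Hugging Face transformers. This is not the paper's method; the model name, prompts, and single-step decoding below are placeholder assumptions.

```python
# Minimal sketch of KV-cache reuse across prompts sharing a prefix.
# Illustrative only, not the paper's method; model name and prompts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Encode a shared prefix once and cache its KV states.
prefix_ids = tokenizer("You are a concise assistant.\n", return_tensors="pt").input_ids
with torch.no_grad():
    cached_kv = model(prefix_ids, use_cache=True).past_key_values

# A new prompt that starts with the same prefix: feed only the new tokens and
# recycle the cached KV states instead of recomputing the prefix.
suffix_ids = tokenizer("Q: what is KV cache recycling?\nA:", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(suffix_ids, past_key_values=cached_kv, use_cache=True).logits

# Greedy next-token prediction using the recycled cache.
next_token = logits[:, -1].argmax(dim=-1)
print(tokenizer.decode(next_token))
```

In this prefix-sharing setting, only the suffix tokens are processed at inference time; the paper's question is how far such reuse can be pushed when the new prompt is merely similar rather than an exact continuation.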