LCLMs compress LLM context before decoding, achieving an 8.8x speedup at 16x compression and beating every KV cache method tested. Open-sourced by NYU and Columbia.
Researchers led by Takaki Hatsui at the RIKEN SPring-8 Center (RSC) in Japan and collaborators have developed a new approach ...
A pair of Carnegie Mellon University researchers recently discovered hints that the process of compressing information can solve complex reasoning tasks without pre-training on a large number of ...