Shrinking AI memory improves LLM accuracy

by Sophie Jenkins
London, UK (SPX) Dec 26, 2025

Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or reduce the energy needed to run them.

Experts from the University of Edinburgh and NVIDIA found that large language models using memory eight times smaller than an uncompressed system scored better on maths, science, and coding tests while spending the same amount of time reasoning. The method can also be configured so that models respond to more user queries simultaneously, lowering the power required per task.

The approach focuses on the models' key-value cache, or KV cache, which stores segments of step-by-step reasoning sequences known as reasoning threads. As models generate more threads or extend them, the KV cache grows and becomes slower to retrieve, creating a bottleneck during inference when the system answers prompts.
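As a rough illustration of why the cache becomes a bottleneck, the sketch below estimates KV cache size as a function of the number of stored tokens. The model dimensions (32 layers, 8 key-value heads, head size 128, 16-bit values) are assumed for illustration and are not taken from the paper, and the function name is hypothetical.

```python
def kv_cache_bytes(num_tokens, num_threads=1, num_layers=32,
                   num_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes (all dimensions are assumed values)."""
    # Each token stores one key and one value vector per layer and per KV head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens * num_threads


GIB = 2 ** 30
full = kv_cache_bytes(32768)            # one 32,768-token reasoning thread
compressed = kv_cache_bytes(32768 // 8)  # same thread at one eighth of the tokens
print(f"uncompressed: {full / GIB:.1f} GiB")       # uncompressed: 4.0 GiB
print(f"8x compressed: {compressed / GIB:.1f} GiB")  # 8x compressed: 0.5 GiB
```

Under these assumptions the cache grows linearly with both thread length and thread count, so every additional or longer reasoning thread adds memory that must be read back at each generation step.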

To address this, the team developed Dynamic Memory Sparsification (DMS), a technique that compresses the KV cache by deciding which tokens to retain and which to delete. Instead of keeping every token, DMS selects those judged most important so the model keeps useful context while reducing memory use.

There is a short delay between deciding to delete tokens and actually removing them, which gives the model time to transfer valuable information from tokens that will be evicted into those that remain. By managing token eviction in this way, DMS allows the AI model to explore more possible solutions or reason in greater depth without extra compute.
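The delay between deciding and deleting can be pictured with a toy simulation. This is only a sketch under stated assumptions, not the researchers' implementation: the eviction flags are supplied by hand (in DMS the model makes these decisions), the decision is assumed to be made when a token is written, and `delay` and the function name are hypothetical. The transfer of information into surviving tokens happens inside the model and is not simulated here.

```python
from collections import deque


def simulate_delayed_eviction(evict_flags, delay):
    """Return the cached token indices after each generation step.

    evict_flags[t] is True if token t is marked for eviction when written.
    A marked token stays in the cache for `delay` more steps before removal.
    """
    cache = []          # token indices currently held in the cache
    pending = deque()   # (token_index, step_at_which_to_remove), oldest first
    history = []
    for step, marked in enumerate(evict_flags):
        cache.append(step)                      # new token enters the cache
        if marked:
            pending.append((step, step + delay))
        # Remove tokens whose delay has elapsed.
        while pending and pending[0][1] <= step:
            token, _ = pending.popleft()
            cache.remove(token)
        history.append(list(cache))
    return history


flags = [False, True, True, False, True, False]
for step, cached in enumerate(simulate_delayed_eviction(flags, delay=2)):
    print(step, cached)
```

With a delay of two steps, token 1 is marked at step 1 but is still visible at step 2 and is only dropped at step 3, which is the window in which later tokens can still attend to it.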

The researchers tested DMS on different versions of the Llama and Qwen model families and compared their performance with non-compressed baselines. Even when memory was compressed to one eighth of its original size, large language models maintained their accuracy on difficult tasks and produced results faster than non-compressed systems.

In the AIME 24 mathematics test, which serves as a qualifier for the United States Mathematical Olympiad, compressed models performed twelve points better on average while using the same number of KV cache reads per answer. On GPQA Diamond, a set of complex questions in biology, chemistry, and physics authored by PhD-level experts, the compressed models scored more than eight points higher.
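To make "the same number of KV cache reads" concrete, here is a deliberately simplified accounting model. It assumes that generating token t reads roughly t divided by the compression ratio cached entries, and it ignores the prompt, the eviction delay, and any per-layer differences; the paper's exact accounting may differ, and the function name is hypothetical.

```python
def total_kv_reads(length, compression_ratio=1, num_threads=1):
    """Approximate cached entries read while generating `length` tokens per thread."""
    # Uncompressed, step t reads about t entries, so one thread reads
    # 1 + 2 + ... + length = length * (length + 1) / 2 entries in total.
    per_thread = length * (length + 1) / 2 / compression_ratio
    return num_threads * per_thread


baseline = total_kv_reads(1000)                      # one uncompressed thread
eight_threads = total_kv_reads(1000, 8, num_threads=8)  # eight threads at 8x compression
print(baseline == eight_threads)
```

In this simplified model, eight compressed threads of a given length cost the same number of reads as one uncompressed thread, which is one way a fixed read budget could be spent on exploring more candidate solutions.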

The models were also evaluated with LiveCodeBench, which measures how well AI systems write code. In these tests, compressed models scored about ten points better on average than non-compressed models, indicating that KV cache compression can preserve, and even improve, reasoning quality while operating with much smaller memory budgets.

The findings were peer reviewed and presented at the NeurIPS 2025 conference. The paper, titled "Inference-Time Hyper-Scaling with KV Cache Compression," is available at https://openreview.net/pdf?id=8ZiElzQxf1.

Dr Edoardo Ponti, GAIL Fellow and Lecturer in Natural Language Processing at the University of Edinburgh's School of Informatics, said: "In a nutshell, our models can reason faster but with the same quality. Hence, for an equivalent time budget for reasoning, they can explore more and longer reasoning threads. This improves their ability to solve complex problems in maths, science, and coding."

Dr Ponti and his team will continue to study how large AI systems represent and remember information as part of AToM-FM, a 1.5 million euro project funded by the European Research Council, which aims to make such systems more efficient and sustainable.

