
HyperNova 60B Uses Quantum-Inspired Math to Halve an LLM's Size With Near-Zero Accuracy Loss
Multiverse Computing's free HyperNova 60B compresses a 120B-parameter model by 50% using quantum-inspired tensor methods, scoring five times higher on tool-calling benchmarks than its predecessor.
A Spanish startup just demonstrated something that could reshape how organizations deploy large language models. Multiverse Computing released HyperNova 60B on Hugging Face for free on February 24, and the numbers behind it deserve serious attention from anyone working in AI infrastructure.
Quantum Math Meets Model Compression
HyperNova 60B was created by applying Multiverse Computing's CompactifAI technology to OpenAI's open-weight gpt-oss-120B model. The technique uses quantum-inspired tensor network decomposition, a family of mathematical methods borrowed from quantum physics, to identify and remove redundant parameters without destroying the model's learned capabilities.
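CompactifAI's exact tensor-network construction is proprietary, so the sketch below illustrates only the general principle using the simplest possible decomposition: replacing a dense weight matrix with two low-rank factors via truncated SVD. The matrix here is random, and the sizes and rank are arbitrary illustrative choices, not anything from the release.

```python
import numpy as np

# Illustrative only: CompactifAI's actual tensor-network method is not public.
# The general idea behind decomposition-based compression is to replace a
# dense weight tensor with a network of smaller factors holding fewer numbers.

rng = np.random.default_rng(0)
m, n, rank = 512, 512, 64            # toy layer size and truncation rank

W = rng.standard_normal((m, n))      # stand-in for a trained weight matrix

# Truncated SVD: keep only the top-`rank` singular components.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]           # m x rank factor
B = Vt[:rank, :]                     # rank x n factor

original_params = W.size
compressed_params = A.size + B.size
error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)

print(f"params: {original_params} -> {compressed_params} "
      f"({compressed_params / original_params:.0%})")
print(f"relative reconstruction error: {error:.3f}")
```

With these toy numbers the two factors hold a quarter of the original parameters. The reconstruction error on a random matrix is large precisely because random weights contain no redundancy to remove; the interesting claim is that trained models do.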
The result: a 120-billion-parameter model compressed to 60 billion parameters. The file size dropped from 61 gigabytes to 32 gigabytes. And the accuracy loss? Within 2 to 3 percent across standard benchmarks.
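A quick back-of-the-envelope check on those file sizes (assuming the quoted "gigabytes" are binary gibibytes) shows both models storing roughly the same number of bits per parameter, which suggests the halving comes from removing parameters rather than from quantizing the survivors more aggressively:

```python
# Implied storage density for each model, from the sizes quoted above.
# Assumption: "gigabytes" means binary gibibytes; decimal GB shifts the
# figures only slightly.
GIB = 1024**3

results = {}
for name, size_gib, params in [("gpt-oss-120B", 61, 120e9),
                               ("HyperNova 60B", 32, 60e9)]:
    results[name] = size_gib * GIB * 8 / params
    print(f"{name}: ~{results[name]:.1f} bits per parameter")
```

Both figures land around 4.4 to 4.6 bits per parameter, consistent with a 4-bit-class weight format before and after compression.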
Benchmark Results That Justify the Approach
Raw compression ratios are meaningless if the model stops performing well. HyperNova 60B's benchmarks tell a compelling story. On Tau2-Bench, which measures real-world tool-calling ability, the compressed model scored five times higher than the previous HyperNova release. On Terminal Bench Hard, a challenging coding evaluation, performance doubled.
These are not marginal improvements. The quantum-inspired compression appears to preserve — and in some configurations enhance — the model's ability to handle complex, multi-step tasks that enterprise customers care about most.
Why 50 Percent Compression Changes the Economics
Running a 120-billion-parameter model requires expensive GPU clusters with substantial VRAM. Cutting the parameter count to 60 billion means the same model can run on roughly half the hardware. For organizations paying thousands of dollars per month for cloud GPU instances, that can translate into a roughly halved inference bill.
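That claim is easy to make concrete with a toy cost model. All numbers below are hypothetical (the article cites no specific rates or GPU counts); the point is only that, if per-GPU throughput stays comparable after compression, halving the GPU count halves the serving bill:

```python
# Toy serving-cost model with hypothetical numbers.
HOURS_PER_MONTH = 730
rate_per_gpu_hour = 2.50       # hypothetical cloud GPU rate, USD

def monthly_cost(num_gpus: int) -> float:
    return num_gpus * rate_per_gpu_hour * HOURS_PER_MONTH

full = monthly_cost(8)         # e.g. eight GPUs for the 120B original
half = monthly_cost(4)         # four GPUs after 50% compression
print(f"${full:,.0f}/mo -> ${half:,.0f}/mo ({half / full:.0%} of original)")
```

In practice the savings depend on batch sizes, context lengths, and whether the compressed model sustains the same tokens per second, but the hardware floor drops by half.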
The implications extend beyond cost savings. A model that fits in 32 gigabytes instead of 61 can run on a single high-end consumer GPU rather than requiring multi-GPU configurations. This opens the door for research labs, startups, and even advanced hobbyists to experiment with frontier-class models on accessible hardware.
Free and Open on Hugging Face
Perhaps the most significant detail is the licensing. HyperNova 60B is available for free download on Hugging Face, making it immediately accessible to the global AI research community. Multiverse Computing is betting that widespread adoption of their compression technique will drive enterprise interest in their commercial CompactifAI platform, which claims compression ratios up to 95 percent for custom deployments.
The Broader Vision for Quantum-Inspired AI
Multiverse Computing, headquartered in San Sebastián, Spain, has raised over $46 million in funding and operates at the intersection of quantum computing and practical AI. Their thesis is that mathematical frameworks from quantum physics can solve optimization problems in AI that classical approaches struggle with.
If their compression claims hold across more architectures, the implications are profound. Running what once required a data center rack on a single workstation would democratize access to powerful AI in ways that even the open-source movement has not yet achieved.
Sources: TechCrunch, February 24, 2026; Quantum Zeitgeist, February 2026; Yahoo Finance, February 2026; Hugging Face Model Card, February 2026
