Nvidia's new open-weights Nemotron 3 Super combines three different architectures to beat gpt-oss and Qwen in throughput

Nvidia Unveils Nemotron 3 Super: Revolutionizing Multi-Agent Systems

Multi-agent systems are crucial for handling complex tasks like software engineering and cybersecurity triage. However, these systems can generate up to 15 times as many tokens as a standard chat, which drives up inference cost and makes them harder to justify for enterprise tasks.

To address this challenge, Nvidia has introduced Nemotron 3 Super, a 120-billion-parameter hybrid model with weights available on Hugging Face. The model merges state-space models, transformers, and a “Latent” mixture-of-experts design to provide specialized depth for agentic workflows without the bloat of dense reasoning models.

Innovative Hybrid Architecture

Nemotron 3 Super features a sophisticated hybrid architecture that balances memory efficiency with precise reasoning. It leverages a Hybrid Mamba-Transformer backbone, combining Mamba-2 layers with Transformer attention layers for efficient sequence processing.

One of the key advantages of this architecture is its ability to handle a 1-million-token context window without excessive memory usage, since the Mamba-2 layers carry most of the sequence processing. The Transformer attention layers are interleaved among them so the model can still accurately retrieve specific pieces of information buried deep within a long context.
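The memory argument can be made concrete with a rough back-of-the-envelope calculation. The sketch below compares the per-sequence cache of an all-attention stack with that of a mostly state-space stack at 1 million tokens. Every layer count and dimension in it is a hypothetical value chosen for illustration, not Nemotron 3 Super's published configuration, and the state-space estimate ignores smaller buffers such as convolution state.

```python
def kv_cache_bytes(n_attn_layers, context_len, n_kv_heads, head_dim, bytes_per_value=2):
    """KV-cache size: one key and one value vector per token, per KV head, per attention layer."""
    return 2 * n_attn_layers * context_len * n_kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(n_ssm_layers, d_inner, state_dim, bytes_per_value=2):
    """Recurrent state size for state-space layers: fixed, independent of context length."""
    return n_ssm_layers * d_inner * state_dim * bytes_per_value


CONTEXT = 1_000_000  # the 1-million-token window mentioned in the article

# Hypothetical shapes, chosen only to illustrate the scaling behaviour.
all_attention = kv_cache_bytes(n_attn_layers=48, context_len=CONTEXT, n_kv_heads=8, head_dim=128)
hybrid = (
    kv_cache_bytes(n_attn_layers=6, context_len=CONTEXT, n_kv_heads=8, head_dim=128)
    + ssm_state_bytes(n_ssm_layers=42, d_inner=8192, state_dim=128)
)

GIB = 1024 ** 3
print(f"all-attention stack: {all_attention / GIB:.1f} GiB per sequence")
print(f"hybrid stack:        {hybrid / GIB:.1f} GiB per sequence")
```

With these made-up shapes the all-attention stack needs roughly 183 GiB per sequence and the hybrid roughly 23 GiB. The attention cache grows linearly with context length while the state-space term stays constant, so the gap widens as the context gets longer.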

Furthermore, Nemotron 3 Super introduces Latent Mixture-of-Experts (LatentMoE) to optimize computational efficiency. By compressing each token's representation before routing it to the expert sub-networks, the model can consult four times as many experts at the same computational cost.
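One way to see how compression can buy extra experts is to count multiply-accumulate operations (MACs) per token. The sketch below is a toy cost model, not Nvidia's implementation: it assumes each expert is a two-matrix feed-forward block, and the 4x compression ratio, the layer sizes, and the helper names `expert_macs` and `moe_macs` are all assumptions picked to mirror the "four times as many experts" figure.

```python
def expert_macs(d_in, d_ff):
    """Multiply-accumulates for one token through one two-matrix feed-forward expert."""
    return 2 * d_in * d_ff


def moe_macs(d_model, d_ff, active_experts, d_latent=None):
    """Per-token MACs for the expert part of an MoE layer.

    If d_latent is given, the token is projected down to d_latent before the
    experts and back up to d_model afterwards (two extra shared projections).
    """
    if d_latent is None:
        return active_experts * expert_macs(d_model, d_ff)
    projections = 2 * d_model * d_latent  # down-projection plus up-projection
    return projections + active_experts * expert_macs(d_latent, d_ff)


# Hypothetical sizes; the 4x compression ratio is an assumption, not a
# published hyperparameter.
D_MODEL, D_FF = 4096, 2048
standard = moe_macs(D_MODEL, D_FF, active_experts=4)
latent = moe_macs(D_MODEL, D_FF, active_experts=16, d_latent=D_MODEL // 4)

print(f"standard MoE, 4 experts: {standard:,} MACs/token")
print(f"latent MoE, 16 experts:  {latent:,} MACs/token")
print(f"projection overhead:     {latent / standard - 1:.1%}")
```

In this toy model the 16 latent experts cost exactly as much as 4 full-width experts, and the shared down and up projections add a fixed overhead of 12.5% on top. How the real model accounts for or offsets that overhead is not described in the article.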

Enhanced Performance and Efficiency

Nvidia’s Nemotron 3 Super also features Multi-Token Prediction (MTP), which lets the model predict several future tokens in a single step instead of one at a time. This speeds up structured generation tasks such as code generation and tool calls, with Nvidia citing up to 3x faster generation.
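A simple probability model hints at why predictable output such as code and tool calls benefits most. The sketch below assumes the extra predicted tokens are checked in order and kept only while each one is accepted, in the style of speculative decoding. The article does not say how Nemotron 3 Super verifies its predictions, so the acceptance model and the function name `expected_tokens_per_step` are illustrative assumptions.

```python
def expected_tokens_per_step(accept_prob, extra_tokens):
    """Expected tokens emitted per forward pass.

    The first token is always kept; each additional predicted token counts only
    if every earlier one was accepted (probability accept_prob each, assumed
    independent).
    """
    return sum(accept_prob ** i for i in range(extra_tokens + 1))


for p in (0.5, 0.8, 0.95):
    rate = expected_tokens_per_step(p, extra_tokens=3)
    print(f"accept_prob={p:.2f}: {rate:.2f} tokens per step")
```

With three extra tokens, the expected yield rises from about 1.9 tokens per step at a 50% acceptance rate to about 3.7 at 95%. Highly regular text, where the next few tokens are easy to guess, sits toward the upper end of that range.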

Moreover, the model is optimized for the Nvidia Blackwell GPU platform, where Nvidia says it delivers up to 4x faster inference than on previous architectures with no loss of accuracy. It also leads competing open models in throughput, with Nvidia reporting up to 2.2x higher throughput in high-volume settings.

Commercial Usage with Unique Licensing

Nvidia has released Nemotron 3 Super under the Nvidia Open Model License Agreement, allowing for commercial use with specific provisions. Users have the flexibility to create derivative models while adhering to attribution requirements. However, the license includes safeguards to prevent misuse, such as termination triggers for bypassing safety features or initiating IP litigation against Nvidia.

Industry Adoption and Impact

The launch of Nemotron 3 Super has garnered significant attention in the developer community, with industry experts praising its speed and transparency. The model is being integrated into various sectors, from software development to manufacturing and cybersecurity automation.

Nvidia’s VP of AI Software, Kari Briski, emphasizes the model’s ability to address the challenges of context explosion in multi-agent applications, offering the brainpower of a 120-billion-parameter system with operational efficiency. For enterprises, Nemotron 3 Super represents a significant step towards reducing the “thinking tax” associated with complex tasks.

Source: Original Content Rewritten and Adapted from Nvidia’s Announcement