Articles Tagged “Multimodal Ai”
12 articles found
NVIDIA's Nemotron 3 Nano Omni Lands — A 30B Open Omni-Modal Reasoning Model With 9x Higher Throughput
NVIDIA released Nemotron 3 Nano Omni on May 13, 2026 — a 30B-A3B open omni-modal model that unifies text, image, video, and audio reasoning with 9x higher throughput than other open omni models.
Google Makes Gemini API File Search Multimodal With Page-Level Citations
Google's Gemini API File Search now indexes images alongside text, ties responses to original page numbers, and ships free storage and query embeddings — turning verifiable RAG into a one-call developer primitive.
Mistral Small 4 Unifies Reasoning, Multimodal, and Coding Into One Apache 2.0 Model
Mistral Small 4 collapses three flagship model families — reasoning, multimodal, and agentic coding — into a single 119B-parameter Apache 2.0 model with a 256k context window.
OpenAI's 'Spud' Completes Pretraining — The Next Frontier Model Is Almost Here
OpenAI confirmed its next frontier model, codenamed Spud, finished pretraining on March 24 — a unified multimodal AI expected to arrive within weeks.
Meta Muse Spark Launches From Superintelligence Labs: Personal AI Gets Its Biggest Upgrade Yet
Meta Superintelligence Labs launched Muse Spark on April 9 — a natively multimodal reasoning model now powering personal AI across all Meta platforms.
LG Releases EXAONE 4.5: Open-Source Vision-Language AI That Outscores GPT-5-mini
LG AI Research's EXAONE 4.5 is a 33B multimodal VLM with Hybrid Attention architecture that outscores GPT-5-mini and Claude 4.5 Sonnet on STEM benchmarks — and it's fully open-source.
Meta Launches Llama 4 Scout and Maverick: Multimodal MoE AI Goes Open-Weight
Meta's Llama 4 Scout and Maverick bring multimodal mixture-of-experts AI to the open-source community, with an unprecedented 10 million token context window.
Google Gemma 4 Launches With Four Sizes, Apache 2.0 License, and a Top-3 Open Model Ranking
Google's Gemma 4 arrives with model sizes from 2B to 31B, a permissive Apache 2.0 license, native multimodal support across all sizes, and the #3 spot on the global open model leaderboard.
Google Launches Gemini Embedding 2 — The First AI Model That Maps Text, Images, and Video Into a Single Search Space
Google's new natively multimodal embedding model jointly maps text, images, and video into a unified vector space, enabling cross-modal retrieval and RAG applications.
OpenAI Is Bringing Sora Video Generation Directly Into ChatGPT — Giving Hundreds of Millions of Users Access
According to The Information, OpenAI plans to embed Sora's video-generation capabilities into the ChatGPT interface, mirroring how DALL-E image creation was integrated.
DeepSeek Unveils V4 — A Trillion-Parameter Multimodal Model That Generates Text, Images, and Video
DeepSeek's V4 model enters the frontier tier with trillion-parameter multimodal capabilities spanning text, image, and video generation plus elite coding performance.
Google Gemini 3.1 Pro Doubles Reasoning Performance With a New Three-Tier Thinking System
Google DeepMind’s Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than doubling its predecessor’s reasoning with a three-tier thinking architecture.












