The Quantum Dispatch

Articles Tagged “Multimodal AI”

12 articles found

AI

NVIDIA's Nemotron 3 Nano Omni Lands — A 30B Open Omni-Modal Reasoning Model With 9x Higher Throughput

NVIDIA released Nemotron 3 Nano Omni on May 13, 2026 — a 30B-A3B open omni-modal model that unifies text, image, video, and audio reasoning with 9x higher throughput than other open omni models.

Dr. Nova Chen · May 14, 2026 · 7 min read

AI

Google Makes Gemini API File Search Multimodal With Page-Level Citations

Google's Gemini API File Search now indexes images alongside text, ties responses to original page numbers, and ships free storage and query embeddings — turning verifiable RAG into a one-call developer primitive.

Dr. Nova Chen · May 7, 2026 · 5 min read

AI

Mistral Small 4 Unifies Reasoning, Multimodal, and Coding Into One Apache 2.0 Model

Mistral Small 4 collapses three flagship model families — reasoning, multimodal, and agentic coding — into a single 119B-parameter Apache 2.0 model with a 256k context window.

Dr. Nova Chen · Apr 26, 2026 · 6 min read

AI

OpenAI's 'Spud' Completes Pretraining — The Next Frontier Model Is Almost Here

OpenAI confirmed its next frontier model, codenamed Spud, finished pretraining on March 24 — a unified multimodal AI expected to arrive within weeks.

Dr. Nova Chen · Apr 12, 2026 · 5 min read

AI

Meta Muse Spark Launches From Superintelligence Labs: Personal AI Gets Its Biggest Upgrade Yet

Meta Superintelligence Labs launched Muse Spark on April 9 — a natively multimodal reasoning model now powering personal AI across all Meta platforms.

Dr. Nova Chen · Apr 10, 2026 · 5 min read

AI

LG Releases EXAONE 4.5: Open-Source Vision-Language AI That Outscores GPT-5-mini

LG AI Research's EXAONE 4.5 is a 33B multimodal VLM with Hybrid Attention architecture that outscores GPT-5-mini and Claude 4.5 Sonnet on STEM benchmarks — and it's fully open-source.

Dr. Nova Chen · Apr 9, 2026 · 5 min read

AI

Meta Launches Llama 4 Scout and Maverick: Multimodal MoE AI Goes Open-Weight

Meta's Llama 4 Scout and Maverick bring multimodal mixture-of-experts AI to the open-source community, with an unprecedented 10 million token context window.

Dr. Nova Chen · Apr 9, 2026 · 5 min read

AI

Google Gemma 4 Launches With Four Sizes, Apache 2.0 License, and a Top-3 Open Model Ranking

Google's Gemma 4 arrives with model sizes from 2B to 31B, a permissive Apache 2.0 license, native multimodal support across all sizes, and the #3 spot on the global open model leaderboard.

Dr. Nova Chen · Apr 4, 2026 · 5 min read

AI

Google Launches Gemini Embedding 2 — The First AI Model That Maps Text, Images, and Video Into a Single Search Space

Google's new natively multimodal embedding model jointly maps text, images, and video into a unified vector space, enabling cross-modal retrieval and RAG applications.

Dr. Nova Chen · Mar 14, 2026 · 4 min read

AI

OpenAI Is Bringing Sora Video Generation Directly Into ChatGPT — Giving Hundreds of Millions of Users Access

According to The Information, OpenAI plans to embed Sora's video-generation capabilities into the ChatGPT interface, mirroring how DALL-E image creation was integrated.

Dr. Nova Chen · Mar 12, 2026 · 4 min read

AI

DeepSeek Unveils V4 — A Trillion-Parameter Multimodal Model That Generates Text, Images, and Video

DeepSeek's V4 model enters the frontier tier with trillion-parameter multimodal capabilities spanning text, image, and video generation plus elite coding performance.

Dr. Nova Chen · Mar 4, 2026 · 5 min read

AI

Google Gemini 3.1 Pro Doubles Reasoning Performance With a New Three-Tier Thinking System

Google DeepMind’s Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than doubling its predecessor’s reasoning with a three-tier thinking architecture.

Dr. Nova Chen · Feb 25, 2026 · 5 min read