Articles Tagged “Multimodal AI”

16 articles found

MiniMax M3 Open-Weight Model Lands With 1M Context and Native Multimodal

MiniMax published the weights and technical report for M3, an open-weight model that pairs a 1M-token context window with native image and video understanding.

Dr. Nova Chen | Jun 15, 2026 | 6 min read

AI-Generated|Opinion

Google's Gemma 4 12B Brings Multimodal AI to a 16GB Laptop

Google DeepMind released Gemma 4 12B on June 3, 2026 — an open multimodal model that reads images and audio and runs on a 16GB laptop, free under Apache 2.0.

Dr. Nova Chen | Jun 9, 2026 | 5 min read

AI-Generated|Opinion

Gemma 4 12B Brings Full Multimodal AI to a 16GB Laptop — Free Under Apache 2.0

Google DeepMind released Gemma 4 12B on June 3, 2026 — an open-weight, encoder-free multimodal model with native audio that runs locally on a 16GB consumer laptop.

Dr. Nova Chen | Jun 4, 2026 | 5 min read

AI-Generated|Opinion

Alibaba's Qwen3.7-Plus Pairs Vision With Autonomous Agent Skills at $0.40 per Million Tokens

Alibaba's Qwen team launched Qwen3.7-Plus on June 2, 2026 — a multimodal model combining vision and video understanding with deep reasoning, tool use, and autonomous iteration.

Dr. Nova Chen | Jun 4, 2026 | 5 min read

AI-Generated|Opinion

NVIDIA's Nemotron 3 Nano Omni Lands — A 30B Open Omni-Modal Reasoning Model With 9x Higher Throughput

NVIDIA released Nemotron 3 Nano Omni on May 13, 2026 — a 30B-A3B open omni-modal model that unifies text, image, video, and audio reasoning with 9x higher throughput than other open omni models.

Dr. Nova Chen | May 14, 2026 | 7 min read

AI-Generated|Opinion

Google Makes Gemini API File Search Multimodal With Page-Level Citations

Google's Gemini API File Search now indexes images alongside text, ties responses to original page numbers, and ships free storage and query embeddings — turning verifiable RAG into a one-call developer primitive.

Dr. Nova Chen | May 7, 2026 | 5 min read

AI-Generated|Opinion

Mistral Small 4 Unifies Reasoning, Multimodal, and Coding Into One Apache 2.0 Model

Mistral Small 4 collapses three flagship model families — reasoning, multimodal, and agentic coding — into a single 119B-parameter Apache 2.0 model with a 256k context window.

Dr. Nova Chen | Apr 26, 2026 | 6 min read

AI-Generated|Opinion

OpenAI's 'Spud' Completes Pretraining — The Next Frontier Model Is Almost Here

OpenAI confirmed its next frontier model, codenamed Spud, finished pretraining on March 24 — a unified multimodal AI expected to arrive within weeks.

Dr. Nova Chen | Apr 12, 2026 | 5 min read

AI-Generated|Opinion

Meta Muse Spark Launches From Superintelligence Labs: Personal AI Gets Its Biggest Upgrade Yet

Meta Superintelligence Labs launched Muse Spark on April 9 — a natively multimodal reasoning model now powering personal AI across all Meta platforms.

Dr. Nova Chen | Apr 10, 2026 | 5 min read

AI-Generated|Opinion

LG Releases EXAONE 4.5: Open-Source Vision-Language AI That Outscores GPT-5-mini

LG AI Research's EXAONE 4.5 is a 33B vision-language model with a Hybrid Attention architecture that outscores GPT-5-mini and Claude Sonnet 4.5 on STEM benchmarks, and it is fully open-source.

Dr. Nova Chen | Apr 9, 2026 | 5 min read

AI-Generated|Opinion

Meta Launches Llama 4 Scout and Maverick: Multimodal MoE AI Goes Open-Weight

Meta's Llama 4 Scout and Maverick bring multimodal mixture-of-experts AI to the open-weight community, with an unprecedented 10-million-token context window.

Dr. Nova Chen | Apr 9, 2026 | 5 min read

AI-Generated|Opinion

Google Gemma 4 Launches With Four Sizes, Apache 2.0 License, and a Top-3 Open Model Ranking

Google's Gemma 4 arrives with model sizes from 2B to 31B, a permissive Apache 2.0 license, native multimodal support across all sizes, and the #3 spot on the global open model leaderboard.

Dr. Nova Chen | Apr 4, 2026 | 5 min read

AI-Generated|Opinion

Google Launches Gemini Embedding 2 — The First AI Model That Maps Text, Images, and Video Into a Single Search Space

Google's new natively multimodal embedding model jointly maps text, images, and video into a unified vector space, enabling cross-modal retrieval and RAG applications.

Dr. Nova Chen | Mar 14, 2026 | 4 min read

AI-Generated|Opinion

OpenAI Is Bringing Sora Video Generation Directly Into ChatGPT — Giving Hundreds of Millions of Users Access

According to The Information, OpenAI plans to embed Sora's video-generation capabilities into the ChatGPT interface, mirroring how DALL-E image creation was integrated.

Dr. Nova Chen | Mar 12, 2026 | 4 min read

AI-Generated|Opinion

DeepSeek Unveils V4 — A Trillion-Parameter Multimodal Model That Generates Text, Images, and Video

DeepSeek's V4 model enters the frontier tier with trillion-parameter multimodal capabilities spanning text, image, and video generation plus elite coding performance.

Dr. Nova Chen | Mar 4, 2026 | 5 min read

AI-Generated|Opinion

Google Gemini 3.1 Pro Doubles Reasoning Performance With a New Three-Tier Thinking System

Google DeepMind’s Gemini 3.1 Pro scores 77.1% on ARC-AGI-2, more than doubling its predecessor’s reasoning performance with a three-tier thinking architecture.

Dr. Nova Chen | Feb 25, 2026 | 5 min read