Google just made video creation as simple as having a conversation.
At Google I/O 2026 on May 19, the company launched Gemini Omni Flash, a new AI video model that can generate and edit videos from any combination of text, images, audio, and existing footage. No timeline. No editing software. No experience required. You just describe what you want, and the model builds it.
The headline feature that’s already going viral is a personal digital avatar tool. It lets you create a video version of yourself that looks and sounds like you, then drop it into any scene you generate.
WHAT IS GEMINI OMNI?
Gemini Omni is Google’s new multimodal video model family from Google DeepMind, announced by Koray Kavukcuoglu, Google’s CTO and Chief AI Architect. The “Omni” name reflects what makes it different from previous AI video tools: it handles all input types (text, image, audio, and video) in a single model, with shared context across every modality.
The first model in the family, Gemini Omni Flash, started rolling out on May 19, 2026, directly to the Gemini app, Google Flow, and YouTube Shorts.
HOW CONVERSATIONAL EDITING WORKS
Before Omni, every AI video tool worked the same way: you wrote a prompt, got a clip, and if you wanted to change anything, you started over from scratch. Runway, Pika, Sora, Veo: all of them reset each time.
Omni changes that fundamentally. Every instruction you give stacks on the previous one. Characters stay consistent across edits. Scene context carries over between turns. Physics holds up. If you ask for a camera angle change three prompts in, the model still remembers your original characters, setting, and tone.
This is what Google means by “conversational editing”: it’s not a feature; it’s the entire product model.
PERSONAL DIGITAL AVATARS
One of the most talked-about features from the @GeminiApp announcement is the avatar tool.
Here’s how it works. You record a short clip of yourself following on-screen prompts, such as turning your head and saying a number sequence aloud. That recording is used to create a digital avatar that looks and sounds like you. From that point, you can generate video content starring your avatar and drop yourself into any scene Omni creates.
Google deliberately built in friction: you have to record yourself in real time, which limits misuse. You cannot generate a video of another real person; you can only use a verified avatar of yourself.
Every avatar-generated video is tagged with Google’s invisible SynthID digital watermark plus C2PA Content Credentials, verifiable through the Gemini app, Gemini in Chrome, and Google Search.
WHO CAN ACCESS IT?
Gemini app and Google Flow: Available now for Google AI Plus, Pro, and Ultra subscribers.
YouTube Shorts and YouTube Create app: Free access, no subscription required.
API access for developers and enterprises: Coming in the weeks ahead.
Clips are currently capped at 10 seconds in the Flash tier. Longer clip support under paid tiers has not yet been announced publicly.
WHAT INPUTS DOES OMNI ACCEPT?
Gemini Omni Flash accepts:
– Text prompts
– Reference images (for style, character, or environment)
– Existing video clips (to edit or extend)
– Audio inputs
– Drawings
You can mix and match these freely. A creator can upload a reference photo, describe a setting in text, and then refine the result across multiple turns, all in one continuous session.
WHAT IT CANNOT DO YET
One limitation worth knowing: audio output is currently voice-only. The model can produce spoken narration, but it cannot yet generate custom music or sound effects. Google has confirmed that broader audio and speech editing is still being tested and is “coming in future updates.”
Firebase integration, along with image and extended audio output modalities, is listed as coming soon.
WHAT MAKES IT DIFFERENT FROM SORA AND RUNWAY?
The key difference is the editing model itself. Sora generates clips but doesn’t maintain a coherent memory of previous instructions. Runway requires you to re-prompt or use separate editing tools.
Omni’s multi-turn memory is the part that genuinely doesn’t exist elsewhere at this scale. Combined with Google’s existing advantages (Gemini’s knowledge base, YouTube integration, and SynthID watermarking infrastructure), Omni has a foundation that standalone AI video startups simply don’t have access to.
DEEPFAKE SAFEGUARDS
Google is leaning heavily on responsible AI framing here. Every Omni output carries an invisible SynthID watermark and C2PA Content Credentials. The avatar system requires real-time self-recording, specifically to prevent third-party deepfakes. Policies explicitly prohibit generating video of real people who haven’t created their own verified avatar.
BOTTOM LINE
Gemini Omni Flash is the most significant shift in AI video creation since text-to-video tools first launched. Conversational editing, where every instruction builds on the last without losing context, is a genuinely new workflow. The digital avatar feature is likely to drive enormous social media adoption. And because YouTube Shorts users get free access, it could reach hundreds of millions of people within weeks.
If you create video content in any form (marketing, social media, filmmaking, education), Gemini Omni is worth your time this week.
