Google’s new anything-to-anything AI model is wild

May 26, 2026  Twila Rosenbaum

Google has unveiled a new family of generative AI models called Omni, with the first release being Omni Flash. The model is designed to accept any kind of input—photos, videos, text—and generate any kind of output, though for now it is primarily focused on video creation. Available through Google’s Flow platform, Omni Flash succeeds the previous Veo model and marks a significant step forward in AI video generation.

Background: The evolution of generative video AI

Generative AI video tools have advanced rapidly over the past year. OpenAI’s Sora, Meta’s Make-A-Video, and Google’s own Veo have all pushed the boundaries of what AI can create from simple prompts. However, these models have been plagued by inconsistencies: objects morphing, characters changing appearance, and unnatural movements. Google’s Omni aims to address these issues by incorporating more real-world knowledge and improved character consistency.

The Omni model was announced at Google I/O 2026, alongside other updates to the Gemini ecosystem. It represents a shift from text-to-video to a more flexible input-output paradigm. Users can now upload an existing video and use text instructions to modify it, create a new video from a combination of sources, or even generate videos from still images. This “anything-to-anything” approach is ambitious and aligns with Google’s broader push toward universal AI assistants.

Key features of Omni Flash

Omni Flash offers several improvements over Veo. First, it accepts video as input alongside text prompts, allowing users to build upon existing clips. Second, Google claims the model has better real-world knowledge, meaning it understands how objects move, how light behaves, and how scenes should transition. Third, character consistency is supposedly enhanced—a person or animal should look the same from one frame to the next, even as the scene changes.

Another notable feature is the ability to make text-based edits to videos after they are generated. Users can instruct the model to change an object’s color, add a new element, or alter the mood of a scene. This is a step beyond what Veo could do, where editing was often more cumbersome than creating from scratch.

Testing the model: A stuffed deer’s adventures

To evaluate Omni Flash, a series of tests was conducted using a plush deer toy named Buddy. The same toy had been used to test Veo five months earlier, providing a benchmark for comparison. The first test involved creating a short vacation montage: Buddy packing a suitcase, boarding a cruise ship, and enjoying a tropical destination. The results were mixed.

Some clips showed remarkable improvement. Buddy’s appearance remained consistent across multiple scenes, unlike earlier Veo outputs where his face would morph into different animals. The AI even understood the concept of packing a funny item, a jar of honey, which later appeared as a bottle of sunscreen in a humorous twist. However, the honey container kept changing shape and color, from a jar to a clear plastic bottle to a squeeze bottle, revealing the model’s inability to maintain object permanence.

In another test, Buddy was made to skydive. The first few frames were realistic, but then the deer suddenly flipped orientation, landing in an unnatural position. Such “AI jump scares” are common in generative video and highlight the gap between current capabilities and human-level understanding.

Deepfaking the tester: A convincing but unsettling experience

The most striking test involved creating deepfakes of a human subject. Starting with a short selfie video of the author sitting at a table with a neutral expression, Omni Flash was prompted to generate clips of her eating spaghetti, sitting on an airplane, and standing in front of the Eiffel Tower eating a baguette. The results were shockingly realistic.

In the pasta-eating clip, the sound of the fork hitting the bowl was slightly artificial, and the bowl itself looked unfamiliar to the author’s husband, but the movements and facial expressions were convincingly real. The husband, who has seen the author daily for a decade, could not tell the video was AI-generated until he was told. Other deepfakes were less convincing (one Eiffel Tower clip showed a cartoonish skyline), but several were good enough to fool social media viewers.

The ease of creating these deepfakes raises serious concerns. With just a few minutes of effort and a credit card, anyone can produce videos that look like real footage of a person in a location they never visited. The model even added realistic background elements, though not flawlessly: a woman appears twice in the airplane clip, a common AI artifact that casual viewers could easily miss.

Cost and accessibility

Using Omni Flash is not free. Google charges credits for each video generation, ranging from 15 to 40 credits depending on length and complexity. Editing a video costs 40 credits each time. The $20-per-month AI Pro plan includes 1,000 credits, which can be consumed quickly. After generating about 20 clips with a few edits, the author was down to 145 credits. This pricing model means that achieving a polished result may require many expensive iterations, limiting access for casual users.
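The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted above; the function names are hypothetical, and the count of three edits is an assumption standing in for the article's "a few edits".

```python
# Figures quoted in the article.
PLAN_PRICE_USD = 20.0              # monthly price of the AI Pro plan
PLAN_CREDITS = 1000                # credits included in that plan
GENERATION_COST_RANGE = (15, 40)   # credits per generated clip (min, max)
EDIT_COST = 40                     # credits per edit


def credits_spent(generations: int, edits: int, cost_per_generation: int) -> int:
    """Total credits for a session at a fixed per-generation cost."""
    return generations * cost_per_generation + edits * EDIT_COST


def usd_per_credit() -> float:
    """Effective dollar price of one credit on the plan."""
    return PLAN_PRICE_USD / PLAN_CREDITS


# Assumption: "a few edits" is taken to mean 3 edits.
cheapest = credits_spent(20, 3, GENERATION_COST_RANGE[0])
priciest = credits_spent(20, 3, GENERATION_COST_RANGE[1])

# The author reported 145 credits left out of 1,000.
reported_spend = PLAN_CREDITS - 145

print(f"20 clips + 3 edits: {cheapest} to {priciest} credits")
print(f"Reported spend: {reported_spend} credits")
print(f"Approximate value: ${reported_spend * usd_per_credit():.2f}")
```

Under that assumption, the reported spend falls inside the estimated range, which suggests most of the author's clips were priced toward the upper end of the 15-to-40-credit scale.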

Broader implications and ethical considerations

The realism of Omni Flash’s output brings both creative opportunities and risks. On the positive side, filmmakers, educators, and marketers can quickly produce high-quality video content. For instance, a small business could create promotional videos without hiring a production crew. However, the same technology can be used to create misleading political propaganda, fake evidence for legal cases, or non-consensual synthetic media.

Google has implemented some safeguards, but they are not foolproof. The model refuses to generate certain types of content, such as explicit violence or sexual material, yet the deepfake of the author eating pasta was generated without any identity verification, meaning anyone with a photo of a person could potentially create a convincing video of them doing something they never did.

Industry observers note that the line between authentic and AI-generated video is blurring fast. While previous generations of AI video were obviously fake (warping faces, unnatural motion), Omni Flash achieves a level of realism that requires close inspection to detect. This is both a technical achievement and a societal challenge.

Google’s competitors are also advancing. OpenAI’s Sora has demonstrated similar capabilities, and Meta is developing tools to embed invisible watermarks in AI-generated content. Regulation is lagging behind technology, and experts warn that widespread availability of such tools could erode trust in all video evidence.

What’s next for Omni?

Google plans to expand Omni to accept more input types, including audio and 3D models. The ultimate goal is seamless conversion between any modality: text, image, video, sound, and perhaps even tactile feedback. For now, Omni Flash is a significant but incomplete step. The model still struggles with long-term consistency, object permanence, and natural physics. Yet its ability to create plausible deepfakes from a short selfie video indicates how far the technology has come.

The author’s overall impression is one of exhaustion rather than excitement. Each new generation of AI video tools brings impressive leaps, but the ethical and practical challenges compound. Omni Flash is better than Veo, but it is not yet the reliable, affordable, and safe tool that Google envisions. It remains deep in the uncanny valley—realistic enough to deceive, but glitchy enough to betray its artificial nature under scrutiny.


Source: The Verge News

