OpenAI's Sora Redefines Text-to-Video AI Landscape

February 17, 2024 - 10:47

TEHRAN (Tasnim) – AI startup OpenAI has introduced Sora, a text-to-video model that could redefine generative AI capabilities. The model is akin to Google's Lumiere tool but can produce videos up to one minute long.

As generative AI progresses, companies like OpenAI, Google, and Microsoft are competing to dominate the sector, which is expected to reach $1.3 trillion in revenue by 2032, and to capture the interest of consumers who have been fascinated by AI innovations since the advent of ChatGPT.

OpenAI plans to make Sora available to "red teamers" (experts in areas such as misinformation and bias) and to creative professionals such as visual artists and filmmakers. The aim is to gather feedback and to probe the model's risks, including deepfakes, a growing concern in AI-generated media.

What sets Sora apart is its capability to understand lengthy prompts, demonstrated by a 135-word example. The model can create diverse characters, scenes, and landscapes, leveraging OpenAI's prior work with DALL-E and GPT models.

Using the recaptioning technique from DALL-E 3, Sora can generate complex scenes with multiple characters, motions, and details, interpreting prompts and simulating them realistically.

Although Sora's sample videos exhibit impressive realism, some challenges remain, such as accurately depicting complex physics and understanding cause and effect. For example, a person might bite into a cookie, yet the cookie may show no bite mark afterward.

While OpenAI has not disclosed a date for Sora's wide release, it emphasizes the need to ensure safety and compliance with its existing standards, which prohibit content such as extreme violence, sexual content, and IP violations.

OpenAI's continuous efforts to refine AI technologies underscore the importance of real-world applications in enhancing AI safety and efficacy over time.