Character.AI’s Real-Time Video Breakthrough

At Character.AI, we’re excited to introduce TalkingMachines, our newest autoregressive diffusion model that enables real-time, audio-driven, FaceTime-style video generation.

With just an image and a voice signal, the model can generate an interactive, real-time video of characters conversing across different styles, genres, and identities.

We are constantly building towards the future of entertainment. We started with AvatarFX, which powers video generation on our platform today. Now, this research sets the foundation for Character.AI’s future of immersive, real-time AI-powered visual interactions and animated, reactive characters.

Want to see what the future looks like? 📄 Read the full research paper

[Demo video, 0:52]

How It Works

The technology builds on the power of the Diffusion Transformer (DiT), using a technique we call asymmetric knowledge distillation to convert a high-quality, bidirectional video model into a blazing-fast, real-time generator.

The model listens to audio and animates a character—mouth, head, eyes—in sync with every word, pause, and intonation. It does so without sacrificing consistency, image quality, style fidelity, or expressiveness.

Here’s how we do it (illustrative sketches of each component follow the list):

  • Flow-Matched Diffusion: Based on the DiT architecture, our model is pretrained to handle complex motion patterns, from subtle facial expressions to dynamic gestures.
  • Audio-Driven Cross Attention: A custom-built 1.2B parameter audio module enables the model to learn fine-grained alignment between sound and motion—capturing both speech and silence naturally.
  • Sparse Causal Attention: Unlike traditional models that rely on expensive bidirectional, dense attention, our autoregressive design only looks at the most relevant past frames, reducing memory and latency without compromising quality.
  • Asymmetric Distillation: Using our modified CausVid approach, we train a fast, 2-step diffusion model to imitate a slow, high-quality teacher—achieving infinite-length generation with no quality degradation over time.
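
For readers who want a concrete picture, here is a minimal sketch of a flow-matching training step in PyTorch. The linear interpolation path, velocity target, and `model(xt, t)` signature are simplified illustrations, not our exact training code:

```python
import torch
import torch.nn as nn

# Minimal flow-matching training step (illustrative; the real model also
# conditions on audio and a reference image, omitted here).
def flow_matching_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """x1: clean video latents, shape (batch, ...)."""
    x0 = torch.randn_like(x1)                      # pure-noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1]
    t_exp = t.view(-1, *([1] * (x1.dim() - 1)))    # broadcast t over latent dims
    xt = (1 - t_exp) * x0 + t_exp * x1             # point on the straight path
    v_target = x1 - x0                             # constant velocity target
    v_pred = model(xt, t)                          # model predicts velocity
    return ((v_pred - v_target) ** 2).mean()
```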
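
The audio-driven cross-attention can be pictured as video latent tokens (queries) attending to audio embeddings (keys and values). The dimensions below and the use of PyTorch's built-in multi-head attention are assumptions for illustration; the actual 1.2B-parameter audio module is custom:

```python
import torch
import torch.nn as nn

# Illustrative audio-to-video cross-attention block with a residual connection.
class AudioCrossAttention(nn.Module):
    def __init__(self, video_dim: int = 1024, audio_dim: int = 768, n_heads: int = 16):
        super().__init__()
        self.norm = nn.LayerNorm(video_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=video_dim, num_heads=n_heads,
            kdim=audio_dim, vdim=audio_dim, batch_first=True)

    def forward(self, video_tokens: torch.Tensor, audio_tokens: torch.Tensor) -> torch.Tensor:
        # video_tokens: (B, N_video, video_dim); audio_tokens: (B, N_audio, audio_dim)
        attended, _ = self.attn(self.norm(video_tokens), audio_tokens, audio_tokens)
        return video_tokens + attended  # residual: video features enriched by audio
```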
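
Sparse causal attention restricts each frame to a handful of relevant past frames instead of the full dense history. A toy frame-level mask that always keeps the reference frame plus a short window of recent frames might look like this (our exact sparsity pattern may differ):

```python
import torch

# Toy sparse causal mask over frames: True means "may attend".
def sparse_causal_frame_mask(n_frames: int, window: int = 3) -> torch.Tensor:
    mask = torch.zeros(n_frames, n_frames, dtype=torch.bool)
    for i in range(n_frames):
        mask[i, 0] = True             # always attend to the reference frame
        lo = max(0, i - window)
        mask[i, lo:i + 1] = True      # attend to recent past frames and self
    return mask

print(sparse_causal_frame_mask(6, window=2).int())
```

Because each new frame only reads a bounded window of the past, attention cost and cache size stay constant as generation runs, which is what makes unbounded-length streaming feasible.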
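
The payoff of asymmetric distillation is a student that generates in just two denoising steps. Here is a sketch of two-step Euler sampling along the learned flow; the step schedule and the student's velocity parameterization are assumptions for illustration:

```python
import torch

# Two-step sampling with a distilled student (illustrative schedule).
@torch.no_grad()
def sample_two_steps(student, shape: tuple, device: str = "cuda") -> torch.Tensor:
    x = torch.randn(shape, device=device)      # start from pure noise
    for t0, t1 in [(0.0, 0.5), (0.5, 1.0)]:    # two coarse time intervals
        t = torch.full((shape[0],), t0, device=device)
        v = student(x, t)                      # student's predicted velocity
        x = x + (t1 - t0) * v                  # Euler step toward clean latents
    return x
```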

Why It Matters

The research breakthrough isn’t just about facial animation. It’s a foundational step towards interactive audiovisual AI characters. It brings us closer to a future where you can interact with characters in real time.

This means:

  • Supporting a wide range of styles, from photorealistic humans to anime and 3D avatars
  • Enabling streaming with natural listening and speaking phases
  • Building the core infrastructure for role-play, storytelling, and interactive world-building

Pushing the Frontier

This research advances the state of the art in several ways:

  • Real-time generation: No more pre-rendered video snippets—this system generates everything live, frame by frame.
  • Efficient distillation: Just two diffusion steps are needed for generation, with no perceptible loss in quality.
  • High scalability: The system runs in real time on just two GPUs, thanks to deep systems-level optimization.
  • Multispeaker support: Our speaking/silence detection mechanism allows seamless turn-taking across characters (a toy version is sketched below).
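
As a toy illustration of speaking/silence detection, an energy gate over audio frames could drive the speaking-versus-listening state. Our mechanism is learned end to end, so this heuristic (and the threshold below) is purely illustrative:

```python
import numpy as np

# Illustrative energy-based speaking/silence detector.
def speaking_states(audio: np.ndarray, sr: int = 16000,
                    frame_ms: int = 40, thresh: float = 0.01) -> np.ndarray:
    frame_len = int(sr * frame_ms / 1000)
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # per-frame loudness
    return rms > thresh                        # True = speaking, False = listening
```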

And this is just the start.

[Demo video, 1:19]

Built for the Future

We are actively working on bringing this research into the Character.AI platform, where it will one day power FaceTime-like experiences, character streaming, and visual world-building.

While this is not a product launch (yet), it marks an important milestone in our research roadmap. Our long-term goal is to make it possible for anyone to build and interact with immersive audiovisual characters.

From Research to Reality

We’ve invested deeply in training infrastructure, distillation methods, and system design to make this research a reality. Our research team trained this model using:

  • Over 1.5 million curated video clips
  • A three-stage training pipeline running on approximately 256 H100 GPUs
  • Custom deployment optimizations including CUDA stream overlap, KV caching, and VAE-decoder disaggregation (a stream-overlap sketch follows below)
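
As one example of these optimizations, CUDA stream overlap hides VAE decoding behind diffusion compute: the previous chunk's latents are decoded on a side stream while the DiT denoises the next chunk on the default stream. The `dit_step` and `vae_decode` callables below are hypothetical stand-ins for the real components:

```python
import torch

decode_stream = torch.cuda.Stream()  # side stream dedicated to VAE decoding

def generate_overlapped(dit_step, vae_decode, latent_chunks):
    frames, prev = [], None
    for chunk in latent_chunks:
        if prev is not None:
            # Make the side stream wait for the denoised latents, then decode
            # them concurrently with the next chunk's denoising.
            decode_stream.wait_stream(torch.cuda.current_stream())
            with torch.cuda.stream(decode_stream):
                frames.append(vae_decode(prev))
        prev = dit_step(chunk)  # denoise the next chunk on the default stream
    if prev is not None:
        torch.cuda.current_stream().wait_stream(decode_stream)
        frames.append(vae_decode(prev))
    return frames
```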

This is what frontier research looks like when applied with precision and purpose.

Learn More

📄 Read the full paper: arxiv.org/pdf/2506.03099

🎥 Watch sample demos: aaxwaz.github.io/TalkingMachines

Want to be part of shaping the future of AI research at Character.AI? Reach out—we’re just getting started.