This Text to Video Lip Sync example starts with the same natural talking clip as the uploaded-audio tutorial but uses a different input strategy. Instead of supplying the final line as audio, we clone a voice from a short reference sample and provide the new spoken line as text.
Source clip
Here is the source video used for this tutorial.
Source video requirements
For text-driven rewrites, the source clip should still feel like a natural talking performance.
- A natural speaking clip works best, and it does not need to contain the final line you want in the finished video
- A length of about 5 seconds to a few minutes works well
- The face should stay visible for most of the shot
- Small, believable motion is good
- Avoid exaggerated gestures, sudden turns, or large action beats
If you are filming a clip specifically for this workflow, keep it simple and natural. Counting "one, two, three" on camera is enough. This is one of the main strengths of the tool: you do not need to memorize or perform the final script on camera to get a believable talking video later.
Voice reference
This is the voice reference used to clone the speaker's voice:
Voice reference input
Replacement script
This is the exact script used for the rewritten version:
That's it for today! Isn't this incredible? It's free, it's fast, it's u* — and you can even generate lip sync videos up to 60 minutes long.
This text-driven approach is useful when:
- You want to revise the line without recording new audio
- You need to localize or patch one sentence quickly
- You want the new speech to follow a chosen voice identity
Generated result
Here is the finished result generated from the source clip, voice reference, and replacement script:
What this tutorial shows
- One natural source clip can be reused for a fully new spoken line
- A short voice reference is enough to define the vocal identity
- Text rewrites are strongest when the source motion stays calm and believable
How to recreate this workflow
- Open Text to Video Lip Sync.
- Upload a natural talking video with one clearly visible face.
- Upload a short voice reference for cloning.
- Paste the new script you want the video to say.
- Generate the result and check whether the new spoken line feels believable in the original shot.



