How to Create Free Lip-Sync Videos with OpenClaw

By FreeLipSync Team
Published on 3/8/2026 · 4 min read
How to Create Free Lip-Sync Videos with OpenClaw: Step-by-Step Guide

Are you looking to create realistic talking avatars and lip-sync videos without breaking the bank? OpenClaw, the open-source autonomous AI agent framework, makes this possible through its powerful ecosystem. Using the Flyworks Avatar Video skill available on ClawHub, you can transform photos into talking videos and even clone your own voice, entirely for free!

In this tutorial, we will walk you through the entire process of setting up and utilizing OpenClaw to create amazing lip-sync videos.

Understanding the Workflow

Creating a lip-sync video essentially requires three crucial components: an AI agent (OpenClaw), an Avatar/Video Generation Skill, and your creativity.

[Image: Workflow infographic]

The Flyworks Avatar Video skill brings powerful capabilities directly into your agent:

  • Talking Photos: Instantly turn any static image into a talking video.
  • Public Avatars: Utilize highly realistic pre-made avatars with advanced Text-to-Speech (TTS).
  • Voice Cloning: Clone a specific voice from a short audio sample.

Let's dive into the setup!

Step 1: Installing the Skill

First, you need to install the Flyworks Avatar Video skill into your agent environment. ClawHub makes this incredibly easy with the skills CLI.

[Image: Terminal installation]

Open your terminal and run the following command to add the skill:

# Install globally
npx skills add Flyworks-AI/skills -g

Note: You can use this skill alongside Claude Code, Cursor, Codex, and other supported AI agents.

Next, install the Python dependencies needed to interact with the video generation API:

pip install -r requirements.txt

Try it out with the Demo Token

By default, the skill comes with a free-tier demo token. Note that the demo token will apply a watermark to your videos and limit them to a maximum duration of 30 seconds. To remove these limitations, you can register for your own API key at flyworks.ai/setting and set it via export HIFLY_API_TOKEN="your_token_here".
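The token lookup described above can be sketched in a few lines of Python. This is an illustrative snippet, not the skill's actual code: the function name resolve_api_token and the "demo" placeholder value are assumptions; only the HIFLY_API_TOKEN variable name comes from the skill's documentation.

```python
import os

# Placeholder standing in for the skill's built-in free-tier token.
DEMO_TOKEN = "demo"

def resolve_api_token(env=os.environ):
    """Prefer a user-supplied HIFLY_API_TOKEN; otherwise fall back to the demo token."""
    token = env.get("HIFLY_API_TOKEN", "").strip()
    return token if token else DEMO_TOKEN

# With no variable set, the demo token (watermark, 30-second limit) is used.
print(resolve_api_token(env={}))
# With your own key exported, it takes precedence and the limits disappear.
print(resolve_api_token(env={"HIFLY_API_TOKEN": "sk-abc123"}))
```

The point is simply that the environment variable, once exported, silently replaces the demo token for every subsequent command.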


Step 2: Creating a Talking Photo (Lip-Syncing)

The "Talking Photo" feature is where the magic happens! You can take a still photograph of yourself or a character and supply an audio or text script. The AI will analyze the image and animate the mouth to perfectly lip-sync with your audio.

[Image: Talking photo demo]

You can ask OpenClaw to perform this directly using a natural language prompt:

"Create a talking photo video from my photo saying 'Welcome to our service'"

Or use the provided client script directly:

# Prepare the talking photo
python scripts/hifly_client.py create_talking_photo \
    --image assets/my_photo.png \
    --title "My Avatar"

This command gives you a custom Avatar ID which you can then save to memory and reuse for any future videos!
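Persisting that Avatar ID can be as simple as a small JSON "memory" file that your agent reads back later. This is a minimal sketch under assumed names: the file avatar_memory.json and the two helper functions are illustrative, not part of the skill.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("avatar_memory.json")  # illustrative file name

def save_avatar_id(name, avatar_id, path=MEMORY_FILE):
    """Persist an avatar ID so future videos can reuse it without re-uploading the photo."""
    memory = json.loads(path.read_text()) if path.exists() else {}
    memory[name] = avatar_id
    path.write_text(json.dumps(memory, indent=2))

def load_avatar_id(name, path=MEMORY_FILE):
    """Look up a previously saved avatar ID by its friendly name."""
    memory = json.loads(path.read_text()) if path.exists() else {}
    return memory.get(name)

save_avatar_id("My Avatar", "avatar_1234")   # "avatar_1234" is a made-up ID
print(load_avatar_id("My Avatar"))
```

Once the ID is stored, every future video request can skip Step 2 entirely and reference the saved avatar directly.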


Step 3: Giving Your Avatar a Voice

A lip-sync video is only as good as the voice behind it! While the skill offers many public TTS voices out-of-the-box (list_public_voices), you might want something truly unique—like your own voice.
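If you stick with a stock voice, you will need to pick one from the list_public_voices output. The response shape below is an assumption for illustration (the real fields may differ; check the skill's documentation), but the filtering pattern is the same either way:

```python
# Assumed response shape for list_public_voices -- the actual field names may differ.
PUBLIC_VOICES = [
    {"voice_id": "v_001", "name": "Ava",   "language": "en"},
    {"voice_id": "v_002", "name": "Kenji", "language": "ja"},
    {"voice_id": "v_003", "name": "Liam",  "language": "en"},
]

def voices_for_language(voices, language):
    """Return the voice IDs matching a given language code."""
    return [v["voice_id"] for v in voices if v["language"] == language]

print(voices_for_language(PUBLIC_VOICES, "en"))  # ['v_001', 'v_003']
```

Any of the returned IDs can then be passed as the --voice argument in Step 4.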

Cloning a Custom Voice

[Image: Voice cloning illustration]

You can clone a voice simply by providing a sample audio file. Again, instruct your agent:

"Clone my voice from this audio file and generate a greeting video using my custom avatar."

Under the hood, this executes the cloning process:

python scripts/hifly_client.py clone_voice \
    --audio assets/my_voice_sample.mp3 \
    --title "My Cloned Voice"

Step 4: Generate the Final Lip-Sync Video

Now that you have your avatar (the "talking photo") and your voice sorted, you simply put them together.

Run the creation command, passing in the text, your custom avatar ID, and your chosen voice:

python scripts/hifly_client.py create_video \
    --type tts \
    --text "Hello everyone! This entire lip-sync video was generated for free using OpenClaw and the Flyworks Avatar Video skill. Pretty cool, right?" \
    --avatar my_custom_avatar_id \
    --voice my_cloned_voice_id

The script manages the video generation workflow behind the scenes. Wait a few moments, and the final MP4, with the mouth movements synced to your audio, is ready to download!
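That "behind the scenes" workflow is typically a submit-then-poll loop. The sketch below shows the general pattern with an injected status callable; the job states ("processing", "done") and field names are assumptions, not the API's documented values.

```python
import time

def wait_for_video(get_status, poll_seconds=0, max_attempts=10):
    """Poll a job-status callable until the video is ready or we give up.

    `get_status` stands in for the real API call; the states used here
    ("processing", "done") are illustrative, not documented values.
    """
    for _ in range(max_attempts):
        status = get_status()
        if status["state"] == "done":
            return status["video_url"]
        time.sleep(poll_seconds)
    raise TimeoutError("video generation did not finish in time")

# Simulated job that finishes on the third poll.
responses = iter([
    {"state": "processing"},
    {"state": "processing"},
    {"state": "done", "video_url": "https://example.com/out.mp4"},
])
print(wait_for_video(lambda: next(responses)))
```

In practice you would pass a real poll interval (a few seconds) and let the client script handle retries and errors.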

Conclusion

Creating impressive, high-quality talking digital avatars has never been easier or more accessible. By combining the OpenClaw AI agent framework with the free Flyworks Avatar Video skill, developers and creators can now automate the production of lip-sync content effortlessly.

Try exploring the other skills available on ClawHub to see what additional capabilities you can unlock!