Build a repeatable short-form video workflow from the terminal
This guide walks through a Python-based pipeline that can script, generate, voice, assemble, and publish niche videos with a lot less manual work.
Architecture Overview
The pipeline has five stages. Claude Code acts as the orchestrator. It can write, run, and debug the Python scripts that glue the pieces together. You describe your niche once in a CLAUDE.md file, and Claude uses that as the working brief.
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│    01    │   │    02    │   │    03    │   │    04    │   │    05    │
│  SCRIPT  │ → │ GENERATE │ → │  VOICE   │ → │ COMPOSE  │ → │ PUBLISH  │
│  Claude  │   │ Runway / │   │  Eleven  │   │ FFmpeg / │   │ Platform │
│   API    │   │ Kling /  │   │   Labs   │   │ Creato-  │   │   APIs   │
│          │   │   Veo    │   │          │   │   mate   │   │          │
└──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
- Script: Claude generates a niche-specific video script (hook, body, CTA) and a matching image prompt.
- Generate: A video generation API turns the prompt into a short-form clip.
- Voice: ElevenLabs converts the script to natural speech.
- Compose: FFmpeg merges the video and audio and burns in captions.
- Publish: A posting API ships the final file to IG Reels, TikTok, and YouTube Shorts.
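Before wiring in any real API, it helps to see the data handoffs between the five stages as plain functions. The sketch below is shape only: every body is a stub, and the file names and return values are placeholders, not real service calls.

```python
# pipeline_shape.py — the five stages as stub functions, so the data flow
# is visible before any real API is wired in. Every body is a placeholder.

def write_script(topic: str) -> dict:
    """Stage 1 (stub): return a script and a matching image prompt."""
    return {"script": f"Hook: {topic}. Body. CTA.", "image_prompt": f"cinematic, {topic}"}

def generate_clip(image_prompt: str) -> str:
    """Stage 2 (stub): return a path or URL to the raw clip."""
    return "raw_clip.mp4"

def generate_voiceover(script: str) -> str:
    """Stage 3 (stub): return a path to the voiceover audio."""
    return "voiceover.mp3"

def compose_video(video: str, audio: str) -> str:
    """Stage 4 (stub): return the path of the merged, captioned video."""
    return "final.mp4"

def publish(final_video: str) -> list[str]:
    """Stage 5 (stub): return the platforms the file was sent to."""
    return ["tiktok", "instagram", "youtube"]

def run_pipeline(topic: str) -> list[str]:
    assets = write_script(topic)
    clip = generate_clip(assets["image_prompt"])
    voice = generate_voiceover(assets["script"])
    final = compose_video(clip, voice)
    return publish(final)
```

Each later section of this guide replaces one stub with a real implementation while the chaining logic stays the same.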
Prerequisites & Environment Setup
Before you touch any code, make sure the following are installed and ready.
1. Install Claude Code
Claude Code is Anthropic's terminal-based agentic coding tool. It reads your project files, writes scripts, runs them, and iterates until things work. Install globally via npm:
npm install -g @anthropic-ai/claude-code
2. Python 3.10+ & FFmpeg
The entire pipeline runs Python. FFmpeg handles audio/video merging and subtitle burn-in.
# macOS
brew install python ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install python3 python3-pip ffmpeg
3. API Keys
You'll need API keys from the services you pick. At minimum, gather keys for a video generation API, ElevenLabs, and your posting tool. Store them in a .env file — never hardcode them.
# .env
ANTHROPIC_API_KEY=sk-ant-...
RUNWAY_API_KEY=rw-...
ELEVENLABS_API_KEY=sk_...
UPLOAD_POST_API_KEY=up-... # or BLOTATO / LATE key
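In practice you would load these with python-dotenv, but a dependency-free sketch makes the mechanics clear. The parser below is a minimal stand-in (KEY=VALUE lines, comments ignored), and the fail-fast helper gives a clearer error than a raw KeyError when a key is missing.

```python
# env_check.py — minimal, dependency-free .env loading sketch. In the real
# pipeline you'd likely use python-dotenv instead; this shows the idea.
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE lines into os.environ; '#' starts a comment."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # strip any trailing inline comment (keys rarely contain '#')
                os.environ.setdefault(key.strip(), value.split("#")[0].strip())
    except FileNotFoundError:
        pass  # fall back to whatever is already in the environment

def require_key(name: str) -> str:
    """Fail fast with a readable message instead of a KeyError mid-pipeline."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing {name}: add it to your .env file")
    return value
```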
4. Project Scaffold
Create a project directory with a CLAUDE.md file. This is Claude Code's persistent memory — it reads this file every session so it knows your niche, brand voice, and workflow rules.
mkdir ai-video-pipeline && cd ai-video-pipeline
touch CLAUDE.md .env requirements.txt
CLAUDE.md Tip: Include your niche description, brand tone ("casual, Gen-Z energy" vs. "cinematic, authoritative"), target platforms, and any hashtags or CTA patterns you always want. The more specific you are, the more consistent Claude's output becomes.
Video Generation — Tool Comparison
This is the most impactful choice in the stack. You need an API that produces clips good enough for your niche from text or image prompts, runs reliably, and doesn't cost a fortune at volume. Here's how the major players stack up in 2026.
Top Video Generation Tools
Runway (✅ Recommended)
- Model: Gen-4.5
- Max Res / Duration: 1080p | 10s per call
- Price: ~$0.05–0.12/sec
- Python SDK:
pip install runwayml
Google Veo (✅ Recommended)
- Model: Veo 3.1
- Max Res / Duration: 4K | ~8s
- Price: Credit-based (Vertex AI)
- Python SDK: Gemini API / Vertex AI SDK
Kling (🟡 Great Value)
- Model: Kling 2.0 / 3.0
- Max Res / Duration: 1080p | Up to 60s
- Price: ~$0.03–0.08/sec
- Python SDK: REST (or via WaveSpeed)
Sora (🟡 If You Have Access)
- Model: Sora 2
- Max Res / Duration: 1080p | ~20s
- Price: Requires ChatGPT Pro
- Python SDK: Limited API access
Pika & Luma (🔵 Budget Picks)
- Models: Pika 2.5 / Luma Ray 3
- Max Res / Duration: 1080p | ~5s
- Price: ~$0.02–0.04/video or $9.99/mo
- Python SDK: REST API
HeyGen (🟡 Talking Head Niche)
- Model: Avatar-based
- Max Res: 1080p
- Price: From $29/mo
- Python SDK: REST API
Leonardo.Ai (🔵 Image → Motion)
- Model: Image-to-Video
- Max Res / Duration: 1080p | ~4s
- Price: $5 free credit, then usage
- Python SDK: REST API
Our Take
For most creators: start with Runway Gen-4.5. It has the best combination of quality, API reliability, a first-party Python SDK, and strong image-to-video capabilities. Generate a hero image with Flux or DALL·E, then animate it with Runway — this gives you the most control over your visual identity.
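The image-first workflow might look like the sketch below. The model name, size value, and OPENAI_API_KEY variable are assumptions here, so check OpenAI's current image API docs before relying on them; the returned URL is what you would pass as prompt_image to the Runway call shown later.

```python
# hero_image.py — sketch of the image-first workflow: generate a portrait
# hero frame, then animate it with Runway's image-to-video endpoint.
import os

def build_image_request(image_prompt: str) -> dict:
    """Keep request parameters in one place; values are assumptions."""
    return {
        "model": "dall-e-3",
        "prompt": image_prompt,
        "size": "1024x1792",  # portrait, close to 9:16
        "n": 1,
    }

def generate_hero_image(image_prompt: str) -> str:
    # Deferred import so the parameter builder is usable without the SDK
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.images.generate(**build_image_request(image_prompt))
    return resp.data[0].url  # feed this URL to Runway as prompt_image
```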
If budget is tight: Kling via WaveSpeedAI's unified API offers comparable quality at a lower cost. WaveSpeed also lets you switch between Kling, Seedance, and other models through one integration — useful if you want to test different providers without rewriting code.
If you're in the Google ecosystem: Veo 3.1 delivers native audio, vertical video for Shorts, and 4K output. Access it through the Gemini API or Vertex AI SDK.
For talking-head / avatar content: HeyGen lets you generate full presenter-style videos from text scripts. Ideal for educational or news-recap niches where a face on screen is expected.
Example: Generating Video with Runway (Python)
# generate_video.py
from runwayml import RunwayML, TaskFailedError
import os

client = RunwayML(api_key=os.environ["RUNWAY_API_KEY"])

def generate_clip(prompt_text: str, image_url: str | None = None):
    """Generate a short video clip with Runway Gen-4.5."""
    params = {
        "model": "gen4.5",
        "prompt_text": prompt_text,
        "ratio": "1080:1920",  # 9:16 for Reels/TikTok/Shorts
        "duration": 10,
    }
    if image_url:
        params["prompt_image"] = image_url
    try:
        task = client.image_to_video.create(**params).wait_for_task_output()
        return task.output[0]  # video URL
    except TaskFailedError as e:
        print(f"Generation failed: {e}")
        return None
Voiceover & Audio Layer
ElevenLabs is the clear recommendation for voiceover. Their eleven_v3 model produces studio-quality speech in 70+ languages, with emotion control via audio tags like [whispers] and [shouts]. The official Python SDK is mature and well-documented.
Top Voice APIs
ElevenLabs (✅ Recommended)
- Quality: Best-in-class
- Features: ~75ms latency (Flash), Voice Cloning
- Price: From $5/mo
OpenAI TTS
- Quality: Very good
- Features: Fast, no voice cloning
- Price: $15/1M chars
Google Cloud TTS
- Quality: Good
- Features: Multilingual scale
- Price: $4–16/1M chars
Veo 3.1 Native Audio
- Quality: Good (auto-generated SFX/ambient)
- Features: Bundled with video
- Price: Included
Example: Generating Voiceover with ElevenLabs
# generate_voice.py
from elevenlabs.client import ElevenLabs
import os

eleven = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

def generate_voiceover(script: str, output_path: str = "voiceover.mp3"):
    """Convert script text to speech with ElevenLabs v3."""
    audio = eleven.text_to_speech.convert(
        text=script,
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # or your cloned voice
        model_id="eleven_v3",
        output_format="mp3_44100_128",
    )
    with open(output_path, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return output_path
Pro tip: Clone your own voice for brand consistency. Upload 3–5 minutes of clean audio to ElevenLabs, and use the resulting voice ID across all generated content. Your audience won't be able to tell the difference.
Video Composition & Post-Processing
Once you have a raw video clip and a voiceover audio file, you need to merge them and add captions. You have two main approaches.
Option A: FFmpeg (Free, Local)
FFmpeg is the gold standard for programmatic video editing. Claude Code can write and run FFmpeg commands directly. This keeps your pipeline entirely local and free.
# compose.py
import subprocess

def compose_video(video_path, audio_path, srt_path, output_path):
    """Merge video + voiceover + subtitle burn-in."""
    cmd = [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0",  # video from the generated clip
        "-map", "1:a:0",  # audio from the voiceover, not the clip's own track
        "-vf", f"subtitles={srt_path}:force_style="
               "'FontName=Montserrat,FontSize=22,"
               "PrimaryColour=&H00FFFFFF,"
               "OutlineColour=&H00000000,Outline=2,"
               "Alignment=2,MarginV=60'",
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]
    subprocess.run(cmd, check=True)
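The compose step expects an SRT file, which nothing above produces yet. A naive approach is to split the script into short chunks and spread them evenly across the voiceover's duration; word-accurate timing would need a transcription or forced-alignment step (Whisper, for example), but this gets readable captions on screen.

```python
# make_srt.py — naive caption timing: evenly spaced chunks of the script
# across the voiceover's duration. Accurate timing needs forced alignment.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def write_srt(script: str, audio_seconds: float, path: str = "captions.srt",
              words_per_caption: int = 5) -> str:
    """Write an SRT file with evenly timed chunks of the script."""
    words = script.split()
    chunks = [" ".join(words[i:i + words_per_caption])
              for i in range(0, len(words), words_per_caption)]
    per_chunk = audio_seconds / max(len(chunks), 1)
    lines = []
    for i, chunk in enumerate(chunks):
        start, end = i * per_chunk, (i + 1) * per_chunk
        lines += [str(i + 1),
                  f"{srt_timestamp(start)} --> {srt_timestamp(end)}",
                  chunk, ""]
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return path
```

Get audio_seconds from the voiceover file (ffprobe, or the mutagen library) rather than guessing from word count.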
Option B: Creatomate (API, Cloud-based)
Creatomate is a cloud-based video rendering API. You design a template in their editor (with caption styles, transitions, logo overlays), then send dynamic data via their REST API. It handles the ElevenLabs integration natively — you send text, it generates the voice and subtitles and returns a finished video.
# compose_creatomate.py
import requests, os

def render_video(script: str, images: list[str]):
    resp = requests.post(
        "https://api.creatomate.com/v2/renders",
        headers={
            "Authorization": f"Bearer {os.environ['CREATOMATE_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "template_id": "your-template-id",
            "modifications": {
                "Voiceover": script,
                "Image-1": images[0],
            },
        },
    )
    return resp.json()[0]["url"]
Composition Approaches
FFmpeg (local)
- Cost: Free
- Captions: SRT burn-in (you generate)
- Use Case: Full control, no recurring cost, runs anywhere
Creatomate
- Cost: From $0.06/render
- Captions: Auto-generated animated
- Use Case: Fast polished output, branded templates, easy REST API
Distribution — Posting to All Platforms
This is where most people get stuck. Each platform has its own API, auth flow, and quirks. Here's the landscape.
Distribution Tools
Upload-Post (✅ Recommended)
- Platforms: TikTok, IG, YT, LinkedIn, X, FB, Pinterest, Threads, Reddit, Bluesky
- Auth: Single API key + dashboard
- Integration:
pip install upload-post
Blotato (🟡 Great for Teams)
- Platforms: 9 platforms incl. TikTok, IG, YT
- Auth: API key + OAuth per platform
- Integration: REST (Claude Code writes wrapper)
Late.dev (🟡 Scheduler Focus)
- Platforms: TikTok, IG, YT, LinkedIn, X
- Auth: API key + OAuth
- Integration: REST API
Direct Platform APIs (🔵 Max Control)
- Platforms: Single platforms (YouTube Data API, IG Graph API, etc.)
- Auth: OAuth 2.0 per platform
- Integration: Various SDKs
tiktok-uploader (OSS)
- Platforms: TikTok only
- Auth: Browser cookies
- Integration:
pip install tiktok-uploader
Our Take
For most creators: Upload-Post gives you one API call to publish across ten platforms simultaneously. It has an official Python SDK and handles all the OAuth complexity behind a simple dashboard. Connect your accounts once, then your script just calls client.upload_video().
Example: Publishing to All Platforms
# publish.py
from upload_post import UploadPostClient
import os

client = UploadPostClient(api_key=os.environ["UPLOAD_POST_API_KEY"])

def publish_everywhere(video_path, title, description, tags):
    """Ship to TikTok, Instagram Reels, and YouTube Shorts."""
    response = client.upload_video(
        video_path=video_path,
        title=title,
        description=description,
        user="my_brand",
        platforms=["tiktok", "instagram", "youtube"],
        tags=tags,
    )
    for platform, result in response["results"].items():
        status = "✓" if result["success"] else "✗"
        print(f"  {status} {platform}: {result.get('url', 'failed')}")
⚠️ Platform format rules: Instagram Reels and TikTok require 9:16 vertical video (1080×1920). YouTube Shorts also requires 9:16 and under 60 seconds. Always generate in portrait mode. If you also want landscape for YouTube long-form, generate a second render at 16:9.
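You can enforce these rules before publishing with a quick metadata check. The sketch below keeps the rule check as a pure function and shells out to ffprobe (bundled with FFmpeg) only in the probe helper; the JSON fields read here are standard ffprobe output.

```python
# validate_format.py — gate publishing on the 9:16 / under-60s rules above.
import json
import subprocess

def meets_shorts_rules(width: int, height: int, duration: float) -> bool:
    """9:16 portrait and under 60 seconds, per the platform notes above."""
    return height > width and abs(width / height - 9 / 16) < 0.01 and duration < 60

def probe(video_path: str) -> tuple[int, int, float]:
    """Return (width, height, duration_seconds) via ffprobe's JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", "-show_format", video_path],
        capture_output=True, text=True, check=True,
    ).stdout
    info = json.loads(out)
    stream = next(s for s in info["streams"] if s["codec_type"] == "video")
    return int(stream["width"]), int(stream["height"]), float(info["format"]["duration"])
```

Call meets_shorts_rules(*probe("final.mp4")) right before the publish step and skip (or re-render) anything that fails.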
Claude Code Orchestration
This is where it all comes together. Claude Code reads your CLAUDE.md, generates the pipeline script, and runs it.
Step 1: Set Up CLAUDE.md
# AI Video Pipeline
## Niche
Stoic philosophy — short motivational content for men aged 20-35.
## Brand Voice
Authoritative but calm. Cinematic visuals. Deep male narration voice.
Never use slang. End every video with a reflection question.
## Platforms
- Instagram Reels (9:16, under 60s, 3-5 hashtags)
- TikTok (9:16, under 60s, trending sounds optional)
- YouTube Shorts (9:16, under 60s, keyword-rich title)
## Stack
- Video gen: Runway Gen-4.5 (image-to-video)
- Voice: ElevenLabs v3, voice ID: JBFqnCBsd6RMkjVDRZzb
- Composition: FFmpeg with SRT subtitles
- Publishing: Upload-Post API
- All scripts in Python. Use .env for secrets.
Step 2: Ask Claude Code to Build the Pipeline
Open Claude Code in your project directory and give it a high-level instruction. Claude will read your CLAUDE.md, generate all required Python files, install dependencies, and test them.
$ claude
# Claude Code launches, reads CLAUDE.md automatically
> Build the complete video pipeline as described in CLAUDE.md.
> Generate a script about "the stoic response to failure",
> create a matching image prompt, generate the video, add
> voiceover, burn in captions, and publish to all 3 platforms.
> Save each intermediate file so I can review before publishing.
Claude Code will typically generate these files:
- scriptwriter.py: Uses the Anthropic API to generate a hook + body + CTA script, plus an image generation prompt.
- generate_video.py: Calls Runway (or your chosen API) to generate the raw video clip.
- generate_voice.py: Sends the script to ElevenLabs and saves the MP3 voiceover.
- compose.py: Merges video + audio + subtitles via FFmpeg.
- publish.py: Ships the final file to IG, TikTok, and YouTube via your posting API.
- pipeline.py: Master script that chains all steps; can run end-to-end or with a review gate.
Step 3: Review & Iterate
Claude Code writes the files, runs them, checks for errors, and iterates. If the voiceover doesn't match the video length, it adjusts. If FFmpeg throws a codec error, it fixes the command. You stay in the terminal, describing what you want in natural language.
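The review gate mentioned for pipeline.py can be as simple as a command-line flag: stop after composition by default, publish only when explicitly told to. A minimal sketch, with the pipeline steps themselves elided:

```python
# pipeline.py (sketch) — stop after composition unless --publish is passed,
# so final.mp4 can be reviewed before it ships anywhere.
import argparse

def run(publish: bool) -> str:
    # ... script, video, voice, and compose steps would run here,
    #     leaving final.mp4 on disk for inspection ...
    if not publish:
        return "review"     # human checks final.mp4 before shipping
    return "published"      # publish step would be called here

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--publish", action="store_true",
                        help="skip the review gate and publish immediately")
    args = parser.parse_args()
    print(run(args.publish))
```

Run `python pipeline.py` during testing, and switch the cron job to `python pipeline.py --publish` once you trust the output.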
Recommended Stacks
Based on the comparisons above, here are three stacks depending on your budget and volume.
✅ The Pro Stack — Best Overall (~$1–3 per video)
- Script: Claude API (Sonnet 4)
- Video: Runway Gen-4.5
- Voice: ElevenLabs v3
- Compose: FFmpeg (local)
- Publish: Upload-Post
🟡 The Agency Stack — Hands-Off Premium (~$2–5 per video)
- Script: Claude API (Sonnet 4)
- Video: Runway Gen-4.5
- Voice + Compose: Creatomate (handles both)
- Publish: Blotato
🔵 The Lean Stack — Budget / Bootstrapper (~$0.20–0.80 per video)
- Script: Claude API (Haiku 4.5)
- Video: Kling via WaveSpeed or Pika
- Voice: OpenAI TTS or ElevenLabs free tier
- Compose: FFmpeg (local)
- Publish: tiktok-uploader + YouTube Data API
Scheduling & Running Autonomously
To make this truly autonomous, you need the pipeline to run on a schedule without you at the keyboard.
Option 1: Cron Job (Simplest)
Run your pipeline.py on a cron schedule. This works great for solo creators publishing 1–3 videos per day.
# Run pipeline every day at 8 AM and 5 PM
0 8,17 * * * cd /path/to/ai-video-pipeline && /usr/bin/python3 pipeline.py >> logs/cron.log 2>&1
Option 2: GitHub Actions (Free CI/CD)
Store your pipeline in a GitHub repo and use a scheduled GitHub Actions workflow. This gives you logging, retry logic, and secrets management for free.
# .github/workflows/daily-video.yml
name: Daily Video Pipeline
on:
  schedule:
    - cron: '0 8,17 * * *'  # 8 AM and 5 PM UTC
  workflow_dispatch:        # manual trigger
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: sudo apt-get install -y ffmpeg
      - run: python pipeline.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          RUNWAY_API_KEY: ${{ secrets.RUNWAY_API_KEY }}
          ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}
          UPLOAD_POST_API_KEY: ${{ secrets.UPLOAD_POST_API_KEY }}
Option 3: Human-in-the-Loop Review
For maximum safety, add a review gate. The pipeline generates the video and saves it, then sends you a notification (email, Slack, or SMS). You approve it, and only then does it publish. Claude Code can set this up using webhook triggers or a simple Flask approval endpoint.
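A Flask endpoint works fine for this, but the pattern needs nothing beyond the standard library. The sketch below is a dependency-free stand-in: the pipeline adds a video ID to a pending set and sends you a link like http://yourhost:8080/approve?id=vid1; hitting it moves the ID to the approved set, which the publish step checks before shipping.

```python
# approve_server.py — stdlib sketch of a human-in-the-loop approval gate.
# GET /approve?id=... flips a pending video to approved; the pipeline only
# publishes IDs found in APPROVED. A real setup would add auth and storage.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

PENDING: set[str] = set()
APPROVED: set[str] = set()

def approve(video_id: str) -> bool:
    """Core logic, kept separate from HTTP handling so it's easy to test."""
    if video_id in PENDING:
        PENDING.discard(video_id)
        APPROVED.add(video_id)
        return True
    return False

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        qs = parse_qs(urlparse(self.path).query)
        ok = approve(qs.get("id", [""])[0])
        self.send_response(200 if ok else 404)
        self.end_headers()
        self.wfile.write(b"approved" if ok else b"unknown id")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```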
The 80/20 Rule: Automate 80% of the work — research, scripting, generation, compositing. Keep 20% human — review, community engagement, and high-stakes creative decisions. This prevents the "bot look" that tanks engagement and risks shadowbans.
Gotchas & Platform Rules
Instagram
The Instagram Graph API requires a Facebook Business account and app review. Using a unified posting API like Upload-Post or Blotato bypasses most of this complexity since they've already completed the review process. Reels must be 9:16, 3–90 seconds, and under 1 GB.
TikTok
TikTok's Content Posting API is only available to approved partners. For individual creators, cookie-based uploaders (like tiktok-uploader) work but may break when TikTok changes their frontend. Unified APIs abstract this away. Keep videos under 10 minutes and 4 GB.
YouTube
The YouTube Data API v3 has a daily quota of 10,000 units. Each video upload costs 1,600 units — so you can upload about 6 videos per day before hitting the cap. Shorts must be vertical (9:16), under 60 seconds, and should include #Shorts in the title or description.
General Tips
- Never hardcode credentials. Always use .env files and python-dotenv. Claude Code will set this up correctly if you ask.
- Vary your content. Posting identical or near-identical videos across platforms can trigger spam detection. Use slightly different captions, hashtags, and descriptions per platform.
- Respect rate limits. Add retry logic with exponential backoff to all API calls.
- Log everything. Save all intermediate files (script JSON, raw video, audio, final video) and publish results. When something fails at 3 AM, logs are all you have.
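The retry-with-backoff tip above can be a ten-line wrapper you reuse for every API call. The attempt count and base delay here are illustrative defaults; tune them per provider's rate limits.

```python
# retry.py — exponential backoff with jitter for flaky API calls.
import random
import time

def with_backoff(fn, attempts: int = 5, base_delay: float = 1.0,
                 retry_on: type[Exception] = Exception):
    """Call fn(); on failure sleep base_delay * 2^n plus jitter, then retry.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Usage is a one-liner around any stage, for example `with_backoff(lambda: generate_clip(prompt))`.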
⚠️ Disclaimer: AI-generated content policies are evolving rapidly. Some platforms now require disclosure of AI-generated content. Always check the latest terms of service for Instagram, TikTok, and YouTube before publishing. This guide provides the technical infrastructure — you're responsible for compliance with platform policies.
Quick Links
- Runway API Docs
- ElevenLabs Developer Docs
- Upload-Post Python SDK
- Creatomate API
- FFmpeg Documentation
- Claude Code
- tiktok-uploader on PyPI
- YouTube Data API v3
Built with Claude Code · Published on troystechcorner.com · March 2026
Frequently Asked Questions
Can Claude Code build the whole AI video pipeline for me?
It can help a lot with scaffolding, scripting, debugging, and orchestration, but you still need to choose services, manage keys, test outputs, and review the automation.
What is the hardest part of an autonomous video pipeline?
Usually reliability across services. Generation quality, API failures, rate limits, subtitle timing, and platform publishing rules are where the workflow gets fragile.
Should beginners automate publishing on day one?
Usually no. It is smarter to prove that scripting, generation, voice, and composition work reliably before letting the pipeline publish automatically.
