Troy’s Tech Corner
Build Tech · 2026-03-31 · Updated: 2026-04-14 · 12 min read
Tags: ai video pipeline, claude code, automation, ffmpeg, elevenlabs, short-form video

Build an Autonomous AI Video Pipeline with Claude Code

Written by Troy Brown

Troy writes beginner-friendly guides, practical gear advice, and hands-on tech walkthroughs designed to help real people make smarter decisions and build with more confidence.

Build a repeatable short-form video workflow from the terminal

This guide walks through a Python-based pipeline that can script, generate, voice, assemble, and publish niche videos with a lot less manual work.



Architecture Overview

The pipeline has five stages. Claude Code acts as the orchestrator. It can write, run, and debug the Python scripts that glue the pieces together. You describe your niche once in a CLAUDE.md file, and Claude uses that as the working brief.

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ 01       │    │ 02       │    │ 03       │    │ 04       │    │ 05       │
│ SCRIPT   │ →  │ GENERATE │ →  │ VOICE    │ →  │ COMPOSE  │ →  │ PUBLISH  │
│ Claude   │    │ Runway / │    │ Eleven   │    │ FFmpeg / │    │ Creato-  │
│ API      │    │ Kling /  │    │ Labs     │    │ mate     │    │ Platform │
│          │    │ Veo      │    │          │    │          │    │ APIs     │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘

  • Script: Claude generates a niche-specific video script (hook, body, CTA) and a matching image prompt.
  • Generate: A video generation API turns the prompt into a short-form clip.
  • Voice: ElevenLabs converts the script to natural speech.
  • Compose: FFmpeg merges the video and audio and burns in captions.
  • Publish: A posting API ships the final file to IG Reels, TikTok, and YouTube Shorts.
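At a glance, the whole pipeline is a chain of five function calls. The sketch below uses throwaway stubs so the shape is runnable; each stub stands in for a real API-backed implementation like the ones built later in this guide.

```python
# pipeline_sketch.py — the five stages as one chain of calls.
# Every stage function here is a stub, not a real implementation.

def write_script(topic):           # 01 SCRIPT   (Claude API)
    return f"Hook about {topic}", f"Cinematic shot of {topic}"

def generate_clip(image_prompt):   # 02 GENERATE (Runway / Kling / Veo)
    return "raw_clip.mp4"

def generate_voiceover(script):    # 03 VOICE    (ElevenLabs)
    return "voiceover.mp3"

def compose_video(video, audio):   # 04 COMPOSE  (FFmpeg / Creatomate)
    return "final.mp4"

def publish(path):                 # 05 PUBLISH  (posting API)
    return {"tiktok": "ok", "instagram": "ok", "youtube": "ok"}

def run_pipeline(topic):
    """Chain all five stages for one video and return publish results."""
    script, image_prompt = write_script(topic)
    clip = generate_clip(image_prompt)
    voice = generate_voiceover(script)
    final = compose_video(clip, voice)
    return publish(final)
```

Each stub's return value feeds the next stage, which is exactly the contract the real modules need to honor.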


Prerequisites & Environment Setup

Before you touch any code, make sure the following are installed and ready.

1. Install Claude Code

Claude Code is Anthropic's terminal-based agentic coding tool. It reads your project files, writes scripts, runs them, and iterates until things work. Install globally via npm:

npm install -g @anthropic-ai/claude-code

2. Python 3.10+ & FFmpeg

The entire pipeline runs Python. FFmpeg handles audio/video merging and subtitle burn-in.

# macOS
brew install python ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install python3 python3-pip ffmpeg

3. API Keys

You'll need API keys from the services you pick. At minimum, gather keys for a video generation API, ElevenLabs, and your posting tool. Store them in a .env file — never hardcode them.

# .env
ANTHROPIC_API_KEY=sk-ant-...
RUNWAY_API_KEY=rw-...
ELEVENLABS_API_KEY=sk_...
UPLOAD_POST_API_KEY=up-...   # or BLOTATO / LATE key
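In practice you'd load this file with python-dotenv, but the format is simple enough that a stdlib-only sketch shows exactly what's happening under the hood (python-dotenv handles quoting and other edge cases far more robustly):

```python
# load_env.py — a minimal .env loader using only the standard library.
import os

def load_env(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines into os.environ, skipping comments and blanks."""
    keys = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()   # drop inline comments
            if "=" in line:
                key, _, value = line.partition("=")
                keys[key.strip()] = value.strip()
    os.environ.update(keys)
    return keys
```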

4. Project Scaffold

Create a project directory with a CLAUDE.md file. This is Claude Code's persistent memory — it reads this file every session so it knows your niche, brand voice, and workflow rules.

mkdir ai-video-pipeline && cd ai-video-pipeline
touch CLAUDE.md .env requirements.txt

CLAUDE.md Tip: Include your niche description, brand tone ("casual, Gen-Z energy" vs. "cinematic, authoritative"), target platforms, and any hashtags or CTA patterns you always want. The more specific you are, the more consistent Claude's output becomes.


Video Generation — Tool Comparison

This is the most impactful choice in the stack. You need an API that produces niche-quality clips from text or image prompts, runs reliably, and doesn't cost a fortune at volume. Here's how the major players stack up in 2026.

Top Video Generation Tools

Runway (✅ Recommended)

  • Model: Gen-4.5
  • Max Res / Duration: 1080p | 10s per call
  • Price: ~$0.05–0.12/sec
  • Python SDK: pip install runwayml

Google Veo (✅ Recommended)

  • Model: Veo 3.1
  • Max Res / Duration: 4K | ~8s
  • Price: Credit-based (Vertex AI)
  • Python SDK: Gemini API / Vertex AI SDK

Kling (🟡 Great Value)

  • Model: Kling 2.0 / 3.0
  • Max Res / Duration: 1080p | Up to 60s
  • Price: ~$0.03–0.08/sec
  • Python SDK: REST (or via WaveSpeed)

Sora (🟡 If You Have Access)

  • Model: Sora 2
  • Max Res / Duration: 1080p | ~20s
  • Price: Requires ChatGPT Pro
  • Python SDK: Limited API access

Pika & Luma (🔵 Budget Picks)

  • Models: Pika 2.5 / Luma Ray 3
  • Max Res / Duration: 1080p | ~5s
  • Price: ~$0.02–0.04/video or $9.99/mo
  • Python SDK: REST API

HeyGen (🟡 Talking Head Niche)

  • Model: Avatar-based
  • Max Res: 1080p
  • Price: From $29/mo
  • Python SDK: REST API

Leonardo.Ai (🔵 Image → Motion)

  • Model: Image-to-Video
  • Max Res / Duration: 1080p | ~4s
  • Price: $5 free credit, then usage
  • Python SDK: REST API

Our Take

For most creators: start with Runway Gen-4.5. It has the best combination of quality, API reliability, a first-party Python SDK, and strong image-to-video capabilities. Generate a hero image with Flux or DALL·E, then animate it with Runway — this gives you the most control over your visual identity.

If budget is tight: Kling via WaveSpeedAI's unified API offers comparable quality at a lower cost. WaveSpeed also lets you switch between Kling, Seedance, and other models through one integration — useful if you want to test different providers without rewriting code.

If you're in the Google ecosystem: Veo 3.1 delivers native audio, vertical video for Shorts, and 4K output. Access it through the Gemini API or Vertex AI SDK.

For talking-head / avatar content: HeyGen lets you generate full presenter-style videos from text scripts. Ideal for educational or news-recap niches where a face on screen is expected.

Example: Generating Video with Runway (Python)

# generate_video.py
from runwayml import RunwayML, TaskFailedError
import os

client = RunwayML(api_key=os.environ["RUNWAY_API_KEY"])

def generate_clip(prompt_text: str, image_url: str | None = None):
    """Generate a short video clip with Runway Gen-4.5."""
    params = {
        "model": "gen4.5",
        "prompt_text": prompt_text,
        "ratio": "1080:1920",   # 9:16 for Reels/TikTok/Shorts
        "duration": 10,
    }
    if image_url:
        params["prompt_image"] = image_url

    try:
        task = client.image_to_video.create(**params) \
            .wait_for_task_output()
        return task.output[0]  # video URL
    except TaskFailedError as e:
        print(f"Generation failed: {e}")
        return None
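The task output is a hosted URL, not a local file, so before the compose step you need the bytes on disk. A minimal stdlib download helper looks like this (the chunk size and default filename are arbitrary choices):

```python
# download_clip.py — save the URL returned by generate_clip() to disk.
import urllib.request

def download_clip(url: str, output_path: str = "raw_clip.mp4") -> str:
    """Stream the generated clip to a local file for the compose step."""
    with urllib.request.urlopen(url) as resp, open(output_path, "wb") as f:
        while chunk := resp.read(1 << 16):   # 64 KB chunks
            f.write(chunk)
    return output_path
```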

Voiceover & Audio Layer

ElevenLabs is the clear recommendation for voiceover. Their eleven_v3 model produces studio-quality speech in 70+ languages, with emotion control via audio tags like [whispers] and [shouts]. The official Python SDK is mature and well-documented.

Top Voice APIs

ElevenLabs (✅ Recommended)

  • Quality: Best-in-class
  • Features: ~75ms latency (Flash), Voice Cloning
  • Price: From $5/mo

OpenAI TTS

  • Quality: Very good
  • Features: Fast, no voice cloning
  • Price: $15/1M chars

Google Cloud TTS

  • Quality: Good
  • Features: Multilingual scale
  • Price: $4–16/1M chars

Veo 3.1 Native Audio

  • Quality: Good (auto-generated SFX/ambient)
  • Features: Bundled with video
  • Price: Included

Example: Generating Voiceover with ElevenLabs

# generate_voice.py
from elevenlabs.client import ElevenLabs
import os

eleven = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

def generate_voiceover(script: str, output_path: str = "voiceover.mp3"):
    """Convert script text to speech with ElevenLabs v3."""
    audio = eleven.text_to_speech.convert(
        text=script,
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # or your cloned voice
        model_id="eleven_v3",
        output_format="mp3_44100_128",
    )
    with open(output_path, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return output_path

Pro tip: Clone your own voice for brand consistency. Upload 3–5 minutes of clean audio to ElevenLabs, and use the resulting voice ID across all generated content. Most listeners won't hear the difference.


Video Composition & Post-Processing

Once you have a raw video clip and a voiceover audio file, you need to merge them and add captions. You have two main approaches.

Option A: FFmpeg (Free, Local)

FFmpeg is the gold standard for programmatic video editing. Claude Code can write and run FFmpeg commands directly. This keeps your pipeline entirely local and free.

# compose.py
import subprocess

def compose_video(video_path, audio_path, srt_path, output_path):
    """Merge video + voiceover + subtitle burn-in."""
    cmd = [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-vf", f"subtitles={srt_path}:force_style="
               "'FontName=Montserrat,FontSize=22,"
               "PrimaryColour=&H00FFFFFF,"
               "OutlineColour=&H00000000,Outline=2,"
               "Alignment=2,MarginV=60'",
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]
    subprocess.run(cmd, check=True)
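compose_video expects an SRT file that the earlier stages don't produce. One crude but serviceable way to make one is to split the script into short chunks and spread them evenly across the voiceover's duration. Word-level timestamps from a transcription API would be more accurate; this is a sketch:

```python
# make_srt.py — generate a simple SRT by evenly spacing caption chunks.

def fmt(t: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def make_srt(script: str, duration: float, words_per_caption: int = 5) -> str:
    """Split the script into chunks and assign each an equal time slice."""
    words = script.split()
    chunks = [" ".join(words[i:i + words_per_caption])
              for i in range(0, len(words), words_per_caption)]
    per = duration / len(chunks)
    lines = []
    for i, chunk in enumerate(chunks):
        lines += [str(i + 1), f"{fmt(i * per)} --> {fmt((i + 1) * per)}", chunk, ""]
    return "\n".join(lines)
```

Write the result to disk (e.g. `captions.srt`) and pass that path as `srt_path` to the compose function.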

Option B: Creatomate (API, Cloud-based)

Creatomate is a cloud-based video rendering API. You design a template in their editor (with caption styles, transitions, logo overlays), then send dynamic data via their REST API. It handles the ElevenLabs integration natively — you send text, it generates the voice and subtitles and returns a finished video.

# compose_creatomate.py
import requests, os

def render_video(script: str, images: list[str]):
    resp = requests.post(
        "https://api.creatomate.com/v2/renders",
        headers={
            "Authorization": f"Bearer {os.environ['CREATOMATE_KEY']}",
            "Content-Type": "application/json",
        },
        json={
            "template_id": "your-template-id",
            "modifications": {
                "Voiceover": script,
                "Image-1": images[0],
            },
        },
    )
    return resp.json()[0]["url"]

Composition Approaches

FFmpeg (local)

  • Cost: Free
  • Captions: SRT burn-in (you generate)
  • Use Case: Full control, no recurring cost, runs anywhere

Creatomate

  • Cost: From $0.06/render
  • Captions: Auto-generated animated
  • Use Case: Fast polished output, branded templates, easy REST API

Distribution — Posting to All Platforms

This is where most people get stuck. Each platform has its own API, auth flow, and quirks. Here's the landscape.

Distribution Tools

Upload-Post (✅ Recommended)

  • Platforms: TikTok, IG, YT, LinkedIn, X, FB, Pinterest, Threads, Reddit, Bluesky
  • Auth: Single API key + dashboard
  • Integration: pip install upload-post

Blotato (🟡 Great for Teams)

  • Platforms: 9 platforms incl. TikTok, IG, YT
  • Auth: API key + OAuth per platform
  • Integration: REST (Claude Code writes wrapper)

Late.dev (🟡 Scheduler Focus)

  • Platforms: TikTok, IG, YT, LinkedIn, X
  • Auth: API key + OAuth
  • Integration: REST API

Direct Platform APIs (🔵 Max Control)

  • Platforms: Single platforms (YouTube Data API, IG Graph API, etc.)
  • Auth: OAuth 2.0 per platform
  • Integration: Various SDKs

tiktok-uploader (OSS)

  • Platforms: TikTok only
  • Auth: Browser cookies
  • Integration: pip install tiktok-uploader

Our Take

For most creators: Upload-Post gives you one API call to publish across ten platforms simultaneously. It has an official Python SDK and handles all the OAuth complexity behind a simple dashboard. Connect your accounts once, then your script just calls client.upload_video().

Example: Publishing to All Platforms

# publish.py
from upload_post import UploadPostClient
import os

client = UploadPostClient(api_key=os.environ["UPLOAD_POST_API_KEY"])

def publish_everywhere(video_path, title, description, tags):
    """Ship to TikTok, Instagram Reels, and YouTube Shorts."""
    response = client.upload_video(
        video_path=video_path,
        title=title,
        description=description,
        user="my_brand",
        platforms=["tiktok", "instagram", "youtube"],
        tags=tags,
    )
    for platform, result in response["results"].items():
        status = "✓" if result["success"] else "✗"
        print(f"  {status} {platform}: {result.get('url', 'failed')}")

⚠️ Platform format rules: Instagram Reels and TikTok require 9:16 vertical video (1080×1920). YouTube Shorts also requires 9:16 and under 60 seconds. Always generate in portrait mode. If you also want landscape for YouTube long-form, generate a second render at 16:9.
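Given those rules, a cheap pre-flight check saves failed uploads. The sketch below assumes ffprobe (bundled with FFmpeg) is on PATH; the 1080×1920 and 60-second limits mirror the warning above.

```python
# validate_format.py — pre-flight check before publishing.
import json
import subprocess

def probe(path: str) -> tuple[int, int, float]:
    """Read width, height, and duration via ffprobe (must be on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height:format=duration",
         "-of", "json", path],
        capture_output=True, text=True, check=True).stdout
    data = json.loads(out)
    s = data["streams"][0]
    return s["width"], s["height"], float(data["format"]["duration"])

def check_shorts_ready(width: int, height: int, duration: float) -> list[str]:
    """Return a list of violations; an empty list means safe to publish."""
    problems = []
    if (width, height) != (1080, 1920):
        problems.append(f"not 9:16 portrait: {width}x{height}")
    if duration >= 60:
        problems.append(f"too long for Shorts: {duration:.1f}s")
    return problems
```

Call `check_shorts_ready(*probe("final.mp4"))` in the pipeline and abort the publish step if the list is non-empty.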


Claude Code Orchestration

This is where it all comes together. Claude Code reads your CLAUDE.md, generates the pipeline script, and runs it.

Step 1: Set Up CLAUDE.md

# AI Video Pipeline

## Niche
Stoic philosophy — short motivational content for men aged 20-35.

## Brand Voice
Authoritative but calm. Cinematic visuals. Deep male narration voice.
Never use slang. End every video with a reflection question.

## Platforms
- Instagram Reels (9:16, under 60s, 3-5 hashtags)
- TikTok (9:16, under 60s, trending sounds optional)
- YouTube Shorts (9:16, under 60s, keyword-rich title)

## Stack
- Video gen: Runway Gen-4.5 (image-to-video)
- Voice: ElevenLabs v3, voice ID: JBFqnCBsd6RMkjVDRZzb
- Composition: FFmpeg with SRT subtitles
- Publishing: Upload-Post API
- All scripts in Python. Use .env for secrets.

Step 2: Ask Claude Code to Build the Pipeline

Open Claude Code in your project directory and give it a high-level instruction. Claude will read your CLAUDE.md, generate all required Python files, install dependencies, and test them.

$ claude

# Claude Code launches, reads CLAUDE.md automatically

> Build the complete video pipeline as described in CLAUDE.md.
> Generate a script about "the stoic response to failure",
> create a matching image prompt, generate the video, add
> voiceover, burn in captions, and publish to all 3 platforms.
> Save each intermediate file so I can review before publishing.

Claude Code will typically generate these files:

  • scriptwriter.py: Uses the Anthropic API to generate a hook + body + CTA script, plus an image generation prompt.
  • generate_video.py: Calls Runway (or your chosen API) to generate the raw video clip.
  • generate_voice.py: Sends the script to ElevenLabs and saves the MP3 voiceover.
  • compose.py: Merges video + audio + subtitles via FFmpeg.
  • publish.py: Ships the final file to IG, TikTok, and YouTube via your posting API.
  • pipeline.py: Master script that chains all steps — can run end-to-end or with a review gate.
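As a rough sketch of the first file in that list, assuming the official anthropic SDK (pip install anthropic): the prompt template, JSON keys, and model ID below are illustrative choices, not a fixed contract with what Claude Code will actually generate.

```python
# scriptwriter.py — sketch of the script stage using the Anthropic API.
import json
import os

PROMPT = """You are a short-form video scriptwriter for this niche: {niche}.
Return the script as JSON with keys "hook", "body", and "cta", plus an
"image_prompt" describing the opening visual. Topic: {topic}."""

def build_prompt(niche: str, topic: str) -> str:
    return PROMPT.format(niche=niche, topic=topic)

def write_script(niche: str, topic: str) -> dict:
    import anthropic  # deferred so the sketch parses without the SDK installed
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": build_prompt(niche, topic)}],
    )
    # Assumes the model returns bare JSON; add stripping/retries in practice.
    return json.loads(msg.content[0].text)
```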

Step 3: Review & Iterate

Claude Code writes the files, runs them, checks for errors, and iterates. If the voiceover doesn't match the video length, it adjusts. If FFmpeg throws a codec error, it fixes the command. You stay in the terminal, describing what you want in natural language.


Recommended Stacks

Based on the comparisons above, here are three stacks depending on your budget and volume.

✅ The Pro Stack — Best Overall (~$1–3 per video)

  • Script: Claude API (Sonnet 4)
  • Video: Runway Gen-4.5
  • Voice: ElevenLabs v3
  • Compose: FFmpeg (local)
  • Publish: Upload-Post

🟡 The Agency Stack — Hands-Off Premium (~$2–5 per video)

  • Script: Claude API (Sonnet 4)
  • Video: Runway Gen-4.5
  • Voice + Compose: Creatomate (handles both)
  • Publish: Blotato

🔵 The Lean Stack — Budget / Bootstrapper (~$0.20–0.80 per video)

  • Script: Claude API (Haiku 4.5)
  • Video: Kling via WaveSpeed or Pika
  • Voice: OpenAI TTS or ElevenLabs free tier
  • Compose: FFmpeg (local)
  • Publish: tiktok-uploader + YouTube Data API

Scheduling & Running Autonomously

To make this truly autonomous, you need the pipeline to run on a schedule without you at the keyboard.

Option 1: Cron Job (Simplest)

Run your pipeline.py on a cron schedule. This works great for solo creators publishing 1–3 videos per day.

# Run pipeline every day at 8 AM and 5 PM
0 8,17 * * * cd /path/to/ai-video-pipeline && /usr/bin/python3 pipeline.py >> logs/cron.log 2>&1

Option 2: GitHub Actions (Free CI/CD)

Store your pipeline in a GitHub repo and use a scheduled GitHub Actions workflow. This gives you logging, retry logic, and secrets management for free.

# .github/workflows/daily-video.yml
name: Daily Video Pipeline
on:
  schedule:
    - cron: '0 8,17 * * *'   # 8 AM and 5 PM UTC
  workflow_dispatch:           # manual trigger

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: sudo apt-get install -y ffmpeg
      - run: python pipeline.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          RUNWAY_API_KEY: ${{ secrets.RUNWAY_API_KEY }}
          ELEVENLABS_API_KEY: ${{ secrets.ELEVENLABS_API_KEY }}
          UPLOAD_POST_API_KEY: ${{ secrets.UPLOAD_POST_API_KEY }}

Option 3: Human-in-the-Loop Review

For maximum safety, add a review gate. The pipeline generates the video and saves it, then sends you a notification (email, Slack, or SMS). You approve it, and only then does it publish. Claude Code can set this up using webhook triggers or a simple Flask approval endpoint.
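The same review gate can be sketched with nothing but the standard library's http.server, in the spirit of the Flask endpoint mentioned above. This version keeps approvals in memory and has no auth, so treat it strictly as an illustration:

```python
# approve_server.py — a minimal human-in-the-loop approval endpoint.
# Visiting /approve?id=<video_id> marks that video as cleared to publish.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

approved: set[str] = set()   # in-memory; persist to disk in a real setup

class ApprovalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/approve":
            video_id = parse_qs(url.query).get("id", [""])[0]
            approved.add(video_id)
            body = f"approved: {video_id}".encode()
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), ApprovalHandler).serve_forever()
```

The publish step then checks `video_id in approved` before uploading anything.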

The 80/20 Rule: Automate 80% of the work — research, scripting, generation, compositing. Keep 20% human — review, community engagement, and high-stakes creative decisions. This prevents the "bot look" that tanks engagement and risks shadowbans.


Gotchas & Platform Rules

Instagram

The Instagram Graph API requires a Facebook Business account and app review. Using a unified posting API like Upload-Post or Blotato bypasses most of this complexity since they've already completed the review process. Reels must be 9:16, 3–90 seconds, and under 1 GB.

TikTok

TikTok's Content Posting API is only available to approved partners. For individual creators, cookie-based uploaders (like tiktok-uploader) work but may break when TikTok changes their frontend. Unified APIs abstract this away. Keep videos under 10 minutes and 4 GB.

YouTube

The YouTube Data API v3 has a daily quota of 10,000 units. Each video upload costs 1,600 units — so you can upload about 6 videos per day before hitting the cap. Shorts must be vertical (9:16), under 60 seconds, and should include #Shorts in the title or description.

General Tips

  • Never hardcode credentials. Always use .env files and python-dotenv. Claude Code will set this up correctly if you ask.
  • Vary your content. Posting identical or near-identical videos across platforms can trigger spam detection. Use slightly different captions, hashtags, and descriptions per platform.
  • Respect rate limits. Add retry logic with exponential backoff to all API calls.
  • Log everything. Save all intermediate files (script JSON, raw video, audio, final video) and publish results. When something fails at 3 AM, logs are all you have.
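For the rate-limit tip above, a small helper with exponential backoff and jitter is enough to wrap any of the API calls in this guide (the attempt count and delays are arbitrary defaults):

```python
# retry.py — exponential backoff with jitter for flaky API calls.
import random
import time

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fn(); on exception, sleep base_delay * 2^attempt plus jitter, retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Usage: `clip = with_retries(lambda: generate_clip(prompt))`.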

⚠️ Disclaimer: AI-generated content policies are evolving rapidly. Some platforms now require disclosure of AI-generated content. Always check the latest terms of service for Instagram, TikTok, and YouTube before publishing. This guide provides the technical infrastructure — you're responsible for compliance with platform policies.



Built with Claude Code · Published on troystechcorner.com · March 2026

Frequently Asked Questions

Can Claude Code build the whole AI video pipeline for me?

It can help a lot with scaffolding, scripting, debugging, and orchestration, but you still need to choose services, manage keys, test outputs, and review the automation.

What is the hardest part of an autonomous video pipeline?

Usually reliability across services. Generation quality, API failures, rate limits, subtitle timing, and platform publishing rules are where the workflow gets fragile.

Should beginners automate publishing on day one?

Usually no. It is smarter to prove that scripting, generation, voice, and composition work reliably before letting the pipeline publish automatically.

Related videos

Prefer a video walkthrough? Watch the practical version:

YouTube

Claude Code is Now a Promo Director (Remotion + ElevenLabs)

A verified workflow demo showing Claude Code being used to automate parts of promo video production.
