Genpeli — AI Video Editing Pipeline

Genpeli takes raw video footage and turns it into polished social-ready clips — automatically. It uses Whisper for transcription, a judge model for intelligent cut detection, FFmpeg for processing, and word-by-word caption rendering. The entire pipeline runs locally or on serverless GPUs.

The Problem

Content creators spend hours manually cutting videos, adding captions, and reformatting for different platforms. The process is repetitive, time-consuming, and requires expensive editing software. Short-form content demands high volume but each piece still needs manual polish.

The Solution

An automated post-production pipeline. Upload raw footage → Whisper transcribes → a judge model scores every potential cut point (speech energy, sentence boundaries, hook words) → FFmpeg renders with word-by-word captions and normalized audio → export in the right format for each platform.
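To make the scoring step concrete, here is a minimal sketch of how a cut point between two transcribed words might be scored. It is not Genpeli's actual judge model: it swaps in a simple weighted sum, and the `Word` fields, the `HOOK_WORDS` set, the weights, and the threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical hook words; a real list would be tuned per audience.
HOOK_WORDS = {"but", "so", "because", "imagine", "secret"}


@dataclass
class Word:
    text: str      # token as transcribed, punctuation included
    start: float   # start time in seconds
    end: float     # end time in seconds
    energy: float  # assumed speech energy, normalized to 0..1


def score_cut(prev: Word, nxt: Word) -> float:
    """Score the gap between two adjacent words as a cut candidate."""
    score = 0.0
    # Sentence boundary: the previous word ends with terminal punctuation.
    if prev.text.rstrip().endswith((".", "?", "!")):
        score += 0.5
    # Low speech energy at the end of the previous word suggests a pause.
    score += 0.3 * (1.0 - prev.energy)
    # A hook word right after the cut makes a stronger clip opening.
    if nxt.text.strip(".,!?").lower() in HOOK_WORDS:
        score += 0.2
    return score


def best_cuts(words: list[Word], threshold: float = 0.6) -> list[tuple[float, float]]:
    """Return (timestamp, score) for every gap scoring at or above threshold."""
    cuts = []
    for prev, nxt in zip(words, words[1:]):
        s = score_cut(prev, nxt)
        if s >= threshold:
            cuts.append((prev.end, round(s, 3)))
    return cuts


words = [
    Word("Hello", 0.0, 0.4, 0.9),
    Word("world.", 0.4, 0.9, 0.2),
    Word("But", 1.2, 1.4, 0.8),
    Word("wait", 1.4, 1.8, 0.7),
]
print(best_cuts(words))
```

In this toy input only the gap after "world." clears the threshold, because it combines a sentence boundary, low energy, and a following hook word.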

The Outcome

What used to take hours of manual editing now runs in minutes. Upload raw footage, get back polished clips with captions — ready to post on any platform.

Key Features

  • AI-powered cut detection — scores speech energy, sentence boundaries, topic changes
  • Word-by-word captions with speaker-level timing
  • Audio normalization and enhancement
  • Platform-optimized export (vertical/square/horizontal)
  • Serverless GPU processing via Modal.com

Technology Stack

Python, FastAPI, Whisper, FFmpeg, Modal.com, React, Tailwind CSS

Interested in this project?

I'd love to discuss the technical details, challenges overcome, or similar projects I could build for you.
