Genpeli — AI Video Editing Pipeline
Genpeli automatically turns raw video footage into polished, social-ready clips. It uses Whisper for transcription, a judge model for cut detection, and FFmpeg for rendering with word-by-word captions. The entire pipeline runs locally or on serverless GPUs.

The Problem
Content creators spend hours manually cutting videos, adding captions, and reformatting for different platforms. The process is repetitive and time-consuming, and it requires expensive editing software. Short-form content demands high volume, but each piece still needs manual polish.
The Solution
Genpeli is an automated post-production pipeline: upload raw footage → Whisper transcribes it → a judge model scores every potential cut point (speech energy, sentence boundaries, hook words) → FFmpeg renders the clip with word-by-word captions and normalized audio → the result is exported in the right format for each platform.
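To make the scoring step concrete, here is a minimal sketch of how candidate cut points could be ranked from word-level timestamps. It is not Genpeli's actual judge model, which this page does not describe. The `Word` type, the `HOOK_WORDS` set, the weights, and the use of pause length as a stand-in for low speech energy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical hook words; a real list would be tuned per content style.
HOOK_WORDS = {"so", "but", "imagine", "why", "here's"}


def score_cut_points(words, pause_weight=1.0, boundary_weight=1.0, hook_weight=0.5):
    """Return a (cut_time, score) pair for every gap between consecutive words."""
    scored = []
    for prev, nxt in zip(words, words[1:]):
        # Assumption: a longer pause approximates lower speech energy.
        pause = max(0.0, nxt.start - prev.end)
        score = pause_weight * min(pause, 1.0)  # cap the pause term at 1 second
        # Reward gaps that follow sentence-ending punctuation.
        if prev.text.rstrip().endswith((".", "?", "!")):
            score += boundary_weight
        # Reward gaps that lead into a hook word.
        if nxt.text.strip(".,!?").lower() in HOOK_WORDS:
            score += hook_weight
        cut_time = (prev.end + nxt.start) / 2  # cut in the middle of the gap
        scored.append((cut_time, score))
    return scored


words = [
    Word("Hello", 0.0, 0.4),
    Word("world.", 0.5, 0.9),
    Word("So", 1.5, 1.7),
    Word("listen", 1.8, 2.2),
]
scored = score_cut_points(words)
best_time, best_score = max(scored, key=lambda pair: pair[1])
```

In this example the highest-scoring gap falls between "world." and "So", because it combines a long pause, a sentence boundary, and a hook word. A learned judge model could replace the hand-set weights while keeping the same input and output shape.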
The Outcome
What used to take hours of manual editing now runs in minutes. Upload raw footage, get back polished clips with captions — ready to post on any platform.
Key Features
- AI-powered cut detection — scores speech energy, sentence boundaries, topic changes
- Word-by-word captions with speaker-level timing
- Audio normalization and enhancement
- Platform-optimized export (vertical/square/horizontal)
- Serverless GPU processing via Modal.com
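The caption, normalization, and export features can be sketched together as follows. This is a rough illustration, not the project's actual rendering code. It assumes captions are written as an SRT file with one cue per word and burned in with FFmpeg's `subtitles` filter, that audio is normalized with the `loudnorm` filter, and that each platform layout maps to a fixed frame size. The helper names, the `PLATFORM_SIZES` presets, and the loudness targets are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words) -> str:
    """Build SRT text with one cue per word from (text, start, end) tuples."""
    cues = []
    for index, (text, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)


# Hypothetical output sizes for each layout (width, height).
PLATFORM_SIZES = {
    "vertical": (1080, 1920),
    "square": (1080, 1080),
    "horizontal": (1920, 1080),
}


def build_ffmpeg_args(src: str, dst: str, srt_path: str, layout: str) -> list:
    """Assemble an FFmpeg command that crops, captions, and normalizes a clip."""
    width, height = PLATFORM_SIZES[layout]
    video_filters = ",".join([
        # Scale up until the frame covers the target, then center-crop to fit.
        f"scale={width}:{height}:force_original_aspect_ratio=increase",
        f"crop={width}:{height}",
        # Burn in captions; paths with special characters would need escaping.
        f"subtitles={srt_path}",
    ])
    # Assumed loudness targets; adjust per platform.
    audio_filter = "loudnorm=I=-16:TP=-1.5:LRA=11"
    return ["ffmpeg", "-y", "-i", src, "-vf", video_filters, "-af", audio_filter, dst]


srt_text = words_to_srt([("Hello", 0.0, 0.4), ("world", 0.5, 0.9)])
args = build_ffmpeg_args("raw.mp4", "clip_vertical.mp4", "captions.srt", "vertical")
```

The resulting argument list could be passed to `subprocess.run(args, check=True)` on a machine with an FFmpeg build that includes libass. Building the command as a list rather than a shell string avoids quoting problems with file names.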
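Technology Stack
- Whisper (transcription)
- FFmpeg (rendering, captions, audio normalization)
- Modal.com (serverless GPU processing)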