Audio Dataset Manager

January 1, 2025

Audio Dataset Manager is a toolkit for preparing audio datasets for text-to-speech (TTS) training and voice cloning projects.

It grew out of the need to process dozens of hours of audiobook recordings for voice cloning, and it covers the path from raw audio files to training-ready datasets.

Features:

Audio analysis and silence detection
Format conversion (MP3/WAV)
Intelligent splitting into training-ready clips (0.6–11 s)
Auto-transcription powered by OpenAI Whisper
Interactive dataset review UI
JSON management with automatic backup and versioning
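
To make the splitting feature concrete, here is a minimal sketch of how detected silences could be turned into clip boundaries within the 0.6–11 s range. It is an illustration under stated assumptions, not the tool's actual algorithm: the function name plan_clips, the input format (a list of silence intervals in seconds), and the choice to cut at silence midpoints are all hypothetical.

```python
from typing import List, Tuple

MIN_CLIP_S = 0.6   # lower clip bound taken from the feature list
MAX_CLIP_S = 11.0  # upper clip bound taken from the feature list


def plan_clips(
    silences: List[Tuple[float, float]],
    total_s: float,
    min_s: float = MIN_CLIP_S,
    max_s: float = MAX_CLIP_S,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: return (start, end) clip boundaries in seconds.

    Assumption: `silences` holds (start, end) intervals already found by
    some silence detector; cuts are placed at the midpoint of each one.
    """
    # Candidate cut points: silence midpoints inside the file, then the file end.
    cuts = sorted({(s + e) / 2 for s, e in silences})
    cuts = [c for c in cuts if 0.0 < c < total_s] + [total_s]

    clips: List[Tuple[float, float]] = []
    start = 0.0
    for i, cut in enumerate(cuts):
        # No usable silence within max_s: fall back to hard splits.
        while cut - start > max_s:
            clips.append((start, start + max_s))
            start += max_s
        next_cut = cuts[i + 1] if i + 1 < len(cuts) else None
        # Close the clip here if reaching the next cut would exceed max_s,
        # or if this is the end of the audio.
        if next_cut is None or next_cut - start > max_s:
            if cut - start >= min_s:
                clips.append((start, cut))
            # Fragments shorter than min_s are dropped.
            start = cut
    return clips


if __name__ == "__main__":
    demo = plan_clips([(2.0, 3.0), (6.0, 7.0), (14.0, 15.0)], total_s=20.0)
    for clip_start, clip_end in demo:
        print(f"{clip_start:.1f}s -> {clip_end:.1f}s")
```

The greedy strategy keeps each clip as long as possible without exceeding the upper bound, which tends to produce fewer, longer clips. A real pipeline would likely also trim leading and trailing silence and avoid hard splits in the middle of a word.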

View on GitHub