This is a set of Python scripts for chunking Japanese-language .vtt (WebVTT) timed subtitle files and passing them to `translategemma:12b`, an open-source model served by a local-first Ollama instance. The scripts feed each line to Ollama, log the response, and recompose the final .vtt file when translation completes. Includes sanity checking and blank-return detection.
A Python-based tool for translating Japanese WebVTT subtitle files to English using an Ollama AI server and the TranslateGemma model.

Requirements: a running Ollama server with the `translategemma:12b` model pulled. Install Python dependencies:
pip install -r requirements.txt
The tool uses environment variables for configuration:
# Ollama server base URL (default: http://localhost:11434/)
export OLLAMA_BASE_URL="http://localhost:11434/"
# Ollama model name (default: translategemma:12b)
export OLLAMA_MODEL="translategemma:12b"
If these aren't set, the script will use the default values above.
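The fallback logic can be sketched in a few lines. This is a minimal illustration of reading the two variables with their documented defaults, not necessarily how the project's own code is structured:

```python
import os

# Environment variables override the documented defaults.
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434/")
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "translategemma:12b")

print(f"Using {OLLAMA_MODEL} at {OLLAMA_BASE_URL}")
```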
Run the main script:
python3 translate_vtt.py
The script walks through a six-step pipeline: it prompts for the input file, validates the Ollama connection, loads and analyzes the VTT file, chunks it, translates the chunks, and reassembles them into a `-EN.vtt` file. Example session:

$ python3 translate_vtt.py
╔════════════════════════════════════════════════════════════╗
║ Japanese VTT Translator ║
╚════════════════════════════════════════════════════════════╝
ℹ Enter the path to your Japanese VTT file:
> /path/to/episode.vtt
✓ Selected: /path/to/episode.vtt
[2/6] Validate Ollama Connection
ℹ Server URL: http://ai-house:11434/
ℹ Model: translategemma:12b
✓ Connected to Ollama
✓ Model 'translategemma:12b' is available
[3/6] Load and Analyze VTT File
ℹ Loading VTT file...
✓ Loaded 1511 subtitles
📄 File: episode.vtt
⏱ Duration: 118 minutes (1.97 hours)
📦 Chunks: 1 (estimated based on 32k token limit)
[4/6] Chunk VTT File
ℹ Chunking file respecting token limits...
✓ Created 1 chunks
ℹ Average tokens per chunk: 1900
[5/6] Translate Chunks
ℹ Translating 1 chunks via Ollama (this may take several minutes)...
Chunk 1/1: ⏳ Processing... - 1511 subtitles
Chunk 1/1: ✓ Translated
✓ All 1 chunks translated successfully
[6/6] Reassemble and Finalize
ℹ Reassembling translated chunks...
✓ Reassembled into single file
╔════════════════════════════════════════════════════════════╗
║ Translation Complete! ║
╚════════════════════════════════════════════════════════════╝
ℹ Output file: /path/to/episode-EN.vtt
✓ Translation pipeline completed successfully!
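Under the hood, step 5 sends each chunk to the Ollama HTTP API. The following is a hedged sketch of such a call using Ollama's `/api/generate` endpoint; the function name and prompt wording are illustrative assumptions, not the project's actual code in `ollama_client.py`:

```python
import json
import urllib.request

def translate_chunk(text: str,
                    base_url: str = "http://localhost:11434/",
                    model: str = "translategemma:12b") -> str:
    """Send one chunk of subtitle text to Ollama and return the raw response."""
    payload = json.dumps({
        "model": model,
        # Illustrative prompt; the real project prompt may differ.
        "prompt": f"Translate the following Japanese subtitles to English:\n{text}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```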
The translated file is saved with the same name as the input, but with -EN appended before the file extension.
Example: episode.vtt → episode-EN.vtt

Project files:
- translate_vtt.py - Main orchestration script (run this)
- vtt_utils.py - VTT file parsing and utilities
- chunker.py - Intelligent chunking logic
- ollama_client.py - Ollama API communication
- translator.py - Translation and sanity checking
- reassembler.py - Chunk reassembly
- tui.py - Terminal UI components
- requirements.txt - Python dependencies

The tool conservatively estimates tokens so that no chunk overflows the model's context window, ensuring safe operation even with rough estimates.
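One way to make token estimation conservative is to assume one token per character, which over-counts for Japanese text. The sketch below illustrates that idea; the limit, safety margin, and function names are assumptions for illustration, not the actual logic in `chunker.py`:

```python
TOKEN_LIMIT = 32_000   # the 32k context limit mentioned above
SAFETY_MARGIN = 0.5    # illustrative: budget only half the window per chunk

def estimate_tokens(text: str) -> int:
    # Deliberately conservative: real tokenizers usually emit fewer
    # tokens than characters for Japanese text.
    return len(text)

def chunk_lines(lines, limit=int(TOKEN_LIMIT * SAFETY_MARGIN)):
    """Group subtitle lines into chunks that stay under the token budget."""
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if current and used + cost > limit:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```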
Each translation undergoes sanity checks, including detection of blank returns from the model. All timestamps are preserved exactly as they appear in the original file; this is critical for video synchronization.
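These two properties can be verified mechanically. The sketch below shows one way to check for blank output and identical timestamp lines; the regex and function are illustrative assumptions, not the project's actual checks in `translator.py`:

```python
import re

# Matches full-form WebVTT cue timings like "00:00:01.000 --> 00:00:03.000".
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")

def sanity_check(original: str, translated: str) -> bool:
    """Reject blank output and any change to the cue timestamps."""
    if not translated.strip():
        return False  # blank return from the model
    return TIMESTAMP_RE.findall(original) == TIMESTAMP_RE.findall(translated)
```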
Troubleshooting: verify the Ollama server is reachable and the model is installed:

curl http://ai-house:11434/api/tags
ollama pull translategemma:12b

This project is provided as-is for personal use.