
Japanese VTT Translator

A Python-based tool for translating Japanese WebVTT subtitle files to English using an Ollama server and the TranslationGemma model.

Features

  • Intelligent Chunking: Automatically chunks VTT files to fit within the translategemma:12b context window (~32k tokens)
  • Accurate Translation: Uses the official TranslationGemma model for professional-quality Japanese-to-English translation
  • Beautiful TUI: Terminal-aware progress display that adapts to your terminal width
  • Quality Assurance: Automatic sanity checks verify that translations contain no Japanese characters and no empty subtitles
  • Preservation: Maintains exact timestamp formatting - critical for video synchronization
  • Retry Logic: Automatically retries failed translations once
  • Complete Reassembly: Combines all translated chunks back into a single, complete VTT file

Requirements

  • Python 3.7+
  • An Ollama server running the translategemma:12b model
  • Network access to the Ollama server (local or remote)

Installation

  1. Clone or download this project
  2. Install Python dependencies:

    pip install -r requirements.txt

Configuration

The tool uses environment variables for configuration:

# Ollama server base URL (default: http://localhost:11434/)
export OLLAMA_BASE_URL="http://localhost:11434/"

# Ollama model name (default: translategemma:12b)
export OLLAMA_MODEL="translategemma:12b"

If these aren't set, the script will use the default values above.
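The fallback behavior above can be sketched as follows. This is a minimal illustration, not the script's exact code; the helper name get_config is hypothetical.

```python
import os

def get_config(env=None):
    """Return (base_url, model), falling back to the documented defaults.

    `env` defaults to os.environ; a dict can be passed in for testing.
    """
    if env is None:
        env = os.environ
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434/")
    model = env.get("OLLAMA_MODEL", "translategemma:12b")
    return base_url, model
```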

Usage

Run the main script:

python3 translate_vtt.py

The script will:

  1. Prompt you to select a Japanese VTT file
  2. Validate the Ollama server connection
  3. Load and analyze the input file
  4. Display file duration and estimated chunk count
  5. Chunk the file respecting token limits
  6. Translate each chunk via Ollama
  7. Verify translations with sanity checks
  8. Reassemble into a final -EN.vtt file

Example Workflow

$ python3 translate_vtt.py

╔════════════════════════════════════════════════════════════╗
║                  Japanese VTT Translator                  ║
╚════════════════════════════════════════════════════════════╝

ℹ Enter the path to your Japanese VTT file:
  > /path/to/episode.vtt

✓ Selected: /path/to/episode.vtt

[2/6] Validate Ollama Connection
ℹ Server URL: http://ai-house:11434/
ℹ Model: translategemma:12b
✓ Connected to Ollama
✓ Model 'translategemma:12b' is available

[3/6] Load and Analyze VTT File
ℹ Loading VTT file...
✓ Loaded 1511 subtitles
📄 File: episode.vtt
⏱  Duration: 118 minutes (1.97 hours)
📦 Chunks: 1 (estimated based on 32k token limit)

[4/6] Chunk VTT File
ℹ Chunking file respecting token limits...
✓ Created 1 chunks
ℹ Average tokens per chunk: 1900

[5/6] Translate Chunks
ℹ Translating 1 chunks via Ollama (this may take several minutes)...
  Chunk   1/1: ⏳ Processing... - 1511 subtitles
  Chunk   1/1: ✓ Translated
✓ All 1 chunks translated successfully

[6/6] Reassemble and Finalize
ℹ Reassembling translated chunks...
✓ Reassembled into single file

╔════════════════════════════════════════════════════════════╗
║                   Translation Complete!                   ║
╚════════════════════════════════════════════════════════════╝

ℹ Output file: /path/to/episode-EN.vtt
✓ Translation pipeline completed successfully!

Output

The translated file is saved with the same name as the input, but with -EN appended before the file extension.

Example:

  • Input: episode.vtt
  • Output: episode-EN.vtt
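The naming rule above amounts to inserting -EN before the extension. A minimal sketch (the function name output_path is illustrative, not necessarily the tool's):

```python
from pathlib import Path

def output_path(input_path):
    """Derive the output filename: episode.vtt -> episode-EN.vtt."""
    p = Path(input_path)
    # stem is the name without extension; suffix is the extension itself
    return str(p.with_name(p.stem + "-EN" + p.suffix))
```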

File Structure

  • translate_vtt.py - Main orchestration script (run this)
  • vtt_utils.py - VTT file parsing and utilities
  • chunker.py - Intelligent chunking logic
  • ollama_client.py - Ollama API communication
  • translator.py - Translation and sanity checking
  • reassembler.py - Chunk reassembly
  • tui.py - Terminal UI components
  • requirements.txt - Python dependencies

Technical Details

Chunking Strategy

The tool estimates tokens conservatively so that no request overflows the model's context window:

  • Max tokens per chunk (input): 15,000
  • Overhead reserved for the prompt and instructions: 300 tokens
  • Capacity left for translated output: ~17,000 tokens
  • Total budget: ~32,000 tokens

These margins keep each request within the context window even when the token estimates are imprecise.
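The chunking strategy can be sketched as a greedy packer that never splits a subtitle across chunks. The token heuristic below is an assumption for illustration, not the tool's exact formula:

```python
def estimate_tokens(text):
    # Conservative heuristic (an assumption, not the tool's exact formula):
    # Japanese often tokenizes near one token per character, so count
    # characters and add a ~10% safety margin.
    return int(len(text) * 1.1) + 1

def chunk_subtitles(subtitles, max_tokens=15_000):
    """Greedily pack whole subtitles into chunks under the token budget.

    Subtitles are never split mid-cue, so each chunk reassembles cleanly.
    """
    chunks, current, used = [], [], 0
    for sub in subtitles:
        cost = estimate_tokens(sub)
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(sub)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```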

Translation Quality

Each translation undergoes sanity checks:

  1. Non-empty verification: All subtitles must contain text
  2. Language verification: No Japanese characters allowed in output
  3. Retry logic: Failed chunks are retried once
  4. Explicit failure: If a chunk fails twice, it is marked as failed and reported
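The first two checks can be sketched as shown below. The Unicode ranges (Hiragana, Katakana, and common CJK ideographs) are an assumed implementation detail; the tool's actual ranges may differ.

```python
import re

# Hiragana + Katakana (U+3040-U+30FF) and common CJK ideographs
# (U+4E00-U+9FFF) -- an assumed check, not necessarily the tool's exact one.
JAPANESE_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def passes_sanity_checks(subtitles):
    """A chunk passes only if every subtitle is non-empty (after stripping
    whitespace) and contains no Japanese characters."""
    return all(s.strip() and not JAPANESE_RE.search(s) for s in subtitles)
```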

Timestamp Preservation

All timestamps are preserved exactly as they appear in the original file. This is critical for video synchronization.
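One way to guarantee this is to copy cue timing lines through verbatim and only ever replace subtitle text. A sketch of identifying timing lines (the regex is an assumption based on the WebVTT format, where the hours field is optional):

```python
import re

# WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000".
# Hours are optional in WebVTT, so both HH:MM:SS.mmm and MM:SS.mmm match.
CUE_TIMING_RE = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s-->\s(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)

def is_timing_line(line):
    """True for cue timing lines, which must be copied through unchanged."""
    return CUE_TIMING_RE.match(line) is not None
```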

Troubleshooting

"Cannot connect to Ollama server"

  • Verify Ollama is running: curl http://ai-house:11434/api/tags
  • Check the URL matches your setup
  • Ensure network connectivity to the Ollama host

"Could not verify model availability"

  • Make sure the TranslationGemma model is pulled: ollama pull translategemma:12b
  • Verify the model name is correct

Translation fails or produces Japanese output

  • The model may be overwhelmed - try splitting into smaller files manually
  • Verify the Ollama server has sufficient resources
  • Check the console output for specific error messages

Empty output file

  • All chunks failed translation - check Ollama logs
  • Verify the model is properly loaded and responsive

Limitations

  • Large files (>3 hours) may need to be split manually before processing
  • Translation quality depends on the TranslationGemma model's performance
  • Processing time scales with video duration (typically 30-60 minutes for 2-hour videos)

Performance Notes

  • Translation time depends on video length, system resources, and Ollama server performance
  • Typical speed: roughly 2-4 minutes of video per minute of processing time
  • Chunks are processed sequentially

License

This project is provided as-is for personal use.