Today, our task is to prepare a Python script (or suite of scripts) that can take a Japanese-language VTT file, chunk it into smaller parts, and then feed those chunks to a remote Ollama server running the model translategemma:12b. The files must be chunked because the context window for translategemma:12b is still woefully small -- it cannot handle a full hour of subtitles at a time.
Some environment details:
- My Ollama server's base URL: http://ai-house:11434/
- The model to use: translategemma:12b
- The origin language: Japanese
- The target language: English
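The four values above can be pinned down as module-level constants so the rest of the script never hardcodes them. A minimal sketch (the hostname and model tag are taken directly from this brief; adjust them if your environment differs):

```python
# Configuration values from the brief. The hostname "ai-house" and the
# model tag are assumptions carried over from this document.
OLLAMA_BASE_URL = "http://ai-house:11434"
MODEL_NAME = "translategemma:12b"
SOURCE_LANGUAGE = "Japanese"
TARGET_LANGUAGE = "English"
```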
Some design requirements for the final Python script:
- The script should run standalone and prompt the user for the local file path of a .vtt file. The output should be saved to the same folder. Intermediate working files should go in /tmp/ and be deleted after a successful translation.
- It must be beautiful, verbose, and loudly mention where it is in the process, so it is clear that the program is working and in progress.
- The script should keep an active, beautiful tally of progress with a TUI that auto-formats to the width of the terminal.
- The script should measure the total running time of the VTT file (in minutes or hours) and display it, along with the total number of chunked VTT files produced (assuming a very conservative context window of 32,000 tokens total, shared by the instructions, the source text, and the translated text).
- The script should direct the LLM to preserve the timestamps from the input to the output exactly -- they are critically important, as they time the subtitle to the video precisely.
- At the end of the run, the script should sanity-check all of the translated VTT chunks, verifying that NO Japanese characters remain in any translated file and that every file contains text and is not empty. If either condition fails, it should send those chunks back to Ollama for reprocessing. Do this only once -- if a chunk comes back blank or in Japanese twice in a row, do not force it, as it is a deterministic condition.
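The chunking requirement above can be sketched as a simple character-budget packer over whole VTT cues, so no cue is ever split mid-subtitle. The ~4-characters-per-token heuristic and the two-thirds reservation for prompt plus output are assumptions, not measurements of this model's tokenizer:

```python
# The brief assumes a 32,000-token window shared by the instructions, the
# source text, AND the translated output, so each chunk gets only a slice
# of it. CHARS_PER_TOKEN is a rough heuristic, not a tokenizer measurement.
TOKEN_BUDGET = 32_000
CHARS_PER_TOKEN = 4
# Reserve roughly two thirds of the window for the prompt and the output.
MAX_CHUNK_CHARS = (TOKEN_BUDGET // 3) * CHARS_PER_TOKEN

def chunk_cues(cues: list[str]) -> list[str]:
    """Group whole VTT cues into chunks that stay under the character budget."""
    chunks, current, size = [], [], 0
    for cue in cues:
        if current and size + len(cue) > MAX_CHUNK_CHARS:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(cue)
        size += len(cue)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```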
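The sanity check in the last bullet comes down to two tests per chunk: is the file empty, and does it still contain Japanese script? One way to sketch it (the function name `chunk_needs_retry` is hypothetical, and the Unicode ranges cover Hiragana, Katakana, and the common CJK ideograph block):

```python
import re
from pathlib import Path

# Hiragana (3040-309F), Katakana (30A0-30FF), and common CJK ideographs.
JAPANESE_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def chunk_needs_retry(path: Path) -> bool:
    """Return True if a translated chunk is empty or still contains Japanese."""
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return True
    return bool(JAPANESE_RE.search(text))
```

Chunks flagged by this check would be resubmitted once; a second failure is accepted as-is, per the requirement above.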
- The LLM only works when given the exact prompt below, which must be used verbatim.
The Master Prompt
"You are a professional Japanese (日本語) to English translator. Your goal is to accurately convey the meaning and nuances of the original Japanese (日本語) text while adhering to English grammar, vocabulary, and cultural sensitivities. Produce only the English translation, without any additional explanations or commentary. Please translate the following Japanese (日本語) text into English:"
Output
Once each chunk of the VTT has been translated, the chunks must be reassembled into one long .vtt file, with '-EN' appended to the original filename before the file extension.
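The reassembly step can be sketched as below (the helper name `reassemble` is hypothetical). It stitches the translated chunks back together in order, keeps a single `WEBVTT` header, and derives the output name from the source file:

```python
from pathlib import Path

def reassemble(chunk_paths: list[Path], source: Path) -> Path:
    """Stitch translated chunks into one VTT named <original>-EN.vtt."""
    out_path = source.with_name(f"{source.stem}-EN{source.suffix}")
    parts = ["WEBVTT"]
    for chunk in chunk_paths:  # chunk_paths must already be in cue order
        body = chunk.read_text(encoding="utf-8").strip()
        # Drop any per-chunk WEBVTT header so it appears exactly once.
        if body.startswith("WEBVTT"):
            body = body[len("WEBVTT"):].lstrip()
        parts.append(body)
    out_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return out_path
```

Because the LLM is instructed to preserve timestamps exactly, no renumbering or time-shifting is needed here; concatenation in order is sufficient.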