# 🖼️ Ollama Image Captionizer

A Python script that uses a local [Ollama](https://ollama.com/) multimodal model to generate captions for your images. It features a rich, interactive terminal user interface (TUI) for easy operation, configuration, and live progress tracking.

This is mainly a tool for preparing image datasets for training with FLUX. It produces captions rather than keyword tags because, unlike Stable Diffusion, FLUX relies on natural-language descriptions rather than keywords.

![A macOS iTerm2 window of the Ollama Image Captionizer working its magic through the moondream model](https://images.mitch.science/i/a7710cef-ea63-4206-a5ee-7aee0d244901.jpg)

---

## ✨ Features

* **Interactive TUI:** A user-friendly, menu-driven interface built with `rich` and `gum`. No need to edit the script to change settings!
* **Flexible Image Selection:** Process an entire directory of images or use the file picker to select specific images.
* **Live Progress Logging:** A beautiful, real-time table shows you which files are being processed, their status, and a preview of the generated caption.
* **Smart Feedback:** Uses emojis and colors to clearly indicate successes, skips, failures, and warnings for low-quality (e.g., single-word) captions.
* **Persistent Configuration:** Your last-used settings (model, prompt, image source) are automatically saved to a `config.json` file for your next session.
* **Cross-Platform:** Built with Python, it's designed to be compatible with macOS, Linux, and Windows.

## ⚙️ Requirements

Before you begin, ensure you have the following installed and running:

1. **Python 3.x**
2. **Ollama:** The script requires a running Ollama instance.
3. **A Multimodal Ollama Model:** You need a model capable of processing images, such as `moondream`.
   ```bash
   ollama pull moondream
   ```
4. **Rich:** A Python library for rich text and beautiful formatting in the terminal.
   ```bash
   pip install rich
   ```
5. **Gum:** A tool for glamorous shell scripts, used for the interactive menus.
   * **macOS:** `brew install gum`
   * **Other Systems:** See the official [Gum installation guide](https://github.com/charmbracelet/gum#installation).

## 🚀 Quick Start

1. **Install Dependencies:** Make sure you have installed Python, Rich, and Gum as listed in the requirements section.
2. **Start Ollama:** Ensure the Ollama application is running and the server is active.
3. **Run the Script:** Save the code as `ollama_captionizer.py` and run it from your terminal:
   ```bash
   python3 ollama_captionizer.py
   ```
4. **Use the Menu:** You will be greeted by the main menu, where you can:
   * **Set Image Source:** Choose a directory or select specific image files.
   * **Edit Prompt:** Customize the prompt sent to the model.
   * **Start Captioning:** Begin the process. Captions will be saved as `.txt` files with the same name as the original image (e.g., `my_photo.jpg` -> `my_photo.txt`).

## 🖥️ Cross-Platform Compatibility

This script is written in Python and is designed to be cross-platform. It should work on **macOS, Linux, and Windows** provided the dependencies are met.

A key feature is that it communicates with the Ollama server over its network API (e.g., `http://localhost:11434`). This means **you do not need to modify the script to handle different executable names** like `ollama.exe` on Windows.

The primary consideration for cross-platform use is ensuring that the `gum` command-line tool is properly installed and accessible in your system's `PATH`.
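
If you want to check your setup before running the script, the points above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (`gum_available`, `ollama_reachable`, `caption_path`, `build_payload`, `request_caption`) and the default prompt are hypothetical. It assumes Ollama's documented `/api/generate` endpoint, which accepts base64-encoded images and returns the text in a `response` field, and it uses only the Python standard library.

```python
import base64
import json
import shutil
import urllib.request
from pathlib import Path

# Default local address mentioned in this README; change it if your server differs.
OLLAMA_URL = "http://localhost:11434"


def gum_available() -> bool:
    """Return True if a `gum` executable can be found on PATH."""
    return shutil.which("gum") is not None


def ollama_reachable(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


def caption_path(image_path: str) -> Path:
    """Map an image path to its caption file, e.g. my_photo.jpg -> my_photo.txt."""
    return Path(image_path).with_suffix(".txt")


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming request body with the image base64-encoded."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def request_caption(image_path: str, model: str = "moondream",
                    prompt: str = "Describe this image.") -> str:
    """Send one image to the Ollama server and return the caption text.

    The prompt default is an assumption, not the script's real prompt.
    """
    payload = build_payload(model, prompt, Path(image_path).read_bytes())
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"].strip()
```

Because everything goes over HTTP, the same code runs unchanged on macOS, Linux, and Windows. A typical use would be to call `gum_available()` and `ollama_reachable()` first, then write the result of `request_caption("my_photo.jpg")` to `caption_path("my_photo.jpg")`.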