# 🖼️ Ollama Image Captionizer

A Python script that uses a local [Ollama](https://ollama.com/) multimodal model to generate captions for your images. It features a rich, interactive terminal user interface (TUI) for easy operation, configuration, and live progress tracking. It is mainly a tool for preparing image datasets for training with FLUX. The output is full-sentence captions rather than keyword tags because, unlike Stable Diffusion, FLUX relies on natural-language descriptions rather than keyword lists.

![A macOS iTerm2 window of the Ollama Image Captionizer working its magic through the moondream model](https://images.mitch.science/i/a7710cef-ea63-4206-a5ee-7aee0d244901.jpg)

---

## ✨ Features

*   **Interactive TUI:** A user-friendly, menu-driven interface built with `rich` and `gum`. No need to edit the script to change settings!
*   **Flexible Image Selection:** Process an entire directory of images or use the file picker to select specific images.
*   **Live Progress Logging:** A beautiful, real-time table shows you which files are being processed, their status, and a preview of the generated caption.
*   **Smart Feedback:** Uses emojis and colors to clearly indicate successes, skips, failures, and warnings for low-quality (e.g., single-word) captions.
*   **Persistent Configuration:** Your last-used settings (model, prompt, image source) are automatically saved to a `config.json` file for your next session.
*   **Cross-Platform:** Built with Python, it's designed to be compatible with macOS, Linux, and Windows.
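
To illustrate the persistent configuration feature, here is a minimal sketch of loading and saving settings as JSON. The key names (`model`, `prompt`, `image_source`), the default values, and the helper names are assumptions made for this example; the script's actual `config.json` layout may differ.

```python
import json
from pathlib import Path

# Hypothetical defaults; the script's real keys and values may differ.
DEFAULT_CONFIG = {
    "model": "moondream",
    "prompt": "Describe this image in one detailed sentence.",
    "image_source": "",
}


def load_config(path: Path) -> dict:
    """Return saved settings merged over the defaults."""
    config = dict(DEFAULT_CONFIG)
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return config  # first run or unreadable file: fall back to defaults
    if isinstance(saved, dict):
        config.update(saved)
    return config


def save_config(path: Path, config: dict) -> None:
    """Write settings as pretty-printed JSON."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Merging saved values over the defaults means a missing or partially written file still yields a usable configuration.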

## ⚙️ Requirements

Before you begin, ensure you have the following installed and running:

1.  **Python 3.x**
2.  **Ollama:** The script requires a running Ollama instance.
3.  **A Multimodal Ollama Model:** You need a model capable of processing images, such as `moondream`.
    ```bash
    ollama pull moondream
    ```
4.  **Rich:** A Python library for rich text and beautiful formatting in the terminal.
    ```bash
    pip install rich
    ```
5.  **Gum:** A tool for glamorous shell scripts, used for the interactive menus.
    *   **macOS:** `brew install gum`
    *   **Other Systems:** See the official [Gum installation guide](https://github.com/charmbracelet/gum#installation).

## 🚀 Quick Start

1.  **Install Dependencies:**
    Make sure you have installed Python, Rich, and Gum, and pulled a multimodal model, as listed in the Requirements section.

2.  **Start Ollama:**
    Ensure the Ollama application is running and the server is active.

3.  **Run the Script:**
    Save the code as `ollama_captionizer.py` and run it from your terminal:
    ```bash
    python3 ollama_captionizer.py
    ```
4.  **Use the Menu:**
    You will be greeted by the main menu, where you can:
    *   **Set Image Source:** Choose a directory or select specific image files.
    *   **Edit Prompt:** Customize the prompt sent to the model.
    *   **Start Captioning:** Begin the process.

Captions will be saved as `.txt` files with the same name as the original image (e.g., `my_photo.jpg` -> `my_photo.txt`).
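
The naming rule can be sketched with `pathlib`. The function names below are hypothetical and only illustrate the mapping; the script may implement it differently.

```python
from pathlib import Path


def caption_path_for(image_path: Path) -> Path:
    """Return the sibling .txt path, e.g. my_photo.jpg -> my_photo.txt."""
    return image_path.with_suffix(".txt")


def write_caption(image_path: Path, caption: str) -> Path:
    """Write the caption next to the image and return the file written."""
    target = caption_path_for(image_path)
    target.write_text(caption.strip(), encoding="utf-8")
    return target
```

Under this rule, two images that differ only by extension (for example `cat.jpg` and `cat.png`) would map to the same `cat.txt`, so it is worth keeping base names unique within a dataset folder.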

## 🖥️ Cross-Platform Compatibility

This script is written in Python and is designed to be cross-platform. It should work on **macOS, Linux, and Windows** provided the dependencies are met.

A key feature is that it communicates with the Ollama server over its HTTP API (by default at `http://localhost:11434`) rather than by calling the `ollama` executable. This means **you do not need to modify the script to handle different executable names** like `ollama.exe` on Windows.
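
As a rough illustration of what talking to the server looks like, the sketch below builds a request for Ollama's `/api/generate` endpoint using only the standard library. The function names and the timeout are assumptions for this example, and the script itself may use a different client or endpoint.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate request body with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def generate_caption(payload: dict, base_url: str = OLLAMA_URL,
                     timeout: float = 120.0) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("response", "").strip()
```

Because the request is plain HTTP, the same code runs unchanged on macOS, Linux, and Windows as long as the server is reachable at `base_url`.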

The primary consideration for cross-platform use is ensuring that the `gum` command-line tool is properly installed and accessible in your system's `PATH`.
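
If the menus fail to appear, a quick way to confirm that `gum` is visible to Python is `shutil.which`, which searches `PATH` the same way on all three platforms. The helper below is a hypothetical sketch and not part of the script.

```python
import shutil


def missing_tools(names: list[str]) -> list[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["gum"])
    if missing:
        print(f"Not found on PATH: {', '.join(missing)}")
    else:
        print("gum is available.")
```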