A tool for using Ollama (either local or remote) to generate captions in .txt form, suitable for use in training FLUX LoRAs.

🖼️ Ollama Image Captionizer

A Python script that uses an Ollama multimodal model (local or remote) to generate captions for your images in bulk. The prompt can be customized to guide the vision model, for example to include certain keywords or to describe a specific person by name. It features a rich, interactive terminal user interface (TUI) for easy operation, configuration, and live progress tracking. This is primarily a helper tool for preparing image datasets for training FLUX: unlike Stable Diffusion, FLUX responds better to natural-language captions than to keyword tags, so the output is full-sentence captions.
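
For example, a prompt along these lines (purely illustrative, not a built-in default) steers the captions toward a named subject:

    Describe this photo in one detailed sentence. The person in the photo is named ohwx.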

Screenshot: a macOS iTerm2 window showing the Ollama Image Captionizer working its magic with the moondream model.


✨ Features

  • Interactive TUI: A user-friendly, menu-driven interface built with rich and gum. No need to edit the script to change settings!
  • Flexible Image Selection: Process an entire directory of images or use the file picker to select specific images.
  • Live Progress Logging: A beautiful, real-time table shows you which files are being processed, their status, and a preview of the generated caption.
  • Smart Feedback: Uses emojis and colors to clearly indicate successes, skips, failures, and warnings for low-quality (e.g., single-word) captions.
  • Persistent Configuration: Your last-used settings (model, prompt, image source) are automatically saved to a config.json file for your next session (see the sketch after this list).
  • Cross-Platform: Built with Python, it's designed to be compatible with macOS, Linux, and Windows.
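
As a rough illustration of how the persistent configuration behaves (the field names below are assumptions, not necessarily the script's exact schema), settings are read back from config.json on start-up and written out whenever they change:

    import json
    from pathlib import Path

    CONFIG_FILE = Path("config.json")

    # Illustrative defaults; the real script may use different keys.
    DEFAULTS = {
        "model": "moondream",
        "prompt": "Describe this image in one detailed sentence.",
        "image_source": ".",
    }

    def load_config() -> dict:
        """Return the last-used settings, falling back to defaults on first run."""
        if CONFIG_FILE.exists():
            return {**DEFAULTS, **json.loads(CONFIG_FILE.read_text())}
        return dict(DEFAULTS)

    def save_config(settings: dict) -> None:
        """Persist the current settings for the next session."""
        CONFIG_FILE.write_text(json.dumps(settings, indent=2))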

⚙️ Requirements

Before you begin, ensure you have the following installed and running:

  1. Python 3.x
  2. Ollama: The script requires a running Ollama instance.
  3. A Multimodal Ollama Model: You need a model capable of processing images, such as moondream.

    ollama pull moondream
    
  4. Rich: A Python library for rich text and beautiful formatting in the terminal.

    pip install rich
    
  5. Gum: A tool for glamorous shell scripts, used for the interactive menus.
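
    For example, on macOS Gum can be installed with Homebrew; see the Gum documentation for other platforms:

    brew install gum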

🚀 Quick Start

  1. Install Dependencies: Make sure you have installed Python, Rich, and Gum as listed in the requirements section.

  2. Start Ollama: Ensure the Ollama application is running and the server is active.
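
    If the Ollama desktop app is not running, the server can also be started manually from a terminal:

    ollama serve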

  3. Run the Script: Save the code as ollama_captionizer.py and run it from your terminal:

    python3 ollama_captionizer.py
    
  4. Use the Menu: You will be greeted by the main menu, where you can:

    • Set Image Source: Choose a directory or select specific image files.
    • Edit Prompt: Customize the prompt sent to the model.
    • Start Captioning: Begin the process.

Captions will be saved as .txt files with the same name as the original image (e.g., my_photo.jpg -> my_photo.txt).
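
In other words, the caption path is simply the image path with its extension swapped for .txt. A minimal sketch of that step (not the script's exact code):

    from pathlib import Path

    image = Path("my_photo.jpg")
    caption_file = image.with_suffix(".txt")          # -> my_photo.txt
    caption_file.write_text("caption text returned by the model")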

🖥️ Cross-Platform Compatibility

This script is written in Python and is designed to be cross-platform. It should work on macOS, Linux, and Windows provided the dependencies are met.

A key feature is that it communicates with the Ollama server over its network API (e.g., http://localhost:11434). This means you do not need to modify the script to handle different executable names like ollama.exe on Windows.
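
For reference, one way to talk to that API is the standard Ollama /api/generate endpoint with a base64-encoded image attached. A minimal standalone sketch using only the Python standard library (illustrative, not the script's exact code):

    import base64
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

    def caption_image(path: str, model: str = "moondream",
                      prompt: str = "Describe this image.") -> str:
        """Ask a multimodal Ollama model for a caption of a single image."""
        with open(path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")
        payload = {"model": model, "prompt": prompt, "images": [image_b64], "stream": False}
        request = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["response"].strip()

    print(caption_image("my_photo.jpg"))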

The primary consideration for cross-platform use is ensuring that the gum command-line tool is properly installed and accessible in your system's PATH.