A tool for using Ollama (either local or remote) to generate captions as .txt files, suitable for use in training FLUX LoRA files.

mitch donaberger 3e2d7f2a12 v0.1 1 month ago

🖼️ Ollama Image Captionizer

A Python script that uses a multimodal model served by Ollama to generate captions for your images. It features a rich, interactive terminal user interface (TUI) for easy operation, configuration, and live progress tracking. It is mainly a tool for preparing image datasets for training with FLUX. The output is natural-language captions rather than keyword tags because FLUX, unlike Stable Diffusion, works from natural-language descriptions rather than keyword lists.

✨ Features

  • Interactive TUI: A user-friendly, menu-driven interface built with rich and gum. No need to edit the script to change settings!
  • Flexible Image Selection: Process an entire directory of images or use the file picker to select specific images.
  • Live Progress Logging: A beautiful, real-time table shows you which files are being processed, their status, and a preview of the generated caption.
  • Smart Feedback: Uses emojis and colors to clearly indicate successes, skips, failures, and warnings for low-quality (e.g., single-word) captions.
  • Persistent Configuration: Your last-used settings (model, prompt, image source) are automatically saved to a config.json file for your next session.
  • Cross-Platform: Built with Python, it's designed to be compatible with macOS, Linux, and Windows.
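
The persistent configuration feature could work roughly as sketched below. This is a minimal illustration, not the script's actual code: the key names (`model`, `prompt`, `image_source`), the default prompt text, and the helper names `load_config` and `save_config` are all assumptions.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real script's keys and values may differ.
DEFAULT_CONFIG = {
    "model": "moondream",
    "prompt": "Describe this image in one detailed sentence.",
    "image_source": "",
}


def load_config(path: Path) -> dict:
    """Return saved settings merged over the defaults.

    A missing or unreadable file simply yields the defaults.
    """
    config = dict(DEFAULT_CONFIG)
    try:
        saved = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return config
    if isinstance(saved, dict):
        config.update(saved)
    return config


def save_config(path: Path, config: dict) -> None:
    """Write the settings as indented JSON so the file stays hand-editable."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Merging the saved values over a defaults dictionary means a config.json written by an older version, or one that has been partially edited by hand, still loads without errors.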

⚙️ Requirements

Before you begin, ensure you have the following installed and running:

  1. Python 3.x
  2. Ollama: The script requires a running Ollama instance.
  3. A Multimodal Ollama Model: You need a model capable of processing images, such as moondream.

    ollama pull moondream
    
  4. Rich: A Python library for rich text and beautiful formatting in the terminal.

    pip install rich
    
  5. Gum: A tool for glamorous shell scripts, used for the interactive menus.

🚀 Quick Start

  1. Install Dependencies: Make sure you have installed Python, Rich, and Gum as listed in the requirements section.

  2. Start Ollama: Ensure the Ollama application is running and the server is active.

  3. Run the Script: Save the code as ollama_captionizer.py and run it from your terminal:

    python3 ollama_captionizer.py
    
  4. Use the Menu: You will be greeted by the main menu, where you can:

    • Set Image Source: Choose a directory or select specific image files.
    • Edit Prompt: Customize the prompt sent to the model.
    • Start Captioning: Begin the process.
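
As an illustration of the "Set Image Source" option, the sketch below shows one way a directory or a single file could be turned into a list of images to process. The extension set and the function name `collect_images` are assumptions; the script's own filter may differ.

```python
from pathlib import Path
from typing import List

# Assumed set of extensions; the script's actual list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def collect_images(source: Path) -> List[Path]:
    """Return image files found at `source`, sorted by path.

    `source` may be a directory (non-recursive scan) or a single file.
    """
    candidates = source.iterdir() if source.is_dir() else [source]
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```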

Captions will be saved as .txt files with the same name as the original image (e.g., my_photo.jpg -> my_photo.txt).
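
The naming rule above, together with the low-quality warning mentioned under Features, can be sketched as follows. The function names and the two-word threshold are assumptions made for illustration, not the script's actual implementation.

```python
from pathlib import Path


def caption_path(image: Path) -> Path:
    """Map an image path to its caption file: my_photo.jpg -> my_photo.txt."""
    return image.with_suffix(".txt")


def is_low_quality(caption: str, min_words: int = 2) -> bool:
    """Flag empty or single-word captions (threshold is an assumed default)."""
    return len(caption.split()) < min_words


def write_caption(image: Path, caption: str) -> Path:
    """Save the caption next to the image and return the path written."""
    target = caption_path(image)
    target.write_text(caption.strip() + "\n", encoding="utf-8")
    return target
```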

🖥️ Cross-Platform Compatibility

This script is written in Python and is designed to be cross-platform. It should work on macOS, Linux, and Windows provided the dependencies are met.

A key feature is that it communicates with the Ollama server over its network API (e.g., http://localhost:11434). This means you do not need to modify the script to handle different executable names like ollama.exe on Windows.
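
For readers curious what such an API call looks like, here is a minimal sketch using only the standard library and Ollama's `/api/generate` endpoint, which accepts base64-encoded images for multimodal models. Whether the script uses this exact endpoint is an assumption, and the function names, default prompt, and timeout are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def request_caption(
    image: Path,
    model: str = "moondream",
    prompt: str = "Describe this image.",      # hypothetical default prompt
    host: str = "http://localhost:11434",
    timeout: float = 120.0,                     # assumed timeout in seconds
) -> str:
    """Send one image to the Ollama server and return the generated text."""
    payload = build_payload(model, prompt, image.read_bytes())
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body.get("response", "").strip()
```

Because only the `host` value changes, the same call works against a remote Ollama server as well as a local one.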

The primary consideration for cross-platform use is ensuring that the gum command-line tool is properly installed and accessible in your system's PATH.
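
A quick way to confirm that gum is reachable on your PATH is a check like the one below. It is a small standalone sketch (the function name `missing_tools` is hypothetical), not part of the script itself.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str] = ("gum",)) -> List[str]:
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```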