AI-generated product images for ecommerce brands, using Flux and ComfyUI.
This one was actually launched and used by real companies for a couple of months. It generates product images for ecommerce brands using Flux image models orchestrated through ComfyUI workflows. Companies tested the output on social media, and people couldn't tell the images were AI-generated.
The technical journey was interesting. The first approach was direct product rendering with fine-tuned Flux models, but the models couldn't reliably reproduce product proportions and details. The fix was a two-stage pipeline: generate the scene with a placeholder, then insert the actual product photo and use inpainting to blend it in. The code repo is mostly iterations on ComfyUI workflows: tuning prompts, post-processing steps, and blending techniques.
The two-stage pipeline runs entirely through ComfyUI workflows on GPU instances. Stage 1 takes a scene prompt and the product’s fine-tuned LoRA weights, generating a scene with a placeholder region where the product will go. Stage 2 composites the real product photo into the scene, generates a blending mask around the edges, and runs Flux inpainting to seamlessly merge the product into the generated environment. Post-processing handles color correction, upscaling, and sharpening. The pipeline outputs multiple aspect ratios (1:1, 4:5, 16:9, 9:16) for different ad platforms.
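To make the Stage 2 handoff concrete, here is a minimal sketch of the compositing step. The function name, the fixed placeholder box, and the feather width are illustrative only; in the real system this logic lived inside ComfyUI workflow graphs rather than standalone scripts.

```python
from PIL import Image, ImageDraw, ImageFilter

def composite_product(scene_path: str, product_path: str,
                      box: tuple[int, int, int, int],
                      feather_px: int = 24) -> tuple[Image.Image, Image.Image]:
    """Paste the product photo into the placeholder region of a generated
    scene and build a soft-edged mask for the inpainting pass.

    `box` is the (left, top, right, bottom) placeholder region from Stage 1;
    how that region is detected is outside this sketch.
    """
    scene = Image.open(scene_path).convert("RGB")
    product = Image.open(product_path).convert("RGBA")

    # Fit the product into the placeholder box, preserving aspect ratio.
    w, h = box[2] - box[0], box[3] - box[1]
    product.thumbnail((w, h))
    offset = (box[0] + (w - product.width) // 2,
              box[1] + (h - product.height) // 2)
    scene.paste(product, offset, product)  # alpha-aware paste

    # Blending mask: white where inpainting may repaint (the seam around
    # the product), black where pixels must be preserved.
    mask = Image.new("L", scene.size, 0)
    draw = ImageDraw.Draw(mask)
    draw.rectangle(box, fill=255)
    inner = (box[0] + feather_px, box[1] + feather_px,
             box[2] - feather_px, box[3] - feather_px)
    draw.rectangle(inner, fill=0)          # protect the product interior
    mask = mask.filter(ImageFilter.GaussianBlur(feather_px / 2))

    return scene, mask

composited, blend_mask = composite_product("scene.png", "product.png",
                                           box=(380, 420, 720, 880))
composited.save("stage2_input.png")
blend_mask.save("stage2_mask.png")   # fed to the Flux inpainting workflow
```

The ring-shaped mask illustrates the core idea: the inpainting model only repaints the seam and surrounding shadows while the product pixels stay untouched, which is what keeps labels and proportions accurate. The actual blending masks went through many workflow iterations beyond this.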
Onboarding a new product starts with a single product photo. A preprocessing script chops it into multiple crops and auto-captions each one with an LLM, tagging them with a unique trigger token (e.g. sks_crunchybars). This generates a full training dataset from one image. A LoRA fine-tuning run then trains low-rank adaptation layers on top of the frozen Flux base model, taking ~500–1000 steps (~20 minutes on an A100). The resulting weights (~50–100 MB) are stored in a model registry with metadata and validation scores. At inference time, the LoRA is loaded into the ComfyUI pipeline—the trigger token in the scene prompt activates the learned product appearance.
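A rough sketch of that preprocessing step is below, under stated assumptions: the `caption_image` helper stands in for whatever vision LLM did the real captioning, and the crop count, crop size, and file layout are illustrative rather than the repo's actual choices.

```python
import json
import random
from pathlib import Path
from PIL import Image

TRIGGER = "sks_crunchybars"   # unique trigger token for this product

def caption_image(image_path: Path) -> str:
    """Stand-in for the vision-LLM captioning call in the real pipeline.
    Replace with an actual model call; this fallback just keeps the
    script runnable end to end."""
    return "studio shot of the product on a neutral background"

def build_dataset(photo_path: str, out_dir: str,
                  n_crops: int = 20, crop_frac: float = 0.7) -> None:
    """Turn one product photo into a small LoRA training set:
    random crops, each paired with a caption prefixed by the trigger token."""
    src = Image.open(photo_path).convert("RGB")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    w, h = src.size
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    for i in range(n_crops):
        left = random.randint(0, w - cw)
        top = random.randint(0, h - ch)
        crop = src.crop((left, top, left + cw, top + ch))
        img_path = out / f"{TRIGGER}_{i:03d}.png"
        crop.save(img_path)

        # Caption file sits next to the image, as most LoRA trainers expect.
        caption = f"photo of {TRIGGER}, {caption_image(img_path)}"
        (out / f"{TRIGGER}_{i:03d}.txt").write_text(caption)

    # Metadata that later ends up in the model registry entry.
    (out / "dataset.json").write_text(json.dumps({
        "trigger_token": TRIGGER,
        "source_photo": photo_path,
        "num_samples": n_crops,
    }, indent=2))
```

Because every caption carries the trigger token, an inference-time scene prompt that mentions it (for example "sks_crunchybars on a kitchen counter", a made-up prompt here) activates the learned product appearance once the LoRA is loaded.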
It initially ran on Modal and later moved to Azure GPUs for better throughput. The project was ultimately retired because the usage metrics never justified keeping it running, but the image quality was genuinely impressive.