🖼️ Image → 🎙️ Speech (CPU)
1) Caption with BLIP-2 → 2) Speak with SpeechT5 (HiFiGAN vocoder).
The first run downloads the models and speaker embeddings, so please wait.
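The two-step flow described above can be sketched roughly as follows. This is a minimal illustration, not the code behind this demo: the checkpoint names (`Salesforce/blip2-opt-2.7b`, `microsoft/speecht5_tts`, `microsoft/speecht5_hifigan`), the helper names, and the choice to pass the speaker embedding in as a ready-made tensor are all assumptions. The heavy imports sit inside the loader functions so that nothing is downloaded until a loader is actually called.

```python
from typing import Any, Callable, Tuple

SAMPLE_RATE = 16_000  # SpeechT5 generates 16 kHz audio


def caption_then_speak(
    image: Any,
    captioner: Callable[[Any], str],
    synthesizer: Callable[[str], Any],
) -> Tuple[str, Tuple[int, Any]]:
    """Step 1: caption the image. Step 2: turn the caption into audio."""
    caption = captioner(image).strip()
    if not caption:
        raise ValueError("captioner returned an empty caption")
    waveform = synthesizer(caption)
    # (sample_rate, samples) is the tuple shape a Gradio Audio output accepts.
    return caption, (SAMPLE_RATE, waveform)


def load_blip2_captioner(model_id: str = "Salesforce/blip2-opt-2.7b"):
    """Hypothetical loader; the checkpoint name is an assumption."""
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)

    def caption(image: Any) -> str:
        inputs = processor(images=image, return_tensors="pt")
        ids = model.generate(**inputs, max_new_tokens=30)
        return processor.batch_decode(ids, skip_special_tokens=True)[0]

    return caption


def load_speecht5_synthesizer(speaker_embedding: Any):
    """Hypothetical loader; expects a (1, 512) x-vector torch tensor."""
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    def synthesize(text: str) -> Any:
        inputs = processor(text=text, return_tensors="pt")
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
        return speech.detach().cpu().numpy()

    return synthesize
```

With real models, one would call `caption_then_speak(image, load_blip2_captioner(), load_speecht5_synthesizer(embedding))`, where `embedding` is a speaker x-vector obtained separately. Keeping the captioner and synthesizer as plain callables also makes the flow easy to exercise with lightweight stand-ins before any download happens.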
Upload an image (JPG/PNG)
Generated Caption
Spoken Caption
Generate