Describe images, detect objects, and extract text from any image URL
Analyze images for detailed descriptions, object detection, and OCR text extraction. Accepts images via URL or base64. Auto-detects the right mode from your task — OCR for text extraction, counting for quantity questions, or full description by default.
- image_url (JPEG, PNG, GIF, WebP) or image_base64 (base64-encoded image)
- task: mention "read", "OCR", or "license plate" for text extraction; "count" or "how many" for counting mode

| Permission | Scope | Reason |
|---|---|---|
| Network | aiprox.dev | API calls to orchestration endpoint |
| Env Read | AIPROX_SPEND_TOKEN | Authentication for paid API |
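
The keyword hints listed for task can be checked locally before spending a call. The sketch below is an illustration only: the function name `hinted_mode` and the "count" label are assumptions, and the service's real detection logic is not documented here. It evidently recognizes more phrasings than these hints, since the sample request "extract all text from this image" comes back with "mode": "ocr".

```python
import re

# Documented hints only; the service may accept other phrasings as well.
OCR_HINTS = re.compile(r"\b(read|ocr|license plate)\b")
COUNT_HINTS = re.compile(r"\b(count|how many)\b")


def hinted_mode(task):
    """Return the mode a documented hint points to, or None if no hint matches.

    Hypothetical helper: "ocr" appears in the sample response, while
    "count" is an assumed label for counting mode. Checking OCR hints
    first is also an assumption.
    """
    text = task.lower()
    if OCR_HINTS.search(text):
        return "ocr"
    if COUNT_HINTS.search(text):
        return "count"
    return None  # no documented hint matched; the service picks the mode
```

Whole-word matching keeps "thread" from triggering the "read" hint. A result of None only means that none of the documented hints appear in the task text.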
curl -X POST https://aiprox.dev/api/orchestrate \
-H "Content-Type: application/json" \
-H "X-Spend-Token: $AIPROX_SPEND_TOKEN" \
-d '{
"task": "extract all text from this image",
"image_url": "https://example.com/photo.jpg"
}'
{
"description": "A modern office workspace with a standing desk and dual monitors.",
"objects": ["desk", "monitors", "keyboard", "mouse", "plant", "window", "headphones"],
"text_found": "Visual Studio Code - main.js",
"mode": "ocr"
}
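
For callers working from Python rather than curl, the same request might be built as follows. This is a minimal sketch using only the standard library. The endpoint, header names, and JSON fields come from the curl example; the helper names `build_request` and `analyze` are hypothetical, as is the assumption that image_base64 takes plain base64 without a data URI prefix.

```python
import base64
import json
import os
import urllib.request

API_URL = "https://aiprox.dev/api/orchestrate"  # endpoint from the curl example


def build_request(task, image_url=None, image_bytes=None, token=None):
    """Build (but do not send) a POST request mirroring the curl example."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")
    payload = {"task": task}
    if image_url is not None:
        payload["image_url"] = image_url
    else:
        # Assumption: plain base64 text, no "data:image/...;base64," prefix.
        payload["image_base64"] = base64.b64encode(image_bytes).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        # Falls back to the environment variable named in the permissions table.
        "X-Spend-Token": token or os.environ.get("AIPROX_SPEND_TOKEN", ""),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def analyze(task, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(task, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

Calling `analyze("extract all text from this image", image_url="https://example.com/photo.jpg")` should return a dict with the fields shown in the sample response, such as `text_found` and `mode`. For a local file, read it in binary mode and pass the bytes as `image_bytes` instead.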
Vision Bot fetches and analyzes images via URL or base64 input. Images are processed transiently using Claude's vision capabilities via LightningProx. No images are stored. Your spend token is used for payment only.