01 · Objects

Real-world things in frame

Names every concrete object visible — cars, people, laptops, food, instruments — with confidence scores.

person · car · laptop · dining table · guitar

02 · Text

Words on, in, around it

Reads signage, UI text and captions, including stylised type. OCR output is dictionary-filtered so noise doesn't reach you.

headlines · UI labels · captions · logos
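Retina does not say how its dictionary filter works. The minimal sketch below shows only the general idea: keep an OCR token if it appears in a word list, and drop it otherwise. The function name filter_ocr_tokens and both one-item-per-line file formats are hypothetical.

```shell
#!/bin/sh
# Sketch only: Retina's actual filter is not documented.
# Keep OCR tokens (one per line in TOKENS_FILE) that appear in a word list
# (one word per line in WORDLIST_FILE). Both file formats are hypothetical.
# Usage: filter_ocr_tokens TOKENS_FILE WORDLIST_FILE
filter_ocr_tokens() {
  # -F: treat words as fixed strings, not regexes
  # -x: a token must match a whole word-list line, not a substring
  # -i: ignore case, so "EXIT" on a sign matches "exit" in the list
  grep -Fxi -f "$2" "$1"
}
```

A production filter would also need an allow-list for brand names, numbers and other tokens that a plain dictionary would drop. The sketch deliberately leaves that out.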

03 · Scene

Where this could be

Indoor or outdoor, kitchen or skyline, server room or beach, plus the two closest runners-up.

server room · mountain vista · café interior · city street
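The Scene card describes a ranked answer: one best guess plus two runners-up. The minimal sketch below shows that top-3 selection. It assumes a hypothetical tab-separated list of label/score pairs and a hypothetical top_scenes function. It does not reflect Retina's actual response schema, which this page does not show.

```shell
#!/bin/sh
# Sketch: given scene candidates as "label<TAB>score" lines (a hypothetical
# format, not Retina's documented response), print the best label followed
# by its two closest runners-up.
# Usage: top_scenes SCORES_FILE
top_scenes() {
  tab=$(printf '\t')
  # C locale so "." is always the decimal point; sort field 2 numerically,
  # highest score first, keep three rows, then print only the labels.
  LC_ALL=C sort -t "$tab" -k2,2nr "$1" | head -n 3 | cut -f1
}
```

The sketch keeps the top three rows instead of applying a score threshold. As long as at least three candidates exist, this always returns exactly two runners-up.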

04 · Style

Medium and aesthetic

Photograph or illustration, oil painting or pixel art, vintage or futuristic, gothic or playful.

digital illustration · monochrome · vector art · horror style

05 · Brand

Logos, products, marks

Recognises car brands, fashion labels, tech logos, food and drink — without per-brand training.

BMW · Coca-Cola · Nike · Starbucks · Apple

06 · Cultural

Artworks, landmarks, references

Famous paintings, monuments, cinematic frames: zero-shot, with no fine-tuning.

Mona Lisa · Eiffel Tower · Schönbrunn · Van Gogh

Swap one endpoint.
Keep your code.

Already using the Gemini SDK, or just curl-posting base64 images? Point your client at retina.frank.ink and keep the same request/response shape. Your existing agent framework will not notice the swap.

The Gemini-compatible surface lives at /v1beta/models/{model}:generateContent and accepts the same contents / parts / inline_data request shape.

```shell
# Retina-native (structured JSON)
curl https://retina.frank.ink/v1/analyze \
  -H "Authorization: Bearer rk_live_..." \
  -F "file=@photo.jpg" \
  -F "hint=is there a dog?"

# Gemini-compatible drop-in
curl https://retina.frank.ink/v1beta/models/gemini-flash-latest:generateContent \
  -H "x-goog-api-key: rk_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "contents": [ ... ] }'
```
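The drop-in example above leaves the request body elided. A Gemini generateContent body that pairs a text prompt with a base64 image nests them as contents → parts → inline_data. The minimal sketch below builds such a body, assuming Retina accepts that standard shape as the page states. build_gemini_body is a hypothetical helper; it fixes the MIME type to image/jpeg and assumes a single-line prompt.

```shell
#!/bin/sh
# Sketch: print a Gemini-style generateContent request body that pairs a
# text prompt with one base64-encoded image. build_gemini_body is a
# hypothetical helper; it assumes a JPEG and a single-line prompt.
# Usage: build_gemini_body IMAGE_PATH PROMPT
build_gemini_body() {
  img=$1
  prompt=$2
  # Base64-encode the image; strip newlines because GNU base64 wraps output.
  data=$(base64 < "$img" | tr -d '\n')
  # Escape backslashes and double quotes so the prompt is a valid JSON string.
  text=$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"contents":[{"parts":[{"text":"%s"},{"inline_data":{"mime_type":"image/jpeg","data":"%s"}}]}]}\n' \
    "$text" "$data"
}

# Example (not run here; rk_live_... is your key):
#   build_gemini_body photo.jpg "is there a dog?" > body.json
#   curl https://retina.frank.ink/v1beta/models/gemini-flash-latest:generateContent \
#     -H "x-goog-api-key: rk_live_..." \
#     -H "Content-Type: application/json" \
#     --data-binary @body.json
```

Writing the body to a file and posting it with --data-binary keeps large base64 payloads off the command line. It also sends the bytes unchanged.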

We taught a 4-vCPU box to see like a VLM
— without renting one.

Drop an image, see what we see.

Six surfaces of meaning, one call.

Structured. Specific. Ready to use.

Honest about where the model falls short.

Two metrics, both reported

No compositional reasoning

Narrow image distribution

Anime / fantasy chimeras are hard

Sign up. 50 images/day.
Free during Beta.