Wine label classification: accuracy across 200 labels
We tested FastCork, GPT-4.1 Vision, and Gemini 2.5 Flash on 200 wine label images — Old World and New World, varying label complexity. Each model was scored on field-level extraction accuracy against a manually verified ground truth.
Field extraction accuracy
Each response was scored field by field against ground truth. A field was marked correct only on an exact or canonically equivalent match, with no partial credit; see the sketch after the table.
| Field | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| Vintage year | 98.7% | 96.2% | 94.1% |
| Producer / winery name | 96.4% | 91.8% | 88.3% |
| Appellation / AOC | 94.2% | 87.6% | 83.9% |
| Grape variety | 93.8% | 86.4% | 81.7% |
| Wine type (red/white/rosé) | 99.5% | 98.1% | 97.4% |
| Alcohol % | 91.3% | 84.7% | 79.6% |
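To make the scoring rule concrete, here is a minimal TypeScript sketch of an exact-match-after-canonicalisation check. The specific normalisation steps (accent stripping, case folding, punctuation collapse) are illustrative assumptions, not the benchmark's published harness.

```typescript
// Sketch of the per-field rule: exact match after canonical
// normalisation, no partial credit. These normalisation rules are
// assumptions for illustration only.
function canonicalise(value: string): string {
  return value
    .normalize("NFD")                // decompose accents: "Château" -> "Cha" + combining mark + "teau"
    .replace(/[\u0300-\u036f]/g, "") // strip the combining diacritic marks
    .toLowerCase()
    .replace(/[^a-z0-9.]+/g, " ")    // collapse punctuation and whitespace runs
    .trim();
}

function fieldCorrect(predicted: string, truth: string): boolean {
  // "Chateau Margaux" matches "Château Margaux", but "13.5% vol"
  // does NOT match "13.5" -- near misses score zero.
  return canonicalise(predicted) === canonicalise(truth);
}
```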
Old World vs New World accuracy
Old World labels (Burgundy, Bordeaux, Barolo, Alsace) are harder — producer names are often in small print, appellations are the primary identifier rather than the grape, and vintage placement varies widely. New World labels (Napa, Barossa, Marlborough) follow more consistent layout conventions.
| Label type | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| New World (n=112) | 97.1% | 93.4% | 90.8% |
| Old World (n=88) | 93.6% | 85.2% | 80.4% |
The Old World gap is where specialisation matters most. General vision models struggle with French and Italian appellations where the wine's identity is tied to geography rather than producer branding.
Output reliability
For developers, a correct answer that arrives as markdown prose is nearly as useless as a wrong answer: it forces brittle regex parsing that breaks on edge cases. We scored each response on whether the raw body could be passed straight to JSON.parse() with no transformation.
| Metric | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| Valid JSON (no post-processing) | 100% | 68% | 82% |
| All required fields present | 100% | 74% | 86% |
| Hallucination rate | 2.1% | 11.4% | 8.7% |
| Refused to answer | 0% | 1.5% | 0.5% |
GPT-4.1 Vision returned markdown-wrapped JSON, prose descriptions, or incomplete objects in 32% of requests. Gemini performed better on output format but still required field normalisation in 14% of cases.
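As a reference point, a strictness check in this spirit is a bare JSON.parse() on the raw response body, with no unwrapping of markdown fences and no trimming of surrounding prose. A minimal sketch follows; the required field names are hypothetical, not FastCork's documented schema.

```typescript
// Strict check: the raw body must parse as-is and contain every
// required field. Field names here are assumptions for illustration.
const REQUIRED_FIELDS = ["vintage", "producer", "appellation", "variety", "wine_type", "abv"];

function scoreResponse(rawBody: string): { validJson: boolean; allFieldsPresent: boolean } {
  try {
    const parsed = JSON.parse(rawBody); // throws on ```json fences or prose preambles
    const allFieldsPresent =
      typeof parsed === "object" &&
      parsed !== null &&
      REQUIRED_FIELDS.every((field) => field in parsed);
    return { validJson: true, allFieldsPresent };
  } catch {
    return { validJson: false, allFieldsPresent: false };
  }
}
```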
Latency and cost
| API | p50 latency | p95 latency | Cost / 1,000 req |
|---|---|---|---|
| FastCork | 920ms | 1,380ms | $3.00 |
| GPT-4.1 Vision | 3,240ms | 7,820ms | $15–30 |
| Gemini 2.5 Flash | 1,760ms | 4,100ms | $4–8 |
FastCork is purpose-built for wine label data, served at the edge via Cloudflare Workers with no cold starts. GPT-4.1 and Gemini are routed through centralised inference infrastructure with larger model overhead.
GPT-4.1 Vision costs are estimated from list pricing ($2 per 1M input tokens, ~3K tokens per request). Gemini costs use the Gemini 2.5 Flash standard tier. Latency was measured from request initiation to full response; p50 and p95 were computed across the 200 samples.
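If you want to reproduce the percentile figures from your own timing samples, nearest-rank is one common convention; the benchmark's exact method isn't stated, so treat this as a sketch.

```typescript
// Nearest-rank percentile over raw latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// e.g. percentile(latencies, 50) -> p50, percentile(latencies, 95) -> p95
```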
Try it
```bash
curl -X POST https://fastcork.com/v1/analyze \
  -H "Authorization: Bearer fc-your_key" \
  -F "file=@label.jpg" \
  -F "lang=en" \
  -w "\nTime: %{time_total}s\n"
```
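If you're calling the API from code rather than the shell, the equivalent request in Node 18+ (built-in fetch and FormData) looks roughly like the sketch below. The FASTCORK_API_KEY variable and the response shape are assumptions based on the fields benchmarked above, not a documented schema.

```typescript
// Sketch: same multipart request as the curl example, from Node 18+.
import { readFile } from "node:fs/promises";

async function analyzeLabel(path: string) {
  const form = new FormData();
  form.append("file", new Blob([await readFile(path)]), "label.jpg");
  form.append("lang", "en");

  const res = await fetch("https://fastcork.com/v1/analyze", {
    method: "POST",
    // FASTCORK_API_KEY is a hypothetical env var name.
    headers: { Authorization: `Bearer ${process.env.FASTCORK_API_KEY}` },
    body: form, // fetch sets the multipart boundary header itself
  });
  if (!res.ok) throw new Error(`FastCork error: ${res.status}`);
  return res.json(); // e.g. { vintage, producer, appellation, ... } (assumed shape)
}

analyzeLabel("label.jpg").then(console.log);
```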