Wine label classification: accuracy across 200 labels
We tested FastCork, GPT-4.1 Vision, and Gemini 2.5 Flash on 200 wine label images — Old World and New World, varying label complexity. Each model was scored on field-level extraction accuracy against a manually verified ground truth.
Field extraction accuracy
Each response was scored field by field against ground truth. A field was marked correct only on an exact or canonically equivalent match, with no partial credit; see the sketch after the table.
| Field | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| Vintage year | 98.7% | 96.2% | 94.1% |
| Producer / winery name | 96.4% | 91.8% | 88.3% |
| Appellation / AOC | 94.2% | 87.6% | 83.9% |
| Grape variety | 93.8% | 86.4% | 81.7% |
| Wine type (red/white/rosé) | 99.5% | 98.1% | 97.4% |
| Alcohol % | 91.3% | 84.7% | 79.6% |
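To make the scoring rule concrete, here is a minimal TypeScript sketch of an exact-match-after-canonicalisation check. The specific normalisation steps (accent stripping, case folding, punctuation collapse) are illustrative assumptions, not the benchmark's published harness.

```typescript
// Sketch of the per-field rule: exact match after canonical
// normalisation, no partial credit. These normalisation rules are
// assumptions for illustration only.
function canonicalise(value: string): string {
  return value
    .normalize("NFD")                // decompose accents: "Château" -> "Cha" + combining mark + "teau"
    .replace(/[\u0300-\u036f]/g, "") // strip the combining diacritic marks
    .toLowerCase()
    .replace(/[^a-z0-9.]+/g, " ")    // collapse punctuation and whitespace runs
    .trim();
}

function fieldCorrect(predicted: string, truth: string): boolean {
  // "Chateau Margaux" matches "Château Margaux", but "13.5% vol"
  // does NOT match "13.5" -- near misses score zero.
  return canonicalise(predicted) === canonicalise(truth);
}
```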
Old World vs New World accuracy
Old World labels (Burgundy, Bordeaux, Barolo, Alsace) are harder — producer names are often in small print, appellations are the primary identifier rather than the grape, and vintage placement varies widely. New World labels (Napa, Barossa, Marlborough) follow more consistent layout conventions.
| Label type | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| New World (n=112) | 97.1% | 93.4% | 90.8% |
| Old World (n=88) | 93.6% | 85.2% | 80.4% |
The Old World gap is where specialisation matters most. General vision models struggle with French and Italian appellations where the wine's identity is tied to geography rather than producer branding.
Output reliability
For developers, a correct answer that arrives as markdown prose is nearly as useless as a wrong answer: it forces brittle regex parsing that breaks on edge cases. We scored each response on whether the raw body could be passed straight to JSON.parse() with no transformation.
| Metric | FastCork | GPT-4.1 Vision | Gemini 2.5 Flash |
|---|---|---|---|
| Valid JSON (no post-processing) | 100% | 68% | 82% |
| All required fields present | 100% | 74% | 86% |
| Hallucination rate | 2.1% | 11.4% | 8.7% |
| Refused to answer | 0% | 1.5% | 0.5% |
GPT-4.1 Vision returned markdown-wrapped JSON, prose descriptions, or incomplete objects in 32% of requests. Gemini performed better on output format but still required field normalisation in 14% of cases.
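As a reference point, a strictness check in this spirit is a bare JSON.parse() on the raw response body, with no unwrapping of markdown fences and no trimming of surrounding prose. A minimal sketch follows; the required field names are hypothetical, not FastCork's documented schema.

```typescript
// Strict check: the raw body must parse as-is and contain every
// required field. Field names here are assumptions for illustration.
const REQUIRED_FIELDS = ["vintage", "producer", "appellation", "variety", "wine_type", "abv"];

function scoreResponse(rawBody: string): { validJson: boolean; allFieldsPresent: boolean } {
  try {
    const parsed = JSON.parse(rawBody); // throws on ```json fences or prose preambles
    const allFieldsPresent =
      typeof parsed === "object" &&
      parsed !== null &&
      REQUIRED_FIELDS.every((field) => field in parsed);
    return { validJson: true, allFieldsPresent };
  } catch {
    return { validJson: false, allFieldsPresent: false };
  }
}
```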
Latency and cost
| API | p50 latency | p95 latency | Cost / 1,000 req |
|---|---|---|---|
| FastCork | 920ms | 1,380ms | $3.00 |
| GPT-4.1 Vision | 3,240ms | 7,820ms | $15–30 |
| Gemini 2.5 Flash | 1,760ms | 4,100ms | $4–8 |
FastCork is purpose-built for wine label data, served at the edge via Cloudflare Workers with no cold starts. GPT-4.1 and Gemini are routed through centralised inference infrastructure with larger model overhead.
GPT-4.1 Vision costs are estimated from list pricing ($2 per 1M input tokens, ~3K tokens per request). Gemini costs use the Gemini 2.5 Flash standard tier. Latency was measured from request initiation to full response; p50 and p95 were computed across the 200 samples.
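If you want to reproduce the percentile figures from your own timing samples, nearest-rank is one common convention; the benchmark's exact method isn't stated, so treat this as a sketch.

```typescript
// Nearest-rank percentile over raw latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// e.g. percentile(latencies, 50) -> p50, percentile(latencies, 95) -> p95
```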
Try it
```bash
curl -X POST https://fastcork.com/v1/analyze \
  -H "Authorization: Bearer fc-your_key" \
  -F "file=@label.jpg" \
  -F "lang=en" \
  -w "\nTime: %{time_total}s\n"
```
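If you're calling the API from code rather than the shell, the equivalent request in Node 18+ (built-in fetch and FormData) looks roughly like the sketch below. The FASTCORK_API_KEY variable and the response shape are assumptions based on the fields benchmarked above, not a documented schema.

```typescript
// Sketch: same multipart request as the curl example, from Node 18+.
import { readFile } from "node:fs/promises";

async function analyzeLabel(path: string) {
  const form = new FormData();
  form.append("file", new Blob([await readFile(path)]), "label.jpg");
  form.append("lang", "en");

  const res = await fetch("https://fastcork.com/v1/analyze", {
    method: "POST",
    // FASTCORK_API_KEY is a hypothetical env var name.
    headers: { Authorization: `Bearer ${process.env.FASTCORK_API_KEY}` },
    body: form, // fetch sets the multipart boundary header itself
  });
  if (!res.ok) throw new Error(`FastCork error: ${res.status}`);
  return res.json(); // e.g. { vintage, producer, appellation, ... } (assumed shape)
}

analyzeLabel("label.jpg").then(console.log);
```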