ACL 2026

Measuring Cross-Market Generative Ability of
Vision–Language Models via Movie Poster Transcreation

Youyuan Lin  ·  Yuan Li  ·  Yahan Yu  ·  Fei Cheng  ·  Shinya Nishida  ·  Chenhui Chu

Kyoto University

Cross-market image transcreation requires preserving movie identity while adapting to market-specific design preferences and multilingual typography — a challenge that goes far beyond simple translation. We introduce MPTc-Bench, a benchmark of 582 aligned poster pairs spanning 34 target markets, sourced from Douban, Eiga, and IMDb. We define two task variants: Surface (text-centric localisation) and Deep (preference-level style adaptation), and propose a two-stage planner–editor pipeline. Evaluation combines information-retention checks, LLM-as-a-judge aesthetic scoring, and objective visual similarity signals. Our experiments reveal a substantial gap between model outputs and human-crafted target-market posters — particularly in faithful text rendering — and show that Gemini 3 currently leads across both task variants.

MPTc-Bench

Carefully curated aligned poster pairs with rich metadata, filtered from over 4,500 cross-market candidates using perceptual hashing and GLOBE cultural clustering.

582 Aligned poster pairs
34 Target markets
2 Task variants
6 Image editors benchmarked

Global coverage: 34 target markets across Asia, Europe, North and South America. Market selection is informed by GLOBE cultural clustering to ensure diversity.

Access the Dataset

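import json
import urllib.request

# Raw files live under https://github.com/minamotooRin/mptc-bench/tree/main/data
BASE_URL = "https://raw.githubusercontent.com/minamotooRin/mptc-bench/main/data"
for split in ["surface", "deep"]:
    path = f"mptcbench_{split}.jsonl"
    urllib.request.urlretrieve(f"{BASE_URL}/{path}", path)  # download one split
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]  # one JSON record per line
    print(f"{split}: {len(records)} records")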

Task Design & Evaluation

Poster pairs are split into two task levels using perceptual hash (pHash) distance, capturing fundamentally different localisation challenges.

Surface Transcreation (pHash distance < 12)

The source and target posters share the same layout and visual composition. The model must translate the title and tagline, adapt the typography, and localise any text-overlay elements, all without altering the visual design.

Deep Transcreation (pHash distance > 30)

Source and target posters differ substantially in visual design. The model must recompose the layout and adjust character poses, colour palette, and graphic motifs to match the cultural and aesthetic preferences of the target market.


Surface (top) vs. Deep (bottom) examples: Surface tasks preserve composition; Deep tasks require full visual redesign.
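The pHash distance behind the Surface/Deep split can be sketched in plain Python. This follows the widely used DCT-median recipe (grayscale, resize to 32×32, keep the 8×8 low-frequency DCT block, threshold each coefficient at the block's median), which is what libraries such as imagehash implement. Whether MPTc-Bench used exactly this variant with a 64-bit hash is an assumption, as is returning None for pairs that fall between the two thresholds. Image loading and resizing are left out, so `phash` expects a ready-made 32×32 grid of grayscale values.

```python
import math
import random
from statistics import median

IMG_SIZE = 32   # assumed input size: image already converted to grayscale and resized
HASH_SIZE = 8   # keep the 8x8 low-frequency block -> 64-bit hash


def dct_1d(values, n_coeffs):
    """First n_coeffs DCT-II coefficients (the usual factor of 2 is dropped;
    a uniform scale does not change which coefficients exceed the median)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x, v in enumerate(values))
        for u in range(n_coeffs)
    ]


def phash(gray):
    """64-bit perceptual hash of a 32x32 grayscale grid (list of lists)."""
    if len(gray) != IMG_SIZE or any(len(row) != IMG_SIZE for row in gray):
        raise ValueError("expected a 32x32 grayscale grid")
    rows = [dct_1d(row, HASH_SIZE) for row in gray]            # 32 rows x 8 coeffs
    block = [dct_1d([r[j] for r in rows], HASH_SIZE)           # 8 x 8 low-frequency block
             for j in range(HASH_SIZE)]
    coeffs = [c for col in block for c in col]
    med = median(coeffs)
    bits = 0
    for c in coeffs:                                            # one bit per coefficient
        bits = (bits << 1) | int(c > med)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def assign_task(distance):
    """Map a pHash distance onto a task variant using the thresholds above."""
    if distance < 12:
        return "surface"
    if distance > 30:
        return "deep"
    return None  # assumption: pairs in between belong to neither variant


# Usage with synthetic grids standing in for a source/target poster pair.
rng = random.Random(0)
src = [[rng.randrange(200) for _ in range(IMG_SIZE)] for _ in range(IMG_SIZE)]
brighter = [[p + 10 for p in row] for row in src]   # same design, global brightness shift
d = hamming(phash(src), phash(brighter))
print(d, assign_task(d))
```

A global brightness shift only moves the DC coefficient, so the hash barely changes and the pair lands in the Surface range, which is the kind of robustness that makes pHash a reasonable proxy for "same layout".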

THREE-DIMENSIONAL EVALUATION

📋

Basic Info

Title fidelity (chrF) plus genre and year quiz accuracy, measuring whether key movie metadata is preserved in the transcreated poster.
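To make the title-fidelity metric concrete, here is a simplified chrF (character n-gram F-score, orders 1 to 6, beta = 2, whitespace ignored). Reference implementations such as sacrebleu's CHRF may differ in details, for example in how scores are averaged across n-gram orders, and the page does not state which implementation or settings the benchmark uses; treat this as an illustration, not a way to reproduce reported numbers.

```python
from collections import Counter


def char_ngrams(text, n):
    """Counter of character n-grams with whitespace removed."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hyp, ref, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale: n-gram precision and recall are averaged
    over orders 1..max_n, then combined into an F-beta score (beta=2 favours recall)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # order longer than one of the strings
        overlap = sum((h & r).values())  # clipped n-gram matches
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    if p + rec == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * rec / (b2 * p + rec)


print(chrf("千と千尋の神隠し", "千と千尋の神隠し"))  # identical titles: 100.0
print(chrf("千と千尋の神隠", "千と千尋の神隠し"))    # a dropped character lowers the score
```

Working at the character level rather than the word level suits this task: titles in Japanese or Chinese have no spaces to tokenise on, and a single mis-rendered glyph should cost a little, not the whole title.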

🎨

Aesthetic & Adaptation

LLM-as-a-judge direct scoring (1–5) for visual quality and cultural appropriateness, plus pairwise win rate against the human GT poster.
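How the two kinds of judge output could be aggregated into leaderboard numbers is sketched below. The verdict labels ("model", "gt", "tie"), counting a tie as half a win, and the input format are assumptions for illustration; the page does not specify them, and the judge call itself is omitted.

```python
from collections import Counter
from statistics import mean

VALID_VERDICTS = {"model", "gt", "tie"}  # hypothetical labels for pairwise judgements


def win_rate(verdicts, tie_weight=0.5):
    """Share (%) of pairwise judgements won by the model output over the human GT poster.

    Assumption: a tie counts as tie_weight of a win (half by default).
    """
    counts = Counter(verdicts)
    unknown = set(counts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to aggregate")
    return 100 * (counts["model"] + tie_weight * counts["tie"]) / total


def mean_direct_score(scores):
    """Average of 1-5 judge scores, rejecting anything outside the scale."""
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("scores must lie on the 1-5 scale")
    return mean(scores)


print(win_rate(["model", "gt", "tie", "model"]))  # 62.5
print(mean_direct_score([4, 5, 3]))               # 4
```

Reporting the win rate alongside direct scores helps because absolute 1–5 scores from a judge tend to cluster, while a head-to-head comparison against the human poster forces a relative decision.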

📐

Objective Similarity

CLIP cosine similarity, FID (GT↔MTC, SRC↔MTC), and PP-OCRv5 text-layout IoU for Surface tasks.
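The text-layout IoU is the one signal here that needs no learned model (CLIP and FID depend on pretrained networks and are omitted). The sketch below makes an assumption the page does not confirm: each poster's layout is the union of its OCR text boxes, and IoU compares those two regions. OCR detectors such as PP-OCR typically return four-point polygons, so `quad_to_box` (a hypothetical helper) reduces them to axis-aligned boxes first.

```python
def quad_to_box(quad):
    """Axis-aligned (x0, y0, x1, y1) box enclosing a four-point polygon."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def union_area(boxes):
    """Exact area covered by possibly overlapping boxes (coordinate compression)."""
    if not boxes:
        return 0.0
    xs = sorted({v for b in boxes for v in (b[0], b[2])})
    ys = sorted({v for b in boxes for v in (b[1], b[3])})
    area = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # a cell is covered iff its centre is
            if any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                area += (x1 - x0) * (y1 - y0)
    return area


def layout_iou(boxes_a, boxes_b):
    """IoU of the regions covered by two sets of text boxes."""
    area_a, area_b = union_area(boxes_a), union_area(boxes_b)
    area_union = union_area(list(boxes_a) + list(boxes_b))
    if area_union == 0:
        return 1.0  # convention: two posters without any text have identical layouts
    return (area_a + area_b - area_union) / area_union  # |A∩B| = |A| + |B| - |A∪B|


src_boxes = [quad_to_box([(0, 0), (10, 0), (10, 10), (0, 10)])]
out_boxes = [(5, 0, 15, 10)]  # same-sized title block shifted right by half its width
print(layout_iou(src_boxes, out_boxes))  # overlap 50 / union 150
```

Merging boxes into regions before comparing them avoids having to match individual OCR lines one-to-one, which breaks down when a translated title wraps onto a different number of lines.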

Leaderboard

The Human GT row (greyed) serves as the human reference point. Aesthetic and Adaptation scores are on a 1–5 scale; Win Rate and Title chrF are percentages.

Key Finding: Text Rendering Gap

Planners generate accurate title translations, but image editors often fail to render them faithfully. Diffusion-based editors lose 30–50% of title fidelity between the planned translation and the rendered poster.


Title chrF: planned translation vs. visually rendered output. Gemini 3 nearly closes the gap; diffusion models suffer severe losses.
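The planner→editor gap amounts to scoring the same title twice: once as the planner wrote it and once as it can be read back from the edited poster. The sketch below assumes both are scored against the same reference title with any string metric (such as chrF), that the rendered title has already been recovered by OCR, and that the 30–50% figure is a relative drop. `planner_editor_gap` is a hypothetical helper, not the benchmark's code.

```python
from typing import Callable, Tuple

# Any (hypothesis, reference) -> score function, e.g. a chrF implementation.
Metric = Callable[[str, str], float]


def planner_editor_gap(reference: str, planned: str, rendered: str,
                       metric: Metric) -> Tuple[float, float, float]:
    """Score the planned title and the title read back from the edited poster.

    `rendered` is assumed to be OCR text recovered from the output image.
    Returns (planned_score, rendered_score, relative_loss_percent).
    """
    planned_score = metric(planned, reference)
    rendered_score = metric(rendered, reference)
    if planned_score <= 0:
        return planned_score, rendered_score, float("nan")  # loss undefined
    loss = 100 * (planned_score - rendered_score) / planned_score
    return planned_score, rendered_score, loss


def exact_match(hyp: str, ref: str) -> float:
    """Toy stand-in metric: 100 for an exact match, otherwise 0."""
    return 100.0 if hyp == ref else 0.0


print(planner_editor_gap("千と千尋の神隠し", "千と千尋の神隠し", "千と千尋の神隠", exact_match))
# (100.0, 0.0, 100.0)
```

Keeping the metric as a parameter separates the two failure points: a low planned score means the translation itself is wrong, while a large loss with a high planned score isolates the editor's text rendering.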


Aesthetic score vs. Adaptation score for all evaluated systems (Deep task). Gemini 3 is the only system that matches or exceeds GT adaptation.

Qualitative Examples

Three case studies showing SRC (original poster), GT (human target-market poster), and top model outputs.

Cite This Work

If you use MPTc-Bench in your research, please cite our paper.

@inproceedings{lin2026mptcbench,
  title     = {Measuring Cross-Market Generative Ability of
               Vision--Language Models via Movie Poster Transcreation},
  author    = {Lin, Youyuan and Li, Yuan and Yu, Yahan and
               Cheng, Fei and Nishida, Shinya and Chu, Chenhui},
  booktitle = {Proceedings of the 64th Annual Meeting of the
               Association for Computational Linguistics (ACL)},
  year      = {2026},
  note      = {Under review}
}