The choice to train bespoke model builds on production art indicates filmmakers are prioritizing curated training data and control over vanilla prompt-driven outputs to get consistent cinematic imagery.
AI Quick Take
- Dear Upstairs Neighbors used concept art to train custom builds of Google DeepMind’s Veo and Imagen.
- The move illustrates a shift from feeding prompts into off-the-shelf models toward bespoke model tailoring for visual consistency.
- Studios and vendors will need new pipelines, licensing terms, and quality controls if this approach spreads.
Filmmakers behind Dear Upstairs Neighbors used concept art to train custom builds of Google DeepMind’s Veo and Imagen for their Tribeca presentation, revealing a production-focused approach to generative models that goes beyond feeding prompts into off-the-shelf systems. That concrete choice, incorporating production artwork as model training input, marks a deliberate pivot: creative teams are experimenting with bespoke model builds to get the kinds of consistent visuals conventional prompting struggles to deliver.
What happened on this project was straightforward in method if significant in implication: concept art assets were converted into training material and used to fine-tune instances of Veo and Imagen to influence the film’s visual output. The Verge reports these custom builds were part of the film’s creative tooling, rather than a reliance on vanilla model outputs generated solely by text prompts. This is an important distinction because it ties generative output directly to curated production assets, not just to iterative prompting or on-the-fly image synthesis.
The difference matters for practical reasons. Current mainstream video models often produce short, visually inconsistent sequences that work for experiments and social clips but not for a coherent narrative film. Prompt-based workflows can be fast for exploratory work, but they typically lack the stability and repeatability that filmmakers need when carrying a concept across multiple shots, scenes, and visual effects passes. Training or fine-tuning a model on a body of production art gives filmmakers a stronger control mechanism over style, color palettes, and character appearance than prompting alone.
Adopting bespoke models changes the technical and staffing profile of a production. Teams will need processes to prepare and vet training data, manage model checkpoints, and integrate generated assets into VFX pipelines. These tasks are closer to traditional asset management than to casual prompt engineering. That creates new roles and vendor relationships: VFX houses or dedicated ML engineers may become part of the core creative crew, and procuring model training or fine-tuning services will become a line item in budgets and contracts.
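To make the "prepare and vet training data" step concrete, here is a minimal Python sketch of what an asset manifest might look like. It is an illustration only, not the workflow used on Dear Upstairs Neighbors: the names `AssetRecord`, `build_manifest`, and `vet`, and the idea of a hand-maintained set of rights-cleared file names, are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AssetRecord:
    """One candidate training asset (hypothetical schema)."""
    path: str            # path relative to the asset directory
    sha256: str          # content hash, used to spot exact duplicates
    rights_cleared: bool


def build_manifest(asset_dir, cleared_names):
    """Hash every file under asset_dir and note whether its name is cleared."""
    root = Path(asset_dir)
    records = []
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        digest = hashlib.sha256(file.read_bytes()).hexdigest()
        records.append(
            AssetRecord(
                path=file.relative_to(root).as_posix(),
                sha256=digest,
                rights_cleared=file.name in cleared_names,
            )
        )
    return records


def vet(records):
    """Split records into approved and rejected lists.

    An asset is rejected if it is not rights-cleared, or if an identical
    file (same hash) has already been approved.
    """
    seen = set()
    approved, rejected = [], []
    for record in records:
        if record.rights_cleared and record.sha256 not in seen:
            seen.add(record.sha256)
            approved.append(record)
        else:
            rejected.append(record)
    return approved, rejected
```

A production might run something like `approved, rejected = vet(build_manifest("concept_art", cleared))` before each fine-tuning run and store the manifest alongside the resulting checkpoint, so that any model build can be traced back to the exact files it was trained on.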
The choice also amplifies legal and commercial questions that the industry has already started to face. Training models on production-owned concept art avoids some disputes about third-party data, but it raises issues around reuse, derivative rights, and downstream ownership of model outputs, particularly when vendors supply the GPUs, model code, or hosted training services. Studios and rights holders will need clearer licensing language and operational controls if bespoke model builds become an accepted part of filmmaking.
Context matters: the movement toward custom models is happening against a backdrop where many AI video models are still maturing and several high-profile Hollywood partnerships have recently collapsed or cooled. That experience has made studios cautious about handing over creative control to generic systems or outside providers. The Tribeca example suggests one path forward: keep the model inside a production-controlled loop, using curated assets to steer outcomes rather than trusting general-purpose models to hit the mark out of the box.
There are limits and open questions. The Verge’s reporting shows the technique was used on this project, but it does not quantify how extensively custom models replaced traditional VFX or whether the process scaled across a full feature workflow. Practical concerns (compute costs, iteration speed, reproducibility, and quality control) remain unresolved at scale. The legal frameworks for models trained on production assets are still nascent, too, and studios will likely take a conservative approach until those commercial and compliance implications are clearer.
For practitioners and product teams, the takeaway is operational: if generative AI is going to move from novelty to production utility in film, the industry will have to build data pipelines, contracting norms, and technical guardrails that support bespoke models. Watch for more projects to publish their methods, for AI vendors to add fine-tuning and enterprise orchestration features, and for studios to demand contractual clarity on model training and output rights. That sequence of tooling, contracts, and published case studies will determine whether custom model builds become a standard part of the filmmaker’s toolkit or remain isolated experiments.