AI-directed automotive spec commercial. Full case study covering creative direction, prompt engineering, and production workflow using Nano Banana Pro and Kling 3.0.
Personal Project
Content Creation
March 9, 2026

I set myself a challenge: produce a spec automotive commercial that could pass as a real production. Desert location. Golden hour. Premium SUV. Cinematic grading. Portrait work. Driving sequences. Interior B-roll. The full shot list you'd expect from a two-day shoot with a crew of twelve.
The constraint: do it entirely with AI, from frame generation to final video output.
A 30-second automotive spec commercial produced entirely with AI generative tools. Fifteen-plus unique shots spanning interiors, exteriors, aerials, portraits, detail inserts, and driving sequences. Full narrative arc from opening detail cuts through cinematic driving and hero moments to a reflective closing. It works as an experimental proof of concept, but it has also shown me where AI production falls short right now. Physics aren't accurate, characters carry no real emotion, and there are plenty of giveaways. We're in the early stages of something that will reshape production the way VFX did. It's just not consumer-ready yet.
AI-generated content still has a reputation for looking obviously artificial. Plastic skin, warped hands, inconsistent lighting, that unmistakable "AI look" that kills credibility the moment you see it. Most AI content falls apart because it's generated without any real understanding of cinematography, editorial pacing, or visual storytelling.
The question I wanted to answer: can someone with 12 years of production experience use these tools to create something that holds up to professional standards?
I started the same way I'd start any real production. I pulled reference material from actual automotive campaigns to establish the visual language: lighting setups, camera angles, wardrobe, colour palettes, and composition frameworks. Every shot was planned before a single prompt was written.
I defined the visual identity early: black leather jacket, white t-shirt, desert setting, warm golden hour grading throughout. Consistency across every frame was the priority.
Some still frames were generated using Nano Banana Pro, which runs on Google's Gemini model. I uploaded my own portrait as the identity reference and wrote custom prompts for each scene. Where possible the prompts were written in JSON format, which I found works best with AI systems.
Prompt example:
Ultra-photorealistic 8K cinematic close-up photograph shot from outside a dark olive-green metallic luxury SUV, looking through the fully open driver's side window. The subject sits in the driver's seat wearing a black leather jacket, looking directly at camera through the open window with a calm intense expression. Tan caramel leather headrest visible behind his head. Warm golden hour desert light from behind the camera, soft highlights on the dark green metallic bodywork around the window frame. Desert landscape faintly reflected in the rear window glass behind him. Shot on medium format camera, 85mm lens, f/1.8. Shallow depth of field, face and window frame razor sharp, car body falling into soft bokeh. True-to-life skin texture, visible pores, natural skin grain, no plastic smoothing. Colour grading: warm golden tones, rich deep greens on the car body, accurate skin tones, cinematic high dynamic range. Output: 8K, ultra sharp, hyperreal, photorealistic, no noise, no artifacts, no face distortion.
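One way to keep the wardrobe, lighting, and grading identical across every frame is to store the visual identity once and merge it into each per-shot prompt. The following is a minimal Python sketch of that idea, not the exact tooling used on this project. The key names (`scene`, `camera`, `wardrobe`, and so on) and the `build_still_prompt` helper are illustrative assumptions, not a schema required by Nano Banana Pro or any other tool.

```python
import json

# Shared visual identity, taken from the art direction described above.
# Key names are illustrative assumptions, not a schema any tool requires.
VISUAL_IDENTITY = {
    "wardrobe": "black leather jacket, white t-shirt",
    "setting": "desert",
    "lighting": "warm golden hour",
    "grading": "warm golden tones, accurate skin tones, cinematic high dynamic range",
}


def build_still_prompt(scene: str, camera: str, lens: str, aperture: str) -> str:
    """Return a JSON prompt that merges per-shot details with the shared identity."""
    prompt = {
        "scene": scene,
        "camera": {"body": camera, "lens": lens, "aperture": aperture},
        **VISUAL_IDENTITY,  # the same identity fields go into every frame
    }
    return json.dumps(prompt, indent=2)


window_portrait = build_still_prompt(
    scene="close-up through the open driver's side window, subject looking at camera",
    camera="medium format",
    lens="85mm",
    aperture="f/1.8",
)
print(window_portrait)
```

Only the scene and camera details change per shot, so any drift in wardrobe or grading has to come from the model rather than from the prompt text.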
This phase hit roadblocks after two prompts. Gemini's safety layer was tightened during the project, blocking face recreation from uploaded photos on exterior scenes. Interior shots passed through fine, but any prompt describing a person beside a vehicle was flagged and denied.
The workaround: I stripped all brand names, vehicle descriptions, and automotive terminology from the exterior prompts and reframed them as fashion editorial portraits in desert settings. Interiors continued to generate without issue.
For shots that Nano Banana Pro consistently blocked, I moved to Kling AI's image generation, which handled identity-preserved generation without the same restrictions.
The final frame set included over 15 unique compositions: interior driving shots, through-the-windscreen portraits, wide desert establishing shots, boot-sitting lifestyle shots, walking sequences, hero portraits beside the car, detail inserts of the door handle, and drone-perspective aerials.
Each still frame was then used as a keyframe for video generation in Kling 3.0. Every shot required a custom video prompt describing the specific camera movement, subject action, and environmental motion.
The video prompts were written with real cinematography principles. Each one specified lens focal length, aperture, camera rig type, motion path, and speed. This is where production experience made the difference: knowing that a push-in works better than an orbit for a hero reveal, that a locked-off static camera creates more tension for a walk-toward-camera shot, and that interior driving footage should be real-time, not slow motion. These aren't things you learn from a prompt guide. They come from years behind a camera. A few prompt examples:
{
"prompt": "Completely static locked-off camera, no camera movement whatsoever. Interior of a luxury SUV, shot from the passenger seat. A man in a black leather jacket sits in the driver's seat in profile view, both hands gripping the steering wheel, focused on the road ahead. He drives steadily through a vast open desert at sunset. Subtle realistic movements only — very slight natural steering adjustments with the hands, gentle body micro-sway from the road surface, natural slow breathing. The desert landscape outside the windscreen and side windows moves past smoothly with natural parallax motion blur, indicating steady forward driving speed. The warm golden sunset light enters through the windscreen and side glass, shifting very slowly across his face and the tan caramel leather seats as the road gently curves. The centre touchscreen glows softly with colourful media icons, the digital instrument cluster displays navigation. The panoramic glass roof above shows the fading sunset sky. Warm amber reflections on the leather surfaces and chrome trim. Dust haze in the distant desert visible through the windscreen. Natural skin tone and texture on the face and hands. Shot on full-frame cinema camera, 35mm anamorphic lens, f/2.0 from a fixed rig mounted on the passenger seat. Perfectly stable interior footage, zero camera shake. Shallow depth of field, the driver is sharp while the exterior landscape is soft with motion. 24fps real-time, not slow motion. Quiet, contemplative, luxury automotive commercial driving scene. The scene should look indistinguishable from real footage.",
"negative_prompt": "Shaky camera, handheld movement, jerky motion, sudden cuts, camera moving, orbiting, panning, tilting, tracking, push-in, pull-out, any camera movement at all, unstable footage. Cartoon, anime, illustration, painting, sketch, drawing, 3D render, CGI, video game, unreal engine, stylised, artistic filter. Face distortion, warped face, melting features, morphing face, flickering face, inconsistent face between frames, extra fingers, extra limbs, deformed hands, mutated body parts, fingers merging with steering wheel. Blurry subject, out of focus driver, soft focus on face, low resolution, pixelated, noisy, grainy, compression artifacts, banding. Text, watermark, logo, subtitle, caption, UI overlay, letterboxing. Oversaturated, neon colours, flat lighting, overexposed windscreen, blown highlights, plastic skin, waxy skin, smooth airbrushed skin, HDR tonemapping artifacts. Multiple people, extra passengers, rear seat passengers, figures in mirrors. Talking, mouth open, laughing, singing, exaggerated expression, head turning, looking at camera, looking at phone. Static exterior, frozen landscape, no motion outside windows, parked car, stationary vehicle. Hands leaving the steering wheel, touching the screen, gesturing. Fast driving, drifting, skidding, reckless movement. Night time, rain, snow, overcast, storm, harsh midday, fluorescent light. Slow motion, speed ramp, time lapse.",
"model": "kling-3.0",
"mode": "image-to-video",
"duration": 5,
"aspect_ratio": "16:9",
"resolution": "1080p",
"fps": 24,
"cfg_scale": 0.7,
"camera_movement": "static"
}
{
"prompt": "Cinematic aerial drone shot moving forward over a vast flat desert at golden hour. The camera starts high and behind a man standing alone in the desert, back to camera, wearing a black leather jacket and white t-shirt. A dark olive-green metallic SUV is parked in the far distance directly ahead, backlit by a low golden sunset creating a bright sun flare and halo around the vehicle. A thin layer of desert dust haze hovers low across the ground, glowing in the warm backlight. The drone slowly descends and pushes forward over the man's head, passing above him and continuing toward the car in the distance, the SUV growing larger in frame as the drone approaches. The man stays still below as the camera glides over and past him. The sun flare intensifies as the drone gets closer to the car. Vast empty desert stretching to the horizon in every direction, distant mountain silhouettes barely visible through the golden haze. Shot on cinema drone with stabilised gimbal, smooth steady forward movement descending gradually from high to mid-height. No jitter, no drift, perfectly controlled path. 24fps cinematic slow motion. Epic, atmospheric, luxury automotive commercial aerial reveal.",
"negative_prompt": "Shaky camera, jittery gimbal, jerky motion, sudden cuts, unstable footage, lateral drift, vertical bounce. Cartoon, anime, illustration, painting, sketch, drawing, 3D render, CGI, video game, unreal engine. Face visible, face detail, front of person visible, person turning around. Blurry, out of focus, low resolution, pixelated, noisy, grainy, compression artifacts. Text, watermark, logo, subtitle, caption, UI overlay. Oversaturated, neon colours, flat lighting, overexposed entirely, no sun flare, HDR tonemapping artifacts. Multiple people, extra figures, crowd, duplicate person. Car moving, car driving, car leaving. Drone pulling back, drone moving backward, drone orbiting, drone circling, static camera, locked tripod. Man walking, man moving, man turning, running, gesturing. Night time, rain, snow, overcast, storm, midday harsh light, no haze, clear air. Top-down flat angle, directly overhead, side angle only.",
"model": "kling-3.0",
"mode": "image-to-video",
"duration": 5,
"aspect_ratio": "16:9",
"resolution": "1080p",
"fps": 24,
"cfg_scale": 0.7,
"camera_movement": "forward_descend"
}
Shot types generated:
Each prompt was paired with a detailed negative prompt to suppress common AI artefacts: face distortion, hand deformation, camera shake, inconsistent lighting between frames, and stylisation.
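The two examples above share most of their negative terms and differ only in the shot-specific exclusions (camera movement for the static interior, subject movement for the aerial). A minimal Python sketch of that split follows. The field names mirror the JSON shown above and are not confirmed as Kling's actual API schema. `BASE_NEGATIVES`, `build_negative_prompt`, and `build_video_request` are hypothetical helpers for illustration.

```python
# Artefact groups reused on every shot; terms are drawn from the examples above.
BASE_NEGATIVES = {
    "stylisation": ["cartoon", "anime", "illustration", "3D render", "CGI"],
    "anatomy": ["face distortion", "warped face", "extra fingers", "deformed hands"],
    "image_quality": ["low resolution", "pixelated", "compression artifacts"],
    "overlays": ["text", "watermark", "logo", "subtitle"],
}


def build_negative_prompt(shot_specific: list[str]) -> str:
    """Join the shared artefact terms with exclusions unique to one shot."""
    shared = [term for group in BASE_NEGATIVES.values() for term in group]
    # dict.fromkeys drops duplicates while keeping first-seen order
    return ", ".join(dict.fromkeys(shared + shot_specific))


def build_video_request(prompt: str, shot_negatives: list[str], camera_movement: str) -> dict:
    """Assemble a request dict using the same fields as the JSON examples above."""
    return {
        "prompt": prompt,
        "negative_prompt": build_negative_prompt(shot_negatives),
        "model": "kling-3.0",
        "mode": "image-to-video",
        "duration": 5,
        "aspect_ratio": "16:9",
        "resolution": "1080p",
        "fps": 24,
        "cfg_scale": 0.7,
        "camera_movement": camera_movement,
    }


interior = build_video_request(
    prompt="Completely static locked-off camera, interior of a luxury SUV...",
    shot_negatives=["camera moving", "orbiting", "slow motion", "parked car"],
    camera_movement="static",
)
print(interior["negative_prompt"])
```

Keeping the shared terms in one place means a newly spotted artefact only has to be added once to reach every shot.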
The final edit was assembled as a 30-45 second cut structured as a narrative arc: arrival, approach, drive, hero moment, contemplation, departure.
The opening section uses fast beat-synced cuts across detail inserts before settling into slower cinematic shots for the driving and portrait sequences. The closing shot is the window rising and closing, transitioning to an end card.
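Beat-synced cutting comes down to simple arithmetic: at a given tempo, each beat lands on a predictable frame. A minimal Python sketch follows, assuming the 24fps used in the video prompts. The 120 BPM tempo is a hypothetical value for illustration, since the track's actual tempo isn't stated here, and `beat_cut_frames` is an illustrative helper.

```python
def beat_cut_frames(bpm: float, fps: int, beats: int, start_frame: int = 0) -> list[int]:
    """Return the frame number on which each of the first `beats` beats falls."""
    frames_per_beat = fps * 60 / bpm  # seconds per beat (60 / bpm) times frames per second
    return [start_frame + round(i * frames_per_beat) for i in range(beats)]


# Hypothetical tempo of 120 BPM at 24fps gives one cut every 12 frames.
cuts = beat_cut_frames(bpm=120, fps=24, beats=4)
print(cuts)  # [0, 12, 24, 36]
```

At tempos where the beat doesn't fall on a whole frame, rounding per beat (rather than accumulating a rounded interval) keeps the cuts from drifting away from the music.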
SFX were layered underneath the music: desert wind, tyre on sand, leather creak, electric motor hum, door close, footsteps on dry earth, and a smooth glass slide for the window shot. All mixed at -15dB to -22dB under the music to add subconscious texture without competing with the track.
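To put those levels in perspective, decibels can be converted to a linear amplitude ratio with the standard formula gain = 10^(dB/20). The sketch below assumes the figures are amplitude levels relative to the music bed. `db_to_gain` is an illustrative helper, not part of any editing tool.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


for level in (-15, -22):
    print(f"{level} dB -> {db_to_gain(level):.3f}x amplitude")
```

That works out to roughly 0.178x at -15dB and 0.079x at -22dB, so each effect sits at somewhere between about 8% and 18% of the music's amplitude.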
Platform limitations are real and change without warning. Gemini's safety policies shifted mid-project, blocking prompts that had worked days earlier. Adaptability and having fallback tools ready were essential.
Consistency is the hardest problem. Getting one great frame is easy. Getting fifteen frames that look like they came from the same shoot, with the same person, same lighting, same colour grade, same wardrobe, is where most AI content falls apart. This required meticulous prompt engineering and strict visual direction.
Production knowledge is the multiplier. The tools are available to everyone. The creative judgement isn't. Understanding focal lengths, lighting ratios, editorial pacing, and cinematic composition is what separates a collection of AI images from a cohesive production.
Reference gathering and art direction
↓
Identity portrait generation (Nano Banana Pro)
↓
Scene frame generation (Nano Banana Pro + Kling AI)
↓
Video generation from keyframes (Kling 3.0)
↓
Edit assembly and beat-synced cutting
↓
SFX layering and sound design
↓
Colour grade and final delivery
The barrier to great content won't be budget, crew, or time. It will be who has the taste and creativity to create and automate their vision.
Darren Goksu