Visualising podcasts using AI
Three recent examples + YouTube's forthcoming platform play
Once upon a time, podcasts were a purely aural medium: MP3 files referenced by RSS feeds.
Now, YouTube is the most popular podcast platform in the US, Spotify is actively incentivising podcasters to produce video, and visuals have become increasingly de rigueur.
Many chat-based podcasts have made the transition by investing in lights, cameras, sets and some follow-the-speaker editing (something AI can now do a pretty decent job of), while some narrative-driven podcasts (e.g. The Rest Is History) have hired animators to visualise segments.
However, according to Podtrac, almost half of the top 250 US podcasts aren’t posting video to YouTube.
Generating a video of the podcast’s waveform with subtitles over a static image has long been the go-to answer for podcasters wanting visuals without the expense and complexity of filming or traditional animation.
The advent of AI video generation is starting to open up new possible routes to visualising podcasts.
At this year’s Made on YouTube event in September, YouTube announced new functionality in YouTube Studio that will enable podcasters to use Google’s AI video model, Veo, to generate video clips from podcast audio/transcripts. The feature is slated for release “early next year”, although it will initially only be available to a select group of podcasters, ahead of a wider rollout.
Whilst we await the fruits of that development, here are three in-the-wild examples of AI podcast visualisation I’ve come across recently, which feel instructive of what’s to come.
True Crime Reports
A collaboration between Al Jazeera and Message Heard, True Crime Reports intersperses traditionally shot presenter video with AI-generated reconstructions of the events being described. It doesn’t attempt photorealism or full animation. Instead it gives the Ken Burns treatment to still images rendered in a distinctive limited colour palette graphic illustration style.
It feels like a smart stylistic choice, reminiscent of the hand-drawn illustrations crime documentaries sometimes employ when reconstructing traumatic events, and more forgiving of AI anomalies than a more photorealistic style would be.
True Crime Reports is about to embark on a second season, and episodes from the first have clocked up over 8 million views on YouTube. The formula seems to be working.
UFO Copilot
Not an official podcast visualisation but a user-generated (is that still a term?) AI animation made by UFO researcher/content creator Tupacabra, to accompany a four-and-a-half-minute segment of an episode of the MrBallen Podcast (subtitle ‘Strange, Dark & Mysterious Stories’).
Unlike True Crime Reports, UFO Copilot doesn’t just zoom and pan across static AI-generated images, but stitches together AI-generated video clips in a nostalgic 2D anime style.
Whilst the animation has plenty of glitches, it’s been lovingly crafted and I felt it enhanced, rather than detracted from, my listening experience and ultimately served to increase my immersion in the narrative.
Generating animations like this to match existing audio is currently a fairly painstaking process, although for investigative documentary podcasts - which are themselves expensive and time-consuming to produce - it may be worth exploring (and the ease of creation is only going to increase).
At the time of writing, it had received just 1,000 views on YouTube (which feels a bit unjust when you look at the slop that’s currently doing big numbers).
The Talking Baby Podcast
Despite its name, The Talking Baby Podcast is more meme than podcast (the creator, Jon Lajoie, published just three short videos in April and May). However it’s still interesting as an example of AI-generated video being lip-synced with interview audio. Whilst the intent here is comedic and the audio needs the video to be funny, I suspect we’ll ultimately see this approach adopted for more serious ends. We are currently tip-toeing our way across the uncanny valley and it feels inevitable we’ll arrive at a point where some podcasters will record audio-only shows and then generate lip-synced accompanying videos using AI avatars (although the creepiness/authenticity factor may make obviously stylised/animated avatars a better choice than photorealistic avatars, at least in the near term).
Whilst views are modest on YouTube, The Talking Baby Podcast has generated almost 16 million views on TikTok.
So what?
Whilst it won’t be suitable for all genres of podcast, AI-generated video is set to become an increasingly viable route to visualising podcast audio beyond the traditional waveform-over-a-static-image approach.
Arguably podcasting’s most influential player, YouTube, is promising to co-opt its state-of-the-art AI video model to this end. However, we’re yet to see any output from this integration. Will it focus on animated slideshows, similar to NotebookLM’s Video Overviews, or encourage podcasters to generate full motion videos?
Right now, focusing on visualising compelling narrative segments which can then be clipped and shared on social platforms makes sense, as does adopting a distinctive graphic style rather than attempting photorealism.
As is invariably the case with AI media generation, the best results will come from those with an understanding (whether taught or self-taught) of how to tell a compelling story in that medium and how to wrangle the tools to realise that.
We’re also going to see a lot of bobbins. So, stand by for that.
Dan, a nice article (and I think True Crime Reports is excellent use of AI).
I also think that broadcasters will increasingly look to visual podcasts to replace cheaper daytime TV (as well as benefiting from the reach on audio and YouTube). So we will see the production values of visual podcasts increase to make them work across all platforms. And AI will help improve production values at an economic price.
We have just finished a brand-funded project with C4 in the UK that is beginning to venture into this territory. Mainly traditional shooting, but augmented with AI to enhance bits of the production. And we see this becoming a trend in 2026.
https://www.channel4.com/press/news/vogue-williams-and-danny-campbell-host-filtered-interiors-new-branded-vodcast-channel-4
Hey Dan, Jake here from Message Heard. Thank you for the kind words in your latest post re ‘True Crime Reports’. It is something we are really investing in at MH: how do you solve the problem of narrative visualised storytelling natively for podcasting that isn’t just studio talking heads, or making a documentary/YouTube vid/TikTok and calling it a podcast? We think we have the answer!