Exploration #98

Is AI Video Ready for Public Media?

Bots help a videographer document bison on the Great Plains.

Image Created with DALL-E 3


Hi all. This week our main focus is on generative video. But we’ll also touch on updates to Gemini and ChatGPT, as well as other news and thoughts on gAI, and the last gasp of first impressions on the Apple Vision Pro. But first…

A Look into the Future of AI at Work

Last Thursday, the Public Media Innovators PLC held its 3rd Thursday webinar for February. Ethan Mollick delivered an action-packed talk exploring what, with a little practice, AI can do for you, and fielded a number of questions from attendees on using generative AI tools like ChatGPT Plus and Google Gemini Advanced. If you missed it, want to share it with your team, or just want to revisit it, you can see the video in its entirety on the PLC’s Events Page.

And while you’re on that page, you can also click the link to register for our next 3rd Thursday webinar, “Innovation at SXSW: What Public Media Stations Should Know.” The webinar will happen March 21 at 1 p.m. ET/10 a.m. PT, and here’s the description: South by Southwest is the premier professional development event for practitioners of emerging media, where those of us thinking about the future gather to learn and be inspired. Whether it’s paradigm-shifting keynotes or surprisingly insightful conversations while waiting in line for tacos, it’s a time and place where serendipity and kismet reign supreme. A dozen of your public media colleagues from across the nation will be attending this year’s ‘South-by.’ Join us for a distillation of the most impactful trends from SXSW 2024, with a focus on media innovation and technology (e.g., AI) and their implications for public media. We’ll discuss how these insights can inform your production, engagement, and distribution strategies at your station.

You can also click here to register.

Which Generative Art Tools Should You Try?

Also last week, the second installment dropped in the gAI User’s Guide collaboration between Public Media Innovators and Current. This month’s guide focuses entirely on generative art tools like DALL-E, Firefly, and Midjourney, and illustrates its points by comparing multiple images created in different tools using the same prompts. It was authored by Amber Samdahl, Creative Director, and Brandon Ribordy, Environment Designer, at PBS Wisconsin, along with Kayla LaPoure, the Lead Artist at Nebraska Public Media Labs, and you can check it out here.

And if you want to revisit the January installment, which focused on chatbots, you can find that here.

Is AI Video Ready for Public Media? Sora. 

Until last week, gAI tools for creating still imagery were the clear cutting edge of image generation. Generative video has been intriguing, but also very much experimental and often disturbing. Thanks to tools like Runway’s Gen-2, Pika, Stable Diffusion, and most recently Google’s models, the year-over-year improvements in generative video are vast, but the end results are often still a little trippy and clearly AI-generated. Then OpenAI released a preview of its new tool, Sora.

I’ll collect coverage of the preview into a little Sora-pod below. But here’s the sizzle reel that left a lot of gAI pundits speechless:

Now, we need to keep in mind that OpenAI is not likely to release anything other than the best of the best. Last spring the state of the art was “Will Smith eating spaghetti & meatballs” and “The Rock eating rocks” (both made with a different gAI tool, and both creepy AF). We don’t know how much slag was discarded by OpenAI in the process of mining these Sora gems. And certainly, if you look closely, the illusion of reality frays a little at the seams. Even on that high wide of Big Sur, the waves seem just a little off. But let’s say you only need 15-30 frames for your edit, not a lingering 300-frame shot. Then I’d argue we’re pretty close to on point. And if you’re a station that doesn’t have an extensive library of b-roll, a tool like Sora can really help you level up your production. But to be clear, it’s not just Sora. Here’s a clip I generated using Runway Gen-2 on my iPhone with a prompt asking for a photorealistic, cinematic drone shot of the Great Plains at sunset.

I’ve been a licensed drone pilot since 2016 and I can tell you that, while drone tech is amazing, every time you put a drone in the air there is a non-zero chance that something catastrophic will occur. So if there is a chance that I can get usable b-roll that delivers on the brief without liability, that’s of real interest to me as a pilot/photographer.

But where Sora may truly be game-changing is in the long tail of features offered. According to OpenAI’s research paper, Video Generation Models as World Simulators, Sora can:

  1. generate videos based on existing visual content (as opposed to just text);

  2. animate any input images from DALL-E 3;

  3. extend any generated video forward or backward on the timeline;

  4. make changes to the style or environment of existing videos via text commands;

  5. blend two unrelated videos into a new third video;

  6. create natural transitions between different video segments; and

  7. create photorealistic stills (seemingly on par with Midjourney, and better than DALL-E 3).

Of course, the ethical considerations (should) immediately dominate the conversation. But use of a tool like Sora, governed by a policy rooted in station values, could help public media organizations create more content with the same investment in resources.

Sora is only available to red teamers and a select group of video creators…for now. Is 2024 the year of gAI video? Stay tuned.

Okay, on to the links…

If You Click Only One…

How Sora Works (and What It Means) (Dan Shipper - Chain of Thought) - I’ve already said plenty about Sora, so I won’t rehash that here. But in this piece Shipper provides a good breakdown of the “how” behind Sora’s magic. And for a few additional perspectives, check out:
Movie-making magic, directed by AI (Scott Rosenberg - Axios)
OpenAI’s Video Generator Sora Is Breathtaking, Yet Terrifying (Maxwell Zeff - Gizmodo)
With Sora, OpenAI highlights the mystery and clarity of its mission (Sharon Goldman - VentureBeat)

Things to Think About…

Essence, not pixels, is the future of video (David Shapton - RedShark) - Today, we're discussing video wholly generated by AI tools. But what happens when that flexibility inevitably washes back into the field production process? What happens when the real world, captured via a sensor, goes from the end point to the starting point for a moving image? Imagine a clip of footage born of reality but not limited to reality, able to evolve in the edit suite when the needs of the story shift away from what was envisioned in the field. From Shapton's piece: "So imagine this technology fed with the raw output from a camera’s sensor. The resulting data would not be pixels but a massive collection of largely accurate assumptions the AI has made about the scene in front of the camera. It would be hard for us, as humans, to understand this data. It’s not human-readable, but it is enough to “seed” the AI to reproduce a picture that looks remarkably like the original, all without any digital artifacts, whether it’s projected at IMAX resolution or on a smartphone." This piece will excite the CTOs most, but don't let that put you off imagining where we might be with production workflows in a decade.

Strategies for an Accelerating Future (Ethan Mollick - One Useful Thing) - In this piece, from last week's webinar guest, Mollick poses four questions that we should be asking of our organizations.

  1. What useful thing that you do is no longer valuable?

  2. What impossible thing can you do now?

  3. What can you move to a wider market or democratize?

  4. What can you move upmarket or personalize?

These clearly draw from his background teaching entrepreneurship, but are presented through the filter of generative AI.
—Related is this piece from Rhea Purohit in Chain of Thought: How to Use ChatGPT to Set Ambitious Goals

What Will Work Like Cooking and Games If You Aren’t the New York Times? (Dick Tofel - 2nd Rough Draft) - h/t to my GM, Mark, for this one. I often reference the Times, especially when it comes to games. But this piece speaks to the larger opportunities for innovation within stations when we know our audience and create products to serve their needs. It pairs well with Ethan Mollick's piece above.

Automating ableism (s.e. smith - The Verge) - At this point, it's unusual for me to find a fresh, relevant take on issues of AI bias. But this one makes good points, and not just about AI. Here's the gut-punch: "Technologies such as these often rely on two assumptions: that many people are faking or exaggerating their disabilities, making fraud prevention critical, and that a life with disability is not a life worth living. Therefore, decisions about resource allocation and social inclusion — whether home care services, access to the workplace, or ability to reach people on social media — do not need to view disabled people as equal to nondisabled people. That attitude is reflected in the artificial intelligence tools society builds."

Things to Know About…

A High School Deepfake Nightmare (Jason Koebler & Emanuel Maiberg - 404 Media) - With all the discussion this week about generative video and imagery, it's worth a moment to remember that, yeah, deepfakes are still a huge problem. And I think we can all see where this will go with generative video if legislation isn't put in place to give victims justice.
—But, as Brian Fung reports on CNN, don't hold your breath: AI could disrupt the election. Congress is running out of time to respond

Our next-generation model: Gemini 1.5 (Sundar Pichai & Demis Hassabis - The Keyword) - Well, that was fast. Gemini 1.0 is only a few weeks old. Google’s stock price must’ve dropped. <<checks internet>> Ah...yes. Nevertheless, the big news with this update seems to be the amount of data Gemini 1.5 can process. From the blog post: "This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens." The more data it can consume as context, the more helpful it should be to you. Beyond that, Gemini is also sporting some upgraded reasoning skills to complement the enhanced input limits. My thoughts are with Claude 2.0 though. Every week Anthropic's chatbot seems to be fading further into irrelevance.
—Here is how David Pierce reported it for The Verge: Gemini 1.5 is Google’s next-gen AI model — and it’s already almost ready
—Stepping back from the future and into the present, Pierce's colleague Emilia David reports: Gemini Advanced is most impressive when it’s working with Google

ChatGPT and Google’s Gemini will now remember your past conversations (Chris Morris - FastCompany) - While Google is gearing up for Gemini to accept more inputs, ChatGPT has been given a memory boost, allowing it to retain more context. I've run into those context limits with longer prompt exchanges, and have seen similar effects with DALL-E 3. Thus far there's a sweet spot after which generating additional images just becomes an exercise in absurdity. I’d love to see that resolved.
—And Michelle Cheung, writing for Quartz, breaks down What ChatGPT getting a memory means for you

Early Adopters of Microsoft’s AI Bot Wonder if It’s Worth the Money ($) (Tom Dotan - Wall Street Journal) - To me, the issues discussed here are as much an education problem as a product problem. What is needed is a train-the-trainer program to show people how these tools can be useful and how to get the most utility out of them day to day.

Adobe’s Very Cautious Gambit to Inject AI Into Everything ($) (Austin Carr & Brody Ford - Bloomberg) - I will say, as a user of Adobe’s photo-editing tools and as someone who works with artists who rely on Adobe’s tools, that most of the other gAI companies seem to be out to make everybody an artist, while Adobe seems instead to be developing tools that help artists become more efficient (and thus more competitive) at their craft.

I tried out an Apple Vision Pro. It frightened me (Arwa Mahdawi - The Guardian) - The first wave of Apple Vision Pro hype appears to be receding. Eventually, extended reality always gives way to actual reality. Case in point: in our own demos here, we've found that the AVP does not remedy the motion sickness (also known as cybersickness) that many people, especially women, experience with XR headsets.
—The new product review intern at Meta also tried out the AVP, and brings us a totally unbiased review based on the experience.
—But this review, from David Heaney at UpLoadVR, is actually worthy of your time: A Heavy Portable Cinema & Monitor With A Promising Spatial OS
—Meanwhile, Victoria Song reports for The Verge that Apple fans are starting to return their Vision Pros
—Lastly, Tom Ffiske over at Immersive Wire provides some interesting, follow-the-money analysis that gets behind Apple's market stats to look at actual potential use: How many people are downloading Apple Vision Pro apps?

And finally…

Meet the Pranksters Behind Goody-2, the World’s ‘Most Responsible’ AI Chatbot (Will Knight - Wired) - And finally, this is why art and artists still matter.

Have a creative, productive week!

