I’ll be the first to admit that I’m not yet digging too deeply into the world of generative AI video. Certainly nowhere near the extent of Jeff Foster and all that he’s doing. But I do know that I like the idea of AI-generated animations. There are a lot of times in a video edit where we need a simple illustration or animation, and we just don’t have the budget to hire an animator. I’m certainly no animator, but I was working on a little documentary project recently where the voice-over said this:
… in Ohio, Texas, Illinois, Indiana, and elsewhere
We didn’t have a ton of coverage for this 7-minute piece, and we certainly didn’t have the budget to animate it all. We’d already spent money on two separate pieces of animation, but I thought this would be a great place where a quick animation could help illustrate what the voice-over was saying. This was a historical film, so a historical look was what we were going for. I personally don’t subscribe to a bunch of generative AI services, but I am an Adobe subscriber, and knowing that Google’s Veo 3 model was recently added to Adobe Firefly, Firefly was where I went.
This was my first prompt using the Firefly video model:
An old United States map that animates to highlight the states of Ohio, Texas, Illinois, Indiana, in that order
And this is the result:
Okay, not what I wanted. The gibberish text isn’t acceptable. Since I’m not an expert prompt writer, it was worth revising the prompt:
An old United States map that doesn’t contain any text but the map slowly animates to highlight only the states of Ohio, Texas, Illinois, Indiana, in that order
While I did like the style of the map, I didn’t love it, so I jumped right to the Veo 3 model in Firefly for my second attempt at the prompt above. This is the result:
That’s a little bit better. I really like the style of that map, but again, it produced gibberish text on the map and didn’t highlight the states that I mentioned.
I got a lot more specific on the prompt:
An old United States map that’s only the outline of the country and the states and that map doesn’t contain any text at all. After a moment certain states on the map slowly become highlighted in this order: Ohio, Texas, Illinois, Indiana. Only those states are highlighted when the animation on the map ends. Nothing else at all. None of the states are named and there is zero text or letters on the map.
Still using Veo 3, this was the result:
That at least got me closer, without the gibberish text on the map, but I feel like the style isn’t quite as good as the one above. And the animation and the state highlights are all off.
Next, I thought, “You know what, perhaps the model is off.” Since Firefly also supports the older Veo 2, I thought I would give it a try with the same prompt via Veo 2. I mean … perhaps an older model is better for a certain task, right? 🤷‍♂️ This was the result:
That generation felt like it was regressing, as the animation style is worse and I’ve got text back on the map.
And since I’d gone this far, why not jump back to Adobe’s own Firefly video model, try my much more detailed prompt, and see what results that gave me:
At this point, I felt like the AI was just trolling me. I gave up and found some other B-roll to cover the line of narration in the film. What did I get for my time? The use of nearly 5000 credits in Firefly. No, I wasn’t out much, as that’s just part of the package I have. But imagine if this was a mission-critical video that I had to get right. I could have easily run out of credits. I haven’t used a lot of the generative video tools, but they all need some way to generate a very quick, very low-res draft preview before spending big-time credits on getting a usable piece. Perhaps some of them have just that. As one who doesn’t do a ton of generative video, I thought this was an interesting exercise in the process.
Fast forward a couple of weeks
I drafted this article and then didn’t publish it for a couple of weeks. Since AI advances happen quickly, I thought I would give it another try when I came back to it.
Here is the Firefly result:
Adobe Firefly has a prompt enhancement feature, so I allowed it to enhance the last prompt from above. This is what it enhanced the prompt to:
A static, detailed outline of the United States map with no text or labels. The camera focuses on the map, which remains unchanged for a moment. Then, the animation begins, gradually highlighting the outlines of Ohio, Texas, Illinois, and Indiana in a soft, glowing effect. The rest of the map remains unaltered, maintaining its clean, minimalist design. The video ends with these four states highlighted, creating a clear visual focus without any additional text or elements. The overall style is clean and modern, emphasizing the geometric shapes and borders of the states in the style of Vector Art.
I like that and I can see it getting a better result, but this was the enhanced prompt result:
One thing I do like about Firefly is that I’m constantly reminded of how many credits a generation costs.
This little button in the corner is a helpful reminder. It can often take so many generations to get a useful result that you will use your credits. I like that a low-res generation is relatively cheap in terms of credits. You can refine your prompt at 540p resolution and then generate the final version at a higher resolution, which costs more credits. That’s at least something when it comes to how to spend your credits. Other generative video systems and LLMs may operate differently as far as cost and credits go. But at this point, I don’t spend extra money per month on another system, as I just don’t have the need … yet.
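As a rough illustration of why drafting at low resolution helps, here is a minimal Python sketch of the credit arithmetic. The function name and the per-generation costs are hypothetical placeholders, not Firefly’s actual pricing; substitute whatever the credit button shows for your own settings.

```python
def credits_spent(attempts, draft_cost, final_cost, draft_first=True):
    """Total credits for `attempts` prompt iterations.

    draft_first=True: every iteration is a cheap low-res draft, and the
    approved prompt is then rendered once at full resolution.
    draft_first=False: every iteration is rendered at full resolution.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if draft_first:
        return attempts * draft_cost + final_cost
    return attempts * final_cost


# Hypothetical costs per generation (placeholders, not real pricing).
DRAFT_COST = 100
FINAL_COST = 500

with_drafts = credits_spent(8, DRAFT_COST, FINAL_COST, draft_first=True)
without_drafts = credits_spent(8, DRAFT_COST, FINAL_COST, draft_first=False)
print(with_drafts, without_drafts)
```

Under these made-up numbers, eight prompt revisions cost far fewer credits when only the last one is rendered at full resolution, and the gap widens with every extra revision.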
And how did the Veo 3 model do with that “enhanced” prompt a couple of weeks later?
I asked a friend who’s deep into generative AI why I couldn’t get a desirable result. His response makes sense:
You can’t just prompt it with a specific graphic animation, because AI works by diffusing hundreds of videos that are doing the same thing. There’s not gonna be animations of maps with states being highlighted for it to diffuse. Like if you have a woman and you say “woman dancing” it looks for videos of women dancing, diffuses them into noise. And from the latent point cloud of noise it renders the densest areas of the point cloud as motion. But it’s taking hundreds to thousands of videos of women dancing to do that. The less reference it has the more it hallucinates. If you ask it to animate a map of the United States it’s gonna be looking for videos of maps of the United States to diffuse. But if you say highlights specific states, it’s not gonna know what that is unless there are videos of that labelled as such in its library. When you ask for something unusual that it doesn’t have a lot of reference for it just finds the closest thing and diffuses it. And that’s why you get weird stuff when you get too creative. There might be videos of maps and it might know how to animate them as if the camera is moving. But it probably won’t know how to highlight states. Generative AI is a diffusion model mixed with a large language model, but all the large language model does is look for videos with the tokens in your prompt. A token is like a meta tag. So if you say map, it just looks at videos with maps. Generative AI is still pretty stupid.
That makes a lot of sense considering how specific an animation I was asking it to make. On the one hand, it kind of disappoints me because this is a place where generative AI could be so useful. But on the other hand, I like keeping animators and motion graphics artists employed. I just wish we always had the budget to use them.