AI Breakthroughs You Missed

Hey everyone! Big u...
Four pieces of AI news for you. Two of them have to do with images. We don't talk a ton about AI images on this channel, but I like to check in occasionally. In this case, we are definitely crossing the boundary of photorealism: you can't really tell anymore whether an image or a video is AI-generated, and that obviously has big implications for things like disinformation. Two products came out this week that I think underline how far we've come in just a year.

One is called Flux Pro, and honestly, it makes Midjourney images look like cartoons, and Midjourney images are really good. Flux Pro in 4K is, for 99% of people, going to be indistinguishable from photos of the real world, unless you put flying hippopotamuses in there or something equally silly. If it's supposed to be photorealistic, it will be.

The second image-related tool is Dream Machine from Luma AI. Dream Machine is focused on short-form video. We've made huge strides in the last year on keeping characters persistent and on making videos that feel like they were truly shot with a camera rather than hallucinated. Dream Machine is the first short-form video app I've seen where I look at it and think: this is going to end up in film, this is going to end up in advertisements. We saw the Coca-Cola ad come out, made partly with AI, and we're going to see a lot more of that, probably in the U.S. Super Bowl this year. Short-form video in particular is essentially a solved problem with AI, with huge implications for the film industry.

What I think is interesting is that large language models and image models are in roughly the same place right now: both perform very well on short-form output, and both are pretty lousy at long-form content.
So we are nowhere close to having AI produce an entire feature-length movie, and nowhere close to having a large language model produce a good novel. Both of those seem to be beyond what we're currently capable of. So that's the check-in on the image side. It's going to change our world as we know it, like a lot of things I talk about on this channel. And if you don't believe it, I encourage you to go watch the Coca-Cola ad made with AI. It's all over YouTube and really easy to find.

Moving on beyond images: a $56 million funding round for /dev/agents. Yes, that's the real company name, founded by ex-Stripe and ex-Google folks. Their goal is to build the operating system for AI agents, and I think that's a really smart play. The investors are speaking with their money by putting that much into the company at the start, and I can see why. At the end of the day, if agents are a big play in 2025, they need an infrastructure layer to run on, and we don't really have an OS for agents at all. It fills a needed gap in the ecosystem, it's coming along at the right time, and the ex-Google and ex-Stripe founding team seems really strong. It's a natural play, so I would expect a lot from them in the next year.

And finally, we've heard rumors from Microsoft, Google, and others that the problem of large language models having limited memory is going to go away in 2025. The first hint of that is coming in the next couple of weeks: it looks like Google will be releasing a much larger upload capability for Gemini, the ability to upload a thousand files of up to a hundred megabytes each. It sounds like they're framing it as "upload your code base." As far as I know, that is not actually out yet; it's something that's been leaked. But I think the leak is directionally significant and specific. It's timed for mid-December, apparently.
And I think it's interesting how well that lines up with the broader move toward larger context windows. Microsoft has talked about a nearly infinite context window in 2025, and that will solve a lot of the memory issues. Frankly, it will also make a lot of our retrieval-augmented generation (RAG) approaches somewhat unnecessary: if you can throw everything into the context window, you don't need a RAG pipeline. You just throw it all in and it works.

So there you go: four pieces of AI news. It remains a busy time, and we'll see what tomorrow holds. Please don't do disinformation.
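As a footnote to the RAG point above: the difference between retrieval and "just throw it all in" can be sketched in a few lines. This is a toy illustration, not any vendor's API; the helper names are hypothetical, and the word-overlap scoring stands in for real embedding-based retrieval.

```python
# Toy sketch: RAG (retrieve top-k relevant chunks) vs. long-context
# (include the entire corpus in the prompt). Helper names are hypothetical.

def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def rag_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    """RAG: keep only the k most relevant chunks, then build the prompt."""
    top = sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]
    return "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"

def long_context_prompt(query: str, corpus: list[str]) -> str:
    """Long-context: skip retrieval entirely and include everything."""
    return "Context:\n" + "\n".join(corpus) + f"\n\nQuestion: {query}"

corpus = [
    "The Gemini upload limit is reportedly 1,000 files.",
    "Flux Pro renders photorealistic 4K images.",
    "Dream Machine generates short-form video.",
]
query = "What is the Gemini upload limit?"
print(rag_prompt(query, corpus, k=1))        # only the Gemini chunk survives
print(long_context_prompt(query, corpus))    # all three chunks included
```

The point of the episode's claim is that as context windows grow, the `rag_prompt` path (chunking, embedding, retrieval infrastructure) matters less, because `long_context_prompt` becomes affordable for more and more corpora.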