TikTok video #7440560105659092267
Within a year, you and I will be able to have a conversation, a full-blown conversation, natural conversation, with literally anything on the internet in any modality we want. Text, audio, video, whatever. That's where this whole thing is going.

If you've been following the Gen AI space closely, this probably is not a very controversial prediction to make. However, it is a radically different future than what we currently are in. Radically different. There's already some of this happening, and there's incremental steps coming out every month, getting us closer to that reality. Being able to have a natural conversation with literally anything on the internet in any modality we want.

I think what's important here is to be thinking about how does that affect your business, your product, your job, your workflows. I can't answer all those questions for you, but I can at least tee them up right now and encourage that we all be thinking about that deeply over the next few quarters.

So why do I say that this is where this project is going, where AI is going? There's a few kind of, I think, pretty obvious things. If you look at the hurdles that are holding us back from being able to do that right now, there's a few that are being resolved in real time. Context windows are getting larger. Speed of inference is increasing. Cost of token processing, whether that's on the receiving side or the inference side, is decreasing.

And agents are really ramping up right now to sort of fill in the gaps and handle some of those more complex tasks that don't feel so complex for humans but are super complicated for computers. Like if you want to purchase an airline ticket, you might just tell your travel agent on the phone, "Hey, I want to go to Orlando in this date range, whatever, blah, blah, blah." And they can just kind of go do that and make a bunch of decisions on their own.
That's what agents are going to be able to do and fill in those gaps and help preserve that sort of natural conversational feel to interacting with the internet.

And the last one is modalities. So text is basically solved. It's been solved for a while. Audio, whether that's speaking to an AI or voice synthesis coming back at you, basically solved. There's some limitations across languages globally right now and accents globally that I've uncovered. But more or less, it's basically solved. Image is, you know, it's on the fence, but it's getting better. And that one's a little bit trickier.

Video is really starting to ramp up. And I'm thinking even more so on the machine vision side, AI being able to look at us and understand our gestures, understand our facial expressions, and take that into consideration as we're interacting with them. There's also the video synthesis side, but I'm a little bit less interested in that because that's less about a conversation and more about generating output.

So modalities are getting checked off one at a time. And screen share, that recently came out as well. So AI systems can now see what we see in our devices.

And all of this stuff, if you put it together, I really think the non-controversial but radically different future that we're heading to in the next year, again, is just being able to have a very natural, very fluid conversation with literally anything on the internet in any modality we want. What does that mean for you?