Future of AI agents: Hype or amazing?
Do you think agents are promising? We'll have to talk about this. This is the excitement of the year: "agents" is the generic hype term that a lot of business folks are using. AI agents are gonna revolutionize everything. Okay, so mostly the term agent is obviously overblown. We've talked a lot about reinforcement learning as a way to train for verifiable outcomes. Agents should mean something that is open-ended, solving a task independently on its own, and able to adapt to uncertainty.

The term agent gets applied to a lot of things like Apple Intelligence, which we still don't have after the last WWDC, which is orchestrating between apps. That type of tool use is something that language models can do really well. Apple Intelligence, I suspect, will come eventually. It's a closed domain: it's your Messages app integrating with your Photos, with AI in the background. That will work. And that has been described as an agent by a lot of software companies to get into the narrative.

The question is, in what ways can we get language models to generalize to new domains and solve their own problems in real time, maybe with some tiny amount of training while they're doing it, with fine-tuning themselves, or with in-context learning, which is the idea of storing information in a prompt, where you can use learning algorithms to update that. And whether or not you believe that's actually gonna generalize to something like me saying, "Book my trip to Austin in two days, I have XYZ constraints," and actually trusting it. I think there's an HCI problem there of coming back to the user for information.

Well, what's your prediction there? Because my gut says we're very far away from that.

I don't know if you've seen OpenAI's five levels, right? Chat is level one, reasoning is level two, and then agents is level three. And I think there's a couple more levels, but it's important to note, right?
We were in chat for a couple years, right? We just theoretically got to reasoning. We'll be here for a year or two, right? And then agents. But at the same time, people can train approximate capabilities of the next level. Agents are doing things autonomously, for minutes at a time, hours at a time, et cetera, right? Reasoning is doing things for tens of seconds at a time and then coming back with an output that I still need to verify, use, and check, right?

And the biggest problem is, of course, it's the same thing as manufacturing, right? There's the whole Six Sigma thing: how many nines of reliability do you get, and then you compound those nines onto each other across the number of steps, and that gives you your yield. In semiconductor manufacturing there are tens of thousands of steps, and even a long string of nines per step is not enough, right? Because you multiply that out that many times and you can end up with something like 60% yield. Really low yield, yeah, or zero.

And this is the same thing with agents, right? You're chaining tasks together each time. LLMs, even the best LLMs on benchmarks they're particularly good at, don't get 100%, right? They get a little bit below that because there's a lot of noise. And so how do you get to enough nines, right?

This is the same thing as self-driving. We can't have self-driving without it being super geofenced, like Google's, right? And even then they have a bunch of teleoperators to make sure it doesn't get stuck. You can't do that generally because it doesn't have enough nines. And self-driving has quite a lot of structure: roads have rules, it's well-defined, there's regulation. When you're talking about computer use for the open web, for example, or the open operating system, it's a mess.
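The compounding-nines arithmetic here is worth making concrete. This is a minimal sketch, assuming independent steps each succeeding with probability p, so an n-step chain succeeds with probability p**n:

```python
# "Compounding the nines": if each step in a chain succeeds with
# probability p, the end-to-end success rate of n steps is p**n.
def chain_success(p: float, n: int) -> float:
    """End-to-end success probability of n independent steps of reliability p."""
    return p ** n

# Semiconductor-style example: on the order of 10,000 process steps.
steps = 10_000
for p in (0.999, 0.9999, 0.99999):
    print(f"per-step reliability {p}: end-to-end yield {chain_success(p, steps):.4f}")
```

Three nines per step collapses to essentially zero yield over 10,000 steps, four nines lands around 37%, and even five nines only gets you to roughly 90%. This is the same arithmetic that makes long agent task chains fragile when each individual step is merely "pretty good."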
So I'm always skeptical of any system that is tasked with interacting with the open, messy human world. That's the thing: if we can't get intelligence that's enough to solve the human world on its own, we can create infrastructure over many years, like the human operators for Waymo, that enables certain workflows.

There is a company, I don't remember the name, but that's literally their pitch: we're just gonna be the human operator for when agents fail. You just call us and we fix it. Sounds like an API call, and it's hilarious. There's gonna be teleoperation markets when we get humanoid robots: there's gonna be somebody around the world that's happy to fix the fact that the robot can't finish loading my dishwasher when I'm unhappy with it, and that's just gonna be part of the Tesla service package.

I'm just imagining an AI agent talking to another AI agent. One company has an AI agent that specializes in helping other AI agents.

But if you can make things that are good at one step, you can stack them together. So that's my view: even if it takes a long time, we're gonna build infrastructure that enables it. You see the Operator launch. They have partnerships with certain websites, with DoorDash, with OpenTable, things like this. Those partnerships are gonna let them climb really fast. Their model's gonna get really good at those things. It's gonna be a proof of concept. That might become a network effect, where more companies want to make it easier for AI. Some companies will be like, no, let's put blockers in place. And this is a story of the internet we've seen before. We see it now with training data for language models, where companies are like, no, you have to pay. Business working itself out.

That said, I think airlines and hotels have a high incentive to make their sites work really well, and they usually don't. If you look at how many clicks it takes to order an airplane ticket, it's insane.
You actually can't call an American Airlines agent anymore. They don't have a phone number. It's horrible on many fronts. On the interface front, to imagine that agents will be able to deal with that website when I, as a human, struggle... I have an existential crisis every time I try to book an airplane ticket. I think it's gonna be extremely difficult to build an AI agent that's robust in that way.

But think about it. United has agreed to the Starlink terms, which is that they have to provide Starlink for free, and the users are going to love it. What if one airline is like, we're gonna take a year, and we're gonna make our website have white text that works perfectly for the AIs? Every time anyone asks an AI about a flight, they buy whatever that airline is. Or they just say: here's an API, and it's only exposed to AI agents, and if anyone queries it, the price is 10% higher for any flight, but we'll let you see all of our flights and you can just book any of them. Here you go, agent. And then it's like, oh, and the airline made a 10% higher price. Awesome. And am I willing to pay that for, like, "Hey, book me a flight to see Lex"? Yeah, whatever.

I think computers and the real world and the open web are really, really messy, but if you start defining the problem in narrow regions, people are gonna be able to create very, very productive things and ratchet down cost massively, right? Now, crazy things like robotics in the home, those are gonna be a lot harder to do, just like self-driving, right?
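The agent-only fare API being imagined here is easy to sketch. Everything in this snippet is hypothetical and illustrative: the `Flight` type, the `AGENT_MARKUP` rate, and `quote_for_agent` are invented names, not any real airline API:

```python
# Hypothetical sketch of an "agent-only" fare endpoint: every flight is
# visible to AI agents, but quoted with a 10% markup over the human price.
from dataclasses import dataclass

AGENT_MARKUP = 0.10  # the hypothetical 10% surcharge for agent bookings

@dataclass
class Flight:
    flight_no: str
    base_price: float  # the price a human would see on the website

def quote_for_agent(flight: Flight) -> float:
    """Price quoted when the caller identifies itself as an AI agent."""
    return round(flight.base_price * (1 + AGENT_MARKUP), 2)

austin = Flight("UA123", 200.00)
print(quote_for_agent(austin))  # 220.0: the airline captures the markup
```

The design trade being described is exactly this: the agent's user pays a convenience premium, and in exchange the airline gets a channel that is trivially easy for agents to use.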
Because there's just a billion different failure modes, right? But agents that can navigate a certain set of websites and do certain sets of tasks, or take a photo of your fridge or upload your recipes and then figure out what to order from Amazon slash Whole Foods delivery, that's gonna be pretty quick and easy to do, I think. So it's gonna be a whole range of business outcomes, and there's gonna be tons of optimism; people can just figure out ways to make money.

To be clear, these sandboxes already exist in research. There are people who have built clones of all the most popular websites, Google, Amazon, blah, blah, blah, to train on, and OpenAI probably has them internally to train these things. It's the same as DeepMind's robotics team, which for years has had clusters for robotics where you interact with robots fully remotely. They just have a lab in London, you send tasks to it, arrange the blocks, and you do this research. Obviously there are techs there that fix stuff, but we've turned these cranks of automation before. You go from sandbox to progress, and then you add one more domain at a time and generalize, I think.

In the history of NLP, before instruction tuning, it used to be that one language model did one task. Then in the instruction-tuning literature there's this point where you start adding more and more tasks together and the model just starts to generalize to every task. And we don't know where on this curve we are. I think for reasoning, with this RL on verifiable domains, we're early. But we don't know where the point is where you just start training on enough domains and, poof, more domains just start working and you've crossed the generalization barrier.
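The instruction-tuning shift described here, from one model per task to one shuffled mixture of many tasks, can be sketched in data terms. The task names, templates, and `build_mixture` helper below are all illustrative, not any specific dataset:

```python
# Sketch of multi-task instruction tuning: examples from different tasks
# are put in one (instruction, answer) format and mixed into a single
# training set, instead of training one model per task.
import random

TASKS = {
    "summarize": [("Summarize: The cat sat on the mat.", "A cat sat on a mat.")],
    "translate": [("Translate to French: hello", "bonjour")],
    "classify":  [("Is this positive or negative? 'Great movie!'", "positive")],
}

def build_mixture(tasks: dict, seed: int = 0) -> list:
    """Flatten per-task (instruction, answer) pairs into one shuffled set."""
    rng = random.Random(seed)
    mixture = [(prompt, answer)
               for examples in tasks.values()
               for prompt, answer in examples]
    rng.shuffle(mixture)
    return mixture

mixture = build_mixture(TASKS)
print(len(mixture))  # 3: one training set spanning three tasks
```

The empirical point in the transcript is about what happens as the number of tasks in this mixture grows: past some threshold, the model starts handling tasks that were never in the mixture at all.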