OpenAI o1 now also supports image uploads, allo...
To illustrate the multimodal input and reasoning, I created this toy problem with some hand-drawn diagrams and so on. So here it is. It's hard to see, so I already took a photo of this. And so let's look at this photo on a laptop. So once you upload the image into ChatGPT, you can click on it to see the zoomed-in version. So this is a system of a data center in space. So maybe in the future, we might want to train AI models in space.
I think we should do that, but the power number looks a little low. One gigawatt. One gigawatt. OK. But the general idea, I think. Rookie numbers. Yeah, rookie numbers. OK. Yeah.
So we have a sun right here, putting power onto this solar panel. And then there's a small data center here. That's exactly what they look like. GPU racks. And then pump. Nice pump here. And one interesting thing about operation in space is that on Earth, we can do air cooling, water cooling to cool down the GPUs. But in space, there's nothing there. So we have to radiate this heat into deep space. And that's why we need this giant radiator cooling panel. And this problem is about finding the lower-bound estimate of the cooling panel area required to operate this one gigawatt data center. Probably going to be very big. Yeah. Let's see how big it is. Let's see.
So that's the problem. I'm going to use this prompt. And yeah, this is essentially asking for that. So let me hit go, and the model will think for a few seconds.
By the way, most people don't know. I've been working with Hyung Won for a long time. Hyung Won actually has a PhD in thermodynamics, which is totally unrelated to AI. And you always joke that you haven't been able to use your PhD work in your job until today. So you can trust Hyung Won on this analysis. Finally, finally. Thanks for hyping it up. Now I really have to get this right. OK.
So the model finished thinking in only 10 seconds. It's a simple problem. So let's see how the model did it. So power input.
So first of all, this one gigawatt, that was only drawn on the paper. So the model was able to pick that up nicely. And then radiative heat transfer only. That's the thing I mentioned. So in space, nothing else. And then some simplifying choices. And one critical thing is that I intentionally made this problem underspecified, meaning that the critical parameter is the temperature of the cooling panel. I left it out so that we can test out the model's ability to handle ambiguity and so on. So the model was able to recognize that this is actually an unspecified but important parameter. And it actually picked the right range of temperature, which is about room temperature. And with that, it continues to the analysis. And there's a whole bunch of things. And then it found the area, which is 2.42 million square meters. Just to get a sense of how big this is: this is about 2% of the land area of San Francisco. This is huge. Not bad. Not bad, yeah. Oh, OK. Yeah. So I guess this is reasonable. I'll skip through the rest of the details. But I think the model did a great job making nice, consistent assumptions that make the required area as small as possible. And so, yeah. So this is the demonstration of the multimodal reasoning. And this is a simple problem. But o1 is actually very strong. And on standard benchmarks like MMMU and MathVista, o1 actually has state-of-the-art performance.
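For readers who want to check the order of magnitude, here is a minimal sketch of the radiative-cooling estimate described above. It assumes the Stefan-Boltzmann law, with area equal to power divided by emissivity times sigma times temperature to the fourth power. The transcript does not say which emissivity, panel temperature, or number of radiating sides the model used, so the function name `radiator_area_m2` and its defaults (ideal black body, one radiating side, no absorbed sunlight, deep space treated as 0 K) are illustrative assumptions, not the model's actual working.

```python
# Stefan-Boltzmann constant in W / (m^2 K^4).
SIGMA = 5.670374419e-8


def radiator_area_m2(power_w, temp_k, emissivity=1.0, sides=1):
    """Lower-bound radiator area needed to reject power_w by radiation alone.

    Assumptions (ours, not from the transcript): a uniform panel temperature,
    deep space treated as 0 K, and no heat absorbed from the sun.
    """
    if power_w < 0:
        raise ValueError("power_w must be non-negative")
    if temp_k <= 0:
        raise ValueError("temp_k must be a positive absolute temperature")
    if not 0 < emissivity <= 1:
        raise ValueError("emissivity must be in (0, 1]")
    if sides not in (1, 2):
        raise ValueError("sides must be 1 or 2")
    # Radiated power per square meter of one panel face.
    flux_w_per_m2 = emissivity * SIGMA * temp_k ** 4
    return power_w / (flux_w_per_m2 * sides)


if __name__ == "__main__":
    one_gigawatt = 1e9
    # Sweep a few panel temperatures around room temperature.
    for temp_k in (280, 290, 300, 310):
        area = radiator_area_m2(one_gigawatt, temp_k)
        print(f"T = {temp_k} K -> {area / 1e6:.2f} million m^2")
```

Under these idealized assumptions, the 2.42 million square meters quoted in the demo falls between the values this sketch gives for 290 K and 300 K, which is consistent with a panel at about room temperature. Because the area scales with the inverse fourth power of temperature, the unspecified panel temperature dominates the answer.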