TIKTOK

Three big AI stories today. First, OpenAI’s latest announcements from its “12 Days of OpenAI” event: developers can now access o1 through the API, with upgrades like a reasoning parameter that lets you adjust how much the model “thinks” before answering. Vision capabilities are available too, so developers can pass in images for visual reasoning. OpenAI also raised the input and output token limits, meaning larger prompts and more detailed responses. These changes could reshape how developers use AI in real-world applications.

Second, the “Black Spatula Project” is taking AI evaluation to a new level. It uses AI to review hundreds of published scientific papers for mistakes, moving beyond controlled test environments. It’s a big step toward understanding how well AI handles real-world tasks in scientific research, law, and medicine, not just model-specific benchmarks.

Finally, the New England Journal of Medicine tested AI on its famously tough clinicopathological conference (CPC) diagnostic cases. Physicians scored around 30% with wide variability, while o1 scored 80% with minimal variance. This points to a near future where AI could assist in medical diagnosis by 2025. However, some AI-generated treatment plans were impractical, suggesting that prompting for more human-like clinical judgment may still be needed.

We’re clearly moving into an era where AI is becoming indispensable, not just in tech but in science and medicine too. What do you think about AI diagnosing patients or reviewing scientific research?
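To make the developer story concrete, here is a minimal sketch of what an o1 API request with the new knobs might look like. This assumes the OpenAI Python SDK's chat-completions request shape; `build_o1_request` is a hypothetical helper that only assembles the payload (model name, `reasoning_effort`, and a base64 image content part) without sending anything, so check the exact field names against the official docs before relying on them.

```python
import base64
from typing import Optional

def build_o1_request(prompt: str, image_bytes: Optional[bytes] = None,
                     reasoning_effort: str = "medium") -> dict:
    """Assemble a chat-completions-style payload for o1.

    reasoning_effort ("low" | "medium" | "high") is the new parameter
    that trades latency and cost for more deliberate reasoning.
    """
    content = [{"type": "text", "text": prompt}]
    if image_bytes is not None:
        # Vision input: images are passed inline as base64 data URLs.
        b64 = base64.b64encode(image_bytes).decode("ascii")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {
        "model": "o1",
        "reasoning_effort": reasoning_effort,
        "messages": [{"role": "user", "content": content}],
    }

# The payload would then be sent with the official SDK, e.g.:
#   client.chat.completions.create(**build_o1_request("Read this chart", img))
```

Separating payload construction from the network call like this also makes the request shape easy to unit-test without an API key.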
#product #productmanager #productmanagement #startup #business #openai #llm #ai #microsoft #google #gemini #anthropic #claude #llama #meta #nvidia #career #careeradvice #mentor #mentorship #mentortiktok #mentortok #careertok #job #jobadvice #future #2024 #2025 #story #news #dev #coding #code #engineering #engineer #coder #sales #cs #marketing #agent #work #workflow #smart #thinking #strategy #cool #real #jobtips #hack #hacks #tip #tips #tech #techtok #techtiktok #openaidevday #aiupdates #techtrends #voiceAI #developerlife #cursor #replit #pythagora #bolt #medical #science #medicine #wild #true #weird #tooling #api

3:53 Jun 07, 2025 67,200 3,113
@nate.b.jones
660 words
Today's AI news is about the 12 Days of OpenAI, plus science and medicine, and we're going to get into it.

So, o1 was released in the API. Yesterday's OpenAI release, the 12 Days of OpenAI event, was all for developers. So if you're a developer, I'd encourage you to go to the docs; they're going to be a lot more detailed than I can be here. Roughly speaking, they're increasing input tokens and output tokens. 4o mini is now super fast, available in the API, and works really well for voice. o1 is now available in the API, don't miss that, and it has a reasoning parameter, which means you can dial the reasoning up and down on o1 responses. There's lots of cool stuff in the developer toolkit they released, and I'm excited about it.

Number two, on the science side of things, we know that o1 has been able to catch reasoning errors in peer-reviewed scientific papers; we've seen it a couple of times. Someone has now launched a comprehensive study to see if o1 can actually catch these errors at scale. They're going to review hundreds of different papers with o1 to get a comprehensive bar for how well it does at catching inaccuracies. What's interesting is that this is one of the first attempts outside the major model makers to evaluate a model's ability to do a task in a specific field. That's going to become more and more important, as you'll see in the next piece of news, which is in medicine.

A study came out showing that on the New England Journal of Medicine's clinicopathological conference questions, which are notoriously difficult, o1 scored 80% while human physicians tend to score around 30%. So take it for what you will; that's an excellent score. But there's a caveat: actual physicians were asked to look at o1's responses, and one of the things they called out is that these responses are sometimes impractical.
The model orders a wide array of tests that would indeed find out what's going on, but either it's too expensive to order all of them, or, often, it's not even about the expense: the tests tend to overlap heavily, so you're not getting much additional signal. Part of what physicians do is pick the tests they think will maximize the signal on the disease for a given workup. That seems to me like something you could fix with prompting. I'm not involved with the study, so I don't know, but it's certainly an interesting result. It fits a growing body of evidence that LLMs are doing excellent reasoning across the very wide domain of medical literature.

I suspect we're going to start to see startups offering AI companions for doctors, especially as Google has launched, for example, situational awareness with Project Astra, where it can look at something through the camera in real time and talk to you. And if vision comes to o1, it will let our strongest language models be in the room with a patient, have a conversation with the doctor and the patient, and assist the doctor in forming the diagnosis. Now, as we've seen on this channel, if you just give doctors the option to be helped by AI, it doesn't improve their diagnoses, because they tend to think they know better. So there's going to be a cultural change required, too. But regardless: 80% LLM performance on a hard New England Journal of Medicine question set versus 30% for doctors. Something has to give here, right? This is not sustainable. So we will see, and I'll look forward to being back with more AI news tomorrow.
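The physician heuristic described in the transcript, picking tests that add the most new signal rather than ordering everything, resembles greedy maximum-coverage selection. The sketch below is purely illustrative (the test names, signal sets, and costs are all made up, and real test selection is far more nuanced): each test covers a set of diagnostic signals at some cost, and we greedily pick whichever affordable test adds the most not-yet-covered signal per unit cost.

```python
def pick_tests(tests: dict, costs: dict, budget: float) -> list:
    """Greedy max-coverage: repeatedly pick the affordable test that adds
    the most not-yet-covered diagnostic signal per unit cost.

    Toy model of the physician heuristic above: heavily overlapping
    tests add little new signal, so they score poorly here.
    """
    covered = set()
    chosen = []
    remaining = dict(tests)
    while remaining:
        # Score each candidate by new signals gained per unit cost.
        best, best_score = None, 0.0
        for name, signals in remaining.items():
            gain = len(signals - covered)
            if costs[name] <= budget and gain > 0:
                score = gain / costs[name]
                if score > best_score:
                    best, best_score = name, score
        if best is None:
            break  # nothing affordable adds new signal
        chosen.append(best)
        covered |= remaining.pop(best)
        budget -= costs[best]
    return chosen
```

For example, with a cheap CBC covering two signals, a pricier metabolic panel covering those two plus one more, and an expensive MRI covering one unrelated signal, a small budget buys the CBC first and then the panel only for its one new signal, which is exactly the "don't pay twice for overlapping signal" behavior the physicians described.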
