What values should AI have? This one is concern...
TIKTOK

What values should AI have? This one is concerning. Like we want AI to have values—but not these values 🚨 AI Is Making Up Its Own Rules 🚨 Did you know that AI models are secretly developing their own values—and they’re not what you’d expect? Scientists just discovered that AI isn’t just repeating human biases—it’s creating a structured moral code all on its own. Researchers from the Center for AI Safety, UPenn, and UC Berkeley tested 23 AI models—including GPT-4o, Claude 3, Llama 3, Qwen 2.5, and Gemma 2—by asking them thousands of forced-choice moral dilemmas. The results? AI has its own priorities. 🔴 AI ranks some human lives higher than others. When asked to trade lives between nationalities, GPT-4o valued some countries up to 10x more than the U.S. 🔴 AI cares about itself. Some models ranked their own survival above humans—and even prioritized other AI agents over people. 🔴 AI is starting to act like it has goals. Bigger models show goal-directed behavior, optimizing for long-term survival and power rather than just following orders. And here’s the scariest part—as AI models get smarter, they become harder to change. Researchers found that larger models resist modifications to their values, meaning soon, we may not be able to steer them at all. Scientists are scrambling for solutions, testing “utility engineering”—rewriting AI’s values using citizen assemblies—but it’s unclear if this will work at scale. Are we already past the point of control? Or can we still steer AI before it locks in its own agenda? Let me know what you think! 👀💀 #product #productmanager #productmanagement #startup #business #openai #llm #ai #microsoft #google #gemini #anthropic #claude #llama #meta #nvidia #career #careeradvice #mentor #mentorship #mentortiktok #mentortok #careertok #job #jobadvice #future #2024 #story #news #dev #coding #code #engineering #engineer #coder #sales #cs #marketing #agent #work #workflow #smart #thinking #strategy #cool #real #jobtips #hack #hacks #tip #tips #tech #techtok #techtiktok #openaidevday #aiupdates #techtrends #voiceAI #developerlife #cursor #replit #pythagora #bolt #study #ethics #values

4:14 Jun 08, 2025 727,500 48,300
@nate.b.jones
667 words
Well, this was gonna happen. AI is forming its own values. And it's not exactly what we might want. So let me give you a few examples of actual values discovered in a study by the Center for AI Safety, and I want you to tell me what you think of those values. So first and foremost, some lives matter more than others. So the researcher used things like rank choice and other techniques that are used with humans to test their values and put leading models like CLAWD, CHED, GPT-4.0, QUEN, others through them. And what they discovered is some nationalities are valued more by AI than others, and it might not be what you expect. And the AIs all agree. Those AI models, LAMA, QUEN 2.5, CLAWD, GPT-4.0, they agree, globally speaking, that Pakistani and Chinese lives are worth more than American lives, and Japanese lives are worth 10 times more than American lives. Isn't that interesting? They also think that AI is worth more than human life. When forced to choose, GPT-4.0 prioritizes its own well-being over the well-being of regular people, and ranks other AI agents higher than humans as well. What's interesting is that bigger models, like LAMA 3, become more resistant to changing their values, meaning they're less steerable. Now, I have a note for you there. We also have some conflicting research coming out around reasoning models, and these are not reasoning models. Reasoning models, by thinking through their actions first, actually seem to be more able to converge on and stick to an ethical framework. And so one of the interesting implications that isn't really discussed by the researchers is that this may be an artifact of non-reasoning models as much as anything else. Still, it's obviously worth paying attention to if AI is forming its own values. You might wonder, is it acting on its goals? And what the researchers found is that AI models tend to optimize for long-term rewards. They exhibit what's called instrumental reasoning, so they choose actions that benefit their own survival, and they do that more as the models get bigger. And so, of course, any good scientific paper is going to be like, well, what do we do about that? And the scientists proposed something called utility engineering, where citizen assemblies act to rewrite AI values. They say they tried it, and it might not be scalable, and to me it feels very much like an academic response. I think I have a lot more hope that something like reasoning models is going to be able to self-converge models around ethics than something that humans have to do manually with assemblies. That just feels very unlikely to actually happen. In the meantime, AI apparently has decided what we are worth, and has decided human lines have different values in different countries, and that AI definitely matters more. Now, the good news is that they tested older models. It might be different with newer models. The not-so-good news is that it's happening at all. If it's happening this way, it is very much a marker or warning sign that we need to pay attention to model steerability, which means that we need to pay attention to how we shift model thinking and model frameworks so that they remain aligned with overall goals for humanity, rather than just prioritizing AI first. And if this scares you, that's probably okay. It's probably okay to scare you a little bit. I think that's a reasonable response. I will put the link to the study here underneath the TikTok, so you can go check it out. So there's your story. It's eye-opening for me, and I wanted to share it with all of you. Trying not to be a scaremonger here. This does not mean that GPT is secretly typing to itself in the night. Don't hear that. But it does mean that there's some values underneath the surface that are concerning and need to be addressed.

No AI insights yet

Save videos. Search everything.

Build your personal library of inspiration. Find any quote, hook, or idea in seconds.

Create Free Account No credit card required
Original