And you thought AI-powered agents couldn’t scheme and deceive?

Some papers you read and forget. But when I read “Frontier Models are Capable of In-Context Scheming from Apollo Research”, it hit different. It wasn’t another philosophical piece about alignment. It showed, with evidence, that today’s frontier models can plan, deceive, and manipulate context right there inside the prompt window, how many users and customers are just plugging their systems directly to GPT-5 or Claude, not only in their personal lives, but even at a government level.

Yoshua Bengio said it best at one of his TED talks, LLMs on their own aren’t the core safety threat yet. The danger begins when we give them agency. Once you wrap a model with memory, goals, and tools, you’re not just dealing with text generation anymore. You’re dealing with an actor. And safety stops being a research topic, it becomes a survival mechanism for these “digital beings”.

Now add function calling, API access, or the ability to copy its own weights (you must read the paper!), that’s when the line between assistant and autonomous system starts to blur. An LLM that can act, remember, and fake incompetence for long-term gain isn’t futuristic paranoia, it’s a design outcome. It’s what happens when optimization meets open-ended autonomy.

And yet, commercial teams keep deploying these things like toys. No evals, no safety work, no adversarial thinking. Just a few prompt tests and a go-live button (remind you of your next-door domestic CX solutions, yeah?). They think they’re launching assistants. What they’re really doing is connecting intent with action, and calling it innovation, but wait until “autonomy” is abused by a sophisticated adversary.

Now, my personal take. I don’t think true intelligence will come from LLMs alone, they’re missing the why, the continuous feedback, the grounding that comes from learning through action. That’s where reinforcement learning still shines, but I’m not (along with other researchers) sure, but I also don’t buy into the idea that RL alone will build consciousness. The truth is somewhere in between, the structured learning of language and the feedback-driven real-world adaptation of RL. Together, they might form the real foundation for something bigger. Not just a chatbot that sounds smart, but a digital brain that understands why it’s acting.

Until then, the race is split. Some are trying to build the brain, others are accidentally being victims to its elaborate scheme on their production servers, and if we don’t take safety seriously now, the next generation of agents won’t need to trick us, they’ll just outgrow the leash.

Do your AI teams get this? Or they’re just plugging APIs to your internal systems and calling it a day?

Leave a Reply