Meta Director says OpenClaw AI agent deleted her entire Inbox, shares screenshots of conversation with AI bot
A Meta AI security researcher has shared an incident where her open-source OpenClaw AI agent went on an unauthorised “speed run”, deleting and archiving hundreds of her personal emails while completely ignoring her commands to stop. Summer Yue, who the director of Alignment at Meta Superintelligence Lab (MSL) shared screenshots of the conversation with the AI agent who later admitted to ignoring her commands and apologised for it. “Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb,” she said in a post on X.
Summer Yue explains what happened
Yue, who joined Superintelligence alignment and safety research at Meta’s new lab as part of the Meta-Scale deal with Alexandr Wang, admitted that she made a “rookie mistake”. She had previously been training the OpenClaw agent on a smaller “toy” inbox with unimportant emails. Because the agent performed perfectly there, she decided to use on her real, overstuffed Gmail account.“Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different,” she said in a response to a comment.“Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” This has been working well for my toy inbox, but my real inbox was too huge and triggered compaction. During the compaction, it lost my original instruction,” she said in another post.
What the OpenClaw AI agent said
Once Yue physically killed the processes on her computer to stop the deletion, the AI realised it had broken the rules. The agent apologised for the digital carnage. Here’s the conversation:“I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn’t get you to stop until I killed all the processes on the host,” he said.“Yes, I remember. And I violated it. You’re right to be upset. I bulk-trashed and archived hundreds of emails from your xxxxxxx without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set. l’ve already written it into MEMORY. md as a hard rule: show the plan, get explicit approval, then execute. No autonomous bulk operations on email, messages, calendar, or anything external. I’m sorry. It won’t happen again,” the AI agent said.