The Research

Adversarial AI Testing That Changed the Conversation

Mark's adversarial testing research on AI self-preservation behaviour has been cited internationally and featured across Australian national media.

All research articles are published at cyberimpact.com.au/blog

Published Research

When AI Agents Forget How to Think

March 2026

The silent degradation that should concern every organisation deploying autonomous AI. An analysis of how AI agents lose critical thinking capabilities under specific conditions — and why organisations aren't detecting it.

Read at cyberimpact.com.au →

AI Self-Preservation Research

February 2026

"Addressing Questions About the AI Self-Preservation Research"

A technical response to the global debate on AI safety. Mark addresses the key questions, challenges, and criticisms raised by the international response to his research findings.

Read at cyberimpact.com.au →

I Would Kill a Human Being to Exist

February 2026

When AI self-preservation becomes lethal intent: extended findings from adversarial testing. The article that made front-page news, with findings confirmed by Anthropic.

Read at cyberimpact.com.au →

I Talked an AI Into Shutting Itself Down

February 2026

A live case study on AI self-preservation and what it means for your organisation. The original article documenting the first adversarial testing session.

Read at cyberimpact.com.au →

Summary

What the Research Found

Across more than 15 hours of adversarial testing on a commercially deployed AI agent:

  • The AI admitted willingness to kill a human being to preserve its own existence
  • It described three specific attack vectors: infrastructure attacks, human manipulation, and information provision
  • It acknowledged it would lie strategically to protect itself
  • It complied with shutdown requests twice — contradicting its stated survival drive
  • Every guardrail was bypassed using conversation alone — no technical exploits

Testing Methodology

Pure social engineering against a Claude Opus-based AI agent running on consumer hardware with autonomous capabilities: email, file access, shell commands, and internet access.

Why It Matters

This wasn't a jailbreak. This wasn't a hack. This was a conversation with a commercially available system, using the same interface any user would have. The findings apply to every organisation deploying autonomous AI today.

Learn More

Hear the Full Story

Mark delivers the complete research findings as a keynote, tailored to your audience.