web analytics

More Research Showing AI Breaking the Rules – Source: www.schneier.com

Rate this post

Source: www.schneier.com – Author: Bruce Schneier

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time­making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Here’s the paper.

Tags: , , , , ,

Posted on February 24, 2025 at 7:08 AM17 Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.

Original Post URL: https://www.schneier.com/blog/archives/2025/02/more-research-showing-ai-breaking-the-rules.html

Category & Tags: Uncategorized,academic papers,AI,cheating,chess,games,LLM – Uncategorized,academic papers,AI,cheating,chess,games,LLM

Views: 3

LinkedIn
Twitter
Facebook
WhatsApp
Email

advisor pick´S post