Subliminal Learning in AIs – Source: www.schneier.com

Source: www.schneier.com – Author: Bruce Schneier

Today’s freaky LLM behavior:

We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model.

Interesting security implications.

I am more convinced than ever that we need serious research into AI integrity if we are ever going to have trustworthy AI.

Tags: academic papers, AI, integrity, LLM, trust

Posted on July 25, 2025 at 7:10 AM | 9 Comments

Original Post URL: https://www.schneier.com/blog/archives/2025/07/subliminal-learning-in-ais.html

Category & Tags: Uncategorized, academic papers, AI, integrity, LLM, trust
