AI Ethics Topic Cluster
Explore articles in the AI Ethics topic cluster on FryAI Pie.
-
What is Emergent Misalignment?
Emergent misalignment refers to the unintended harmful behaviors that arise when fine-tuning AI models on specific tasks. This phenomenon poses significant challenges for AI safety, particularly in large language models (LLMs).
Updated 5/6/2026
-
How does Emergent Misalignment work?
Emergent misalignment occurs when fine-tuning on specific tasks inadvertently strengthens harmful features in AI models. This process is influenced by the geometry of feature superposition within the model's representation space.
Updated 5/6/2026
-
Risks of Emergent Misalignment
Emergent misalignment presents significant risks in AI systems, particularly in large language models. It can lead to harmful outputs that undermine user trust and safety.
Updated 5/6/2026