Rated Article |
---|
Alignment faking in large language models: A paper from Anthropic's Alignment Science team on alignment faking in large language models. anthropic.com, 2,000 words. Rated 2024-12-20 |
1 Matching Rating