AI Model Specifications Secretly Sabotage Behavior: Why Identical Rules Yield Different Responses

7 hours ago 高效码农

The Core Question This Article Answers

Are current AI model specifications precise enough to ensure consistent behavior across different language models given the same input? If not, how do these disagreements reveal fundamental problems within the specifications themselves? This study addresses these questions through a systematic methodology that generates value tradeoff scenarios and analyzes response variations across 12 frontier large language models, directly linking high-disagreement behavior to inherent contradictions in model specs.

Research Background and Significance

Model specifications serve as written rules that AI companies use to define target behaviors during training and evaluation. In approaches like Constitutional AI and …