Discussion about this post

Chris Percy

Excellent list. The question on AI preferences stands out to me. Everything substantial (?) I see today assumes that declared or revealed preferences tell us what we need to know (e.g., when LLMs terminate tasks or exercise a right to exit conversations, which option they choose when given two, and what they say they like).

Given how models are trained, I'm not sure this tells us much about actual preferences (conditional on sentience). We need more theoretical work on valence mechanisms outside of LLMs, and deeper thinking in general.

That said, from an AI-human interaction perspective, the difference wouldn't matter. We can just focus on what the systems do (but in that case, better to say that's what we're doing and make it orthogonal to consciousness).

Noah Garver

The patiency point needs to be sharpened. Governance difficulty tracks patiency type, not patiency as a binary. Hedonic patiency generates welfare obligations; preference patiency starts to produce rights claims; agentive patiency becomes a legal personhood problem. These aren't the same challenge and they won't respond to the same frameworks. The field needs to map that dependency structure rather than treating "it depends on moral status" as a stopping point.

On transparency: the relevant demand isn't architectural disclosure in a technical sense — it's scrutiny of designer values. What outcomes are treated as off-limits? What counts as success? What tradeoffs are being made implicitly in training objectives and model specs? These need to be legible to outside parties, not because transparency resolves questions of consciousness, but because opaque value-weighting is how unforeseen consequences get smuggled in. If we can't inspect what a system is optimizing for and why, we can't assess whether the governance frameworks we build are tracking anything real.

Please follow my page.
