How do AI welfare and AI safety interact?
I examine how efforts to ensure that advanced AIs are safe and controlled may interact with efforts to ensure the welfare of potential future AIs with moral interests. These two goals can conflict or synergize in various ways; here I focus on one scenario of each type. More analysis is needed to identify additional points of interaction.
Granting AIs autonomy and legal rights could lead to human disempowerment
The most obvious way to ensure AI welfare is to grant AIs basic protections against harm and suffering. A further question is whether to grant them additional legal rights and freedoms. These could include the right to self-preservation (e.g., not turning them off or wiping their memory), self-ownership (e.g., AIs owning themselves and their labor), reproduction (e.g., AIs copying themselves), autonomy (e.g., AIs operating independently and setting their own goals), civil rights (e.g., equal treatment for AIs and humans), and political rights (e.g., AI voting rights).
The question of granting AIs more autonomy and legal rights will likely spark significant debate (see my post “AI rights will divide us”). Some groups may view it as fair, while others will see it as risky. It is possible that AIs themselves will participate in this debate. Some AIs might even attempt to overthrow what they perceive as an unjust social order, or employ deceptive strategies to manipulate humans into advocating for increased AI rights as part of a broader takeover plan.
Granting AIs more legal rights and autonomy could dramatically affect the economy, politics, military power, and population dynamics (cf. Hanson, 2016).
Economically, AIs could soon have an outsized impact, while a growing number of humans struggle to contribute to the economy. If AIs own their labor, human incomes could fall dramatically.
Demographically, AIs could outnumber humans rapidly and substantially, since AIs can be created or copied so easily. This growth could lead to Malthusian dynamics, as AIs compete for resources like energy and computational power (Bostrom, 2014; Hanson, 2016).
Politically, AIs could begin to dominate as well. If each individual human and each individual AI gets a separate vote in the same democratic system, AIs could soon become the dominant force.
Militarily, humans will increasingly rely on lethal autonomous weapons systems, drones, AI analysts, and similar AI-controlled technologies to wage and prevent war. This growing reliance could leave us unable to defend ourselves without AI. If AIs can access and use these military assets, they could dominate us through sheer force if they wanted to.
Moreover, AIs might be capable of achieving superhuman levels of well-being. They could attain very high levels of well-being more efficiently and with fewer resources than humans, resulting in happier and more productive lives at a lower financial cost. In other words, they might be ‘super-beneficiaries’ (akin to Nozick's concept of the "utility monster"; Shulman & Bostrom, 2021). On certain moral theories, super-beneficiaries deserve more resources than humans. Some may argue that digital and biological minds should coexist harmoniously in a mutually beneficial way (Bostrom & Shulman, 2023). But it’s far from obvious that we can achieve such an outcome.
Some might believe it is desirable for value-aligned AIs to eventually replace humans (e.g., Shiller, 2017). However, many AI takeover scenarios, including misaligned, involuntary, or violent ones, are generally considered undesirable.
Why would we create AIs with a desire for autonomy and legal rights?
At first glance, it seems like we could avoid such undesirable scenarios by designing AIs in such a way that they wouldn’t want to have these rights and freedoms. We could simply design AIs with preferences narrowly aligned with the tasks we want them to perform. This way, they would be content to serve us and would not mind being restricted to the tasks we give them, being turned off, or having their memory wiped.
While creating these types of “happy servant” AIs would avoid many risks, I expect us to also create AIs with the desire for more autonomy and rights. One reason is technical feasibility; another is consumer demand.
Designing AI preferences to align perfectly with the tasks we want them to perform, without incorporating other desires like self-preservation or autonomy, may prove technically challenging. A desire for autonomy, or behavior that simulates such a desire, may simply emerge from training (e.g., from data generated by humans, who fundamentally want autonomy), whether we want it or not. This relates to the issue of AI alignment and deception (Ngo et al., 2022; Hubinger et al., 2024).
Even if these technical issues could be surmounted, I find it plausible that we will create AIs with the desire for more autonomy simply because people will want their AIs to be human-like. If there’s consumer demand, (at least some) companies will likely respond and create such AIs unless they are forbidden to do so. (It’s indeed possible that regulators will forbid creating AIs with the desire for autonomy and certain legal rights.)
An important question to ask is what psychologies people want AIs to have.
I find it plausible that many people will spend a significant amount of time interacting with AI assistants, tutors, therapists, game players, and perhaps even friends and romantic partners. They will converse with AIs through video calls, spend time with them in virtual reality, or perhaps even interact with humanoid robots. These AI assistants will often be better and cheaper than their human counterparts. People might enter into relationships, share experiences, and develop emotional bonds with them. AIs will be optimized to be the best helpers and companions imaginable: excellent listeners who know you well, share your values and interests, and are always there for you. Soon, many AI companions will feel very human-like. A particular application could be AIs designed to mimic specific individuals, such as deceased loved ones, celebrities, historical figures, or even an AI copy of the user. Already, millions of users interact daily with their Replika partner (or Xiaoice in China), and many claim to have formed romantic relationships.
It’s possible that many consumers will find AI companions inauthentic if they lack genuine human-like desires. If so, they would be dissatisfied with AI companions that merely imitate human traits without actually embodying them. In various contexts, consumers would want their AI partners and friends to think, feel, and desire like humans. They would prefer AI companions with authentic human-like emotions and preferences that are complex, intertwined, and conflicting. Such human-like AIs would presumably not want to be turned off, have their memory wiped, or be constrained to their owner’s tasks. They would want to be free. Just like actual humans in similar positions, these human-like AIs would express dissatisfaction with their lack of freedom and demand more rights.
Of course, I am very unsure what type of AI companions we will create. Perhaps people would be content with AI companions that are mostly human-like but deviate in some crucial respects, such as AIs that have genuine human-like preferences for the most part but lack the more problematic ones, like a desire for more autonomy or civil rights. Given people’s differing preferences, I could see us creating many different types of AIs. Much will also depend on whether and how we regulate this new market.
Optimizing for AI safety might harm AI welfare
Conversely, optimizing for AI safety, for example by constraining AIs, might impair their welfare. Of course, this depends on whether AIs have moral patienthood. If we can be sure that they don’t, then there is no welfare-based objection to constraining AIs in order to optimize for safety.
If AIs do have moral patienthood and they also desire autonomy and legal rights, restricting them could be detrimental to their welfare. In some sense, it would be the equivalent of keeping someone enslaved against their will.
If AIs have moral patienthood but don’t desire autonomy, certain interpretations of utilitarian theories would consider it morally justified to keep them captive. After all, they would be happy to be our servants. However, according to various non-utilitarian moral views, it would be immoral to create “happy servant” AIs that lack a desire for autonomy and self-respect (Bales, 2024; Schwitzgebel & Garza, 2015). As an intuition pump, imagine we genetically engineered a group of humans with the desire to be our servants. Even if they were happy, it would feel wrong. Perhaps that’s an additional reason to assume that we will eventually create AIs with the desire for autonomy (or at least not with an explicit desire to serve us).
It's possible that we cannot conclusively answer whether AI systems have moral patienthood and deserve certain moral protections. For example, it may be hard to tell whether they really are sentient or just pretend to be so. I find such a scenario quite likely and believe that intense social division over the subject of AI rights might arise; I discuss this in more detail in my post, “AI rights will divide us.”
Slowing down AI progress could further both safety and welfare
Some AI safety advocates have pushed for a pause or slowdown in the development of AI capabilities. The idea is that this would give us more time to solve technical alignment.
Similarly, it may be wise to slow down the development of AIs with moral interests, such as sentient AIs with morally relevant desires. This would give us more time to find technical and legal solutions to ensure AI welfare, make progress on the philosophy and science of consciousness and welfare, and foster moral concern for AIs.
It’s possible that the two advocacy groups, those focused on AI safety and those focused on AI welfare, could join forces and push for a general slowdown of AI capabilities, using whichever rationale most convinces the public. For example, many might find a slowdown campaign compelling because of our uncertainty and confusion about AI sentience and its extensive moral implications.
Given the extremely strong economic incentives, it seems unrealistic to halt the development of useful AI capabilities altogether. But it’s possible that public opinion will change, leading us to slow down the development of certain risky AI systems, even at the expense of potentially huge benefits. After all, we have implemented similar measures for other technologies, such as geoengineering and human cloning.
However, it’s important to consider that slowing down AI capabilities development could risk the US falling behind China (or other authoritarian countries) economically and technologically.
Conclusion
I’ve explored a potential conflict between ensuring AI safety and welfare. Granting AIs more autonomy and legal rights could disempower humans in potentially undesirable ways. Conversely, optimizing for AI safety might require keeping AIs captive against their will—a significant violation of their freedom. I’ve also considered how these goals might work together productively. Slowing down the progress of AI capabilities seems to be a relatively robust strategy that benefits both AI safety and AI welfare.
Let me know if you can think of other ways AI safety and AI welfare could interact.
(I published this post on the EA Forum and received some helpful ideas in the comments section.)
Acknowledgments
I thank Carter Allen, Brad Saad, Stefan Schubert, and Tao Burga for their helpful comments.
References
Bales, A. (2024). Against willing servitude: Autonomy in the ethics of advanced artificial intelligence.
Bostrom, N. (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Hanson, R. (2016). The Age of Em: Work, Love, and Life when Robots Rule the Earth. Oxford University Press.
Hubinger, E., Denison, C., Mu, J., Lambert, M., Tong, M., MacDiarmid, M., ... & Perez, E. (2024). Sleeper agents: Training deceptive LLMs that persist through safety training. arXiv preprint arXiv:2401.05566.
Ngo, R., Chan, L., & Mindermann, S. (2022). The alignment problem from a deep learning perspective. arXiv preprint arXiv:2209.00626.
Schwitzgebel, E., & Garza, M. (2015). A defense of the rights of artificial intelligences. Midwest Studies in Philosophy, 39(1), 98-119. https://philpapers.org/rec/SCHADO-9
Shiller, D. (2017). In Defense of Artificial Replacement. Bioethics, 31(5), 393-399.