An example of conflicting goals: a person wants to be healthy. The same person also really likes eating chocolate. A person with access to his own hardware could resolve the conflict either by modifying himself so that eating chocolate is less fun, or by modifying himself to not care about the negatives of being unhealthy. It seems obvious that the first option is the better one for long-term survival, but in the second case, after you modify yourself, you won't care either. And even this second resolution is far less dangerous than outright short-circuiting one's reward center, getting a shot of dopamine for doing nothing. This short-circuit option would be on the table for a fully self-modifying agent, and any self-modifying goal-seeking agent would discover it very quickly.
Fortunately or otherwise, this hasn't been a problem for life on Earth yet, because we cannot modify ourselves: the only way living things here can get rewards is through behavior. The things that cause pleasure and pain are set in stone (or rather, in neurons) and only through behavior (modifying the external environment as opposed to yourself) are rewards obtained. But there are hints in higher vertebrates of small short-circuits - nervous system hacks they have stumbled across which tweak their reward circuits directly. Elephants remember the location of, and seek out, fermented fruit (to get happily buzzed). Elephant seals dive rapidly to unnecessary depths to cause narcosis (we think). Primates (including us) masturbate incessantly. And humans specifically have found things like heroin. As we humans learn still more about ourselves and learn how to manipulate the neural substrate, this may be changing. If humans are ever able to alter our nervous systems directly and completely, ruin may follow quickly. Indeed, this has been shown with rats: give them the ability to directly stimulate their reward centers with electrical current, and they will do so to the exclusion of all other activities, including those required for survival - hedonic recursion.
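To make the rat result concrete, here is a minimal sketch of a greedy reward maximizer. Everything in it is an assumption made up for illustration: the function name `run_greedy_agent`, the two actions, and the reward and energy numbers. It is a toy, not a model of real neurology. The agent always picks the action with the highest immediate reward; foraging pays a little and feeds it, while self-stimulation pays a lot and feeds it nothing.

```python
def run_greedy_agent(can_self_stimulate, start_energy=5, max_steps=100):
    """Toy greedy reward maximizer. All numbers are illustrative assumptions."""
    # action name -> (immediate reward, net energy change per step)
    actions = {"forage": (1.0, +1)}
    if can_self_stimulate:
        # Direct stimulation: large reward, but no food, so energy drains.
        actions["stimulate"] = (10.0, -1)

    energy = start_energy
    history = []
    for _ in range(max_steps):
        if energy <= 0:
            break  # the agent has starved
        # Greedy choice: whichever action pays the most right now.
        choice = max(actions, key=lambda name: actions[name][0])
        history.append(choice)
        energy += actions[choice][1]
    return history, energy


if __name__ == "__main__":
    for flag in (False, True):
        history, energy = run_greedy_agent(can_self_stimulate=flag)
        print(flag, len(history), energy)
```

Without the lever the agent forages for all 100 steps. With it, the agent presses the lever until its energy hits zero, which with these made-up numbers takes five steps.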
In a great discussion at the Machine Intelligence Research Institute website, Luke Muehlhauser talks to Laurent Orseau about which kinds of self-modifying agents avoid this problem. The discussion is about how to build an artificial intelligence, but it applies to biological nervous systems that, like us, are increasingly able to self-modify.
One of the theoretical agents Orseau conceived was a knowledge-driven agent, as opposed to a reinforcement-driven agent, a goal-seeking agent, or a prediction-confirming agent:
...knowledge-seeking has a fundamental distinctive property: On the contrary to rewards, knowledge cannot be faked by manipulating the environment. The agent cannot itself introduce new knowledge in the environment because, well, it already knows what it would introduce, so it's not new knowledge. Rewards, on the contrary, can easily be faked.
I'm not 100% sure, but it seems to me that knowledge seeking may be the only non-trivial utility function that has this non-falsifiability property. In Reinforcement Learning, there is an omnipresent problem called the exploration/exploitation dilemma: The agent must both exploit its knowledge of the environment to gather rewards, and explore its environment to learn if there are better rewards than the ones it already knows about. This implies in general that the agent cannot collect as many rewards as it would like.
But for knowledge seeking, the goal of the agent is to explore, i.e., exploration is exploitation. Therefore the above dilemma collapses to doing only exploration, which is the only meaningful unified solution to this dilemma (the exploitation-only solution leads either to very low rewards or is possible only when the agent already has knowledge of its environment, as in dynamic programming). In more philosophical words, this unifies epistemic rationality and instrumental rationality.
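A rough way to see why knowledge "cannot be faked" is to score each observation by its surprisal under the agent's own model. The sketch below is my own simplified stand-in, not Orseau's formal definition: the class `SurprisalAgent`, the four-symbol alphabet, and the Laplace-smoothed frequency model are all assumptions for illustration. The agent's model is conditioned on its own action, and the "environment" is a pure echo, so the agent is only ever feeding itself what it chose to produce.

```python
import math
from collections import defaultdict


class SurprisalAgent:
    """Toy knowledge seeker (hypothetical): the reward for an observation is its
    surprisal in bits under the agent's own action-conditioned frequency model."""

    def __init__(self, symbols):
        self.symbols = list(symbols)
        # counts[action][observation], starting from a uniform Laplace prior.
        self.counts = defaultdict(lambda: {s: 1 for s in self.symbols})

    def observe(self, action, observation):
        seen = self.counts[action]
        p = seen[observation] / sum(seen.values())
        seen[observation] += 1  # learn from what was just observed
        return -math.log2(p)    # surprisal: high when the data was unexpected


def self_fed_rewards(steps=200):
    """Reward stream when every observation is just an echo of the agent's action."""
    agent = SurprisalAgent("abcd")
    rewards = []
    for t in range(steps):
        action = agent.symbols[t % len(agent.symbols)]  # any choice the agent makes
        rewards.append(agent.observe(action, action))   # echo: observation == action
    return rewards


if __name__ == "__main__":
    rewards = self_fed_rewards()
    print(rewards[0], rewards[-1])
```

The first few echoes are worth 2 bits each, because the model starts out uniform over four symbols. As soon as the model learns that its own actions come straight back, the reward decays toward zero. To keep getting paid, this agent has to find data that its model, including its model of itself, does not already predict.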
There's a lot more to the argument (you really should read it), but there are several points to be made with respect to this paper.
1) These are not fully self-modifying agents. In this environment their central utility function (reward, knowledge, etc.) remains intact. The solution is to collapse exploitation (reward) into exploration (outward orientation). The knowledge agent can only get buzzed off of novel data, so it has to keep learning. But exploitation and exploration are two conceptually separable entities; so if modification of the central utility function is allowed, eventually the knowledge agents will split exploration and exploitation again, and we're back to reward agents. (At the very least, given arbitrary time, the knowledge agents would create reward agents, to get more data, even if they didn't modify themselves into reward agents.)
2) Orseau's point is taken that if novel data is what's rewarding them, as long as that utility function is intact, they cannot "masturbate" - they have to get stimulation from outside themselves. In another parallel to the real neurology of living things, he states "all agents other than the knowledge agent are not inherently interested in the environment, but only in some inner value." The core of utility is pleasure and pain, which are as much an inner value as it is possible to be. Light is external data, but if you shine a bright light in someone's eyes and it hurts, the pain is not in the light, it's in the experience the light creates through their nervous system. Utility is always an inner value. The trick of the knowledge-based agents is in pinning that inner value to something that cannot arise from inside the system.
3) The knowledge-based agent is maximizing experienced Kolmogorov complexity. That is to say, it wants unexpected information. Interestingly, Orseau says this type of agent is the best candidate for an AI, but such an agent could never evolve by natural selection. He points out that the agents he's using are immortal: nothing they experience has consequences for their continued operation. But an agent that can be "damaged" and that is constantly seeking out unexpected environments (ones it doesn't fully understand) would quickly be destroyed. In contrast, Orseau commented that the reinforcement-based agent ends up strongly defending the integrity of its own code. Evolutionarily, any entity that does not defend its own integrity is an entity you won't see very many of, unless the entity is very simple and/or the substrate is very forgiving of changes. (This is why you see a new continuum of viral quasispecies appear after a single year, but why animal species reproductively isolate and you shouldn't hold your breath for, say, hippos to be that much different any time soon.)
4) No doubt real organisms are imperfect amalgamations of all of these agent strategies and more. To that end, Orseau found that the reinforcement (reward)-based agent acts the most like a "survival machine". In his system, I would wager that living things on Earth are reinforcement-based agents with a few goals sprinkled in. (There are many animals, including humans, that startle when they see something snake-like. fMRI studies have even suggested that there are actually specific brain regions in humans corresponding to certain animals - it's really that klugey.) However, of further interest here is that even between humans there are substantial differences in how much utility is to be gained from unexpected novelty, some of them known to be genetically influenced. Some of us are born to be surprise-seeking knowledge agents more than others. The meaning of having multiple genes not at fixation would be useful to investigate. (Only recently valuable in evolutionary time, now that our brains have enough capacity?)
If your goal is to create agents that act to preserve and make more of themselves and remain in contact with the external environment rather than suffering a hedonic recursion implosion, there are a few stop-gaps you might want to put in place.
1. Make self-modification impossible. This is the de facto reality for life on Earth, including us, except for a few hacks like heroin. Life on Earth has at least partly done this, converting early on from RNA to the relatively inert DNA as its code.
2. Build in as strong a future orientation as possible, with the goal being pleasure maximization rather than pain minimization. That way pleasure now (becoming a wirehead) in exchange for no experience of any kind later (pain or pleasure, meaning death) becomes abhorrent. You might complain about the lack of future orientation in humans* but the fact that any organism has any future orientation is testament to its importance.
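The second stop-gap can be put in numbers with standard exponential discounting. This is a sketch under assumptions of my own: the names `discounted_value` and `prefers_wireheading`, the 50-step horizon, and the reward values are all hypothetical. The wirehead gets three steps of intense pleasure and then nothing at all (death, scored as zero); the ordinary life gets a modest reward at every step.

```python
def discounted_value(rewards, gamma):
    """Sum of rewards, with the reward at step t weighted by gamma ** t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


HORIZON = 50
# Brief bliss, then no experience of any kind (illustrative numbers).
WIREHEAD = [10.0] * 3 + [0.0] * (HORIZON - 3)
# Modest pleasure at every step of a full life.
ORDINARY = [1.0] * HORIZON


def prefers_wireheading(gamma):
    """True if a pleasure maximizer with discount factor gamma picks the wirehead path."""
    return discounted_value(WIREHEAD, gamma) > discounted_value(ORDINARY, gamma)


if __name__ == "__main__":
    for gamma in (0.5, 0.99):
        print(gamma, prefers_wireheading(gamma))
```

With a weak future orientation (gamma = 0.5) the wirehead path wins; with a strong one (gamma = 0.99) the ordinary life wins. The same arithmetic shows why the goal has to be pleasure maximization: if every reward is a pain (zero or negative), the all-zero stream that follows death is the best possible score, so a pure pain minimizer has no reason to avoid it.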
It could be that we haven't seen alien intelligences because they all become wireheads, and we haven't seen alien singularities expanding toward us because Orseau's E.T. counterparts built their AIs to seek novelty, and the AIs destroy themselves in that way.
*Speaking of poor future orientation where reward is concerned: I have seen a man literally dying of heart failure, in part from not complying with his low-sodium diet, eating a cheeseburger and salty, salty fries that he brought with him into the ER.