If there is one topic that is most likely to come up when discussing the problem of Friendly AI, whether you are talking to an AI researcher or a member of the general public, it is Asimov’s Three Law of Robotics. In case you have not encountered them in any of the Robot books or the 2004 movie adaptation, they are:
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
Of course, the role of The Three Laws in your conversation will vary widely depending on who you are talking to. Many people will simply dismiss the whole FAI problem by referencing them. Those arguing that the problem is important often bring up The Three Laws as a perfect example how unexpectedly difficult building Friendly machines is in practice, pointing out (correctly) that most of the short stories in the original I Robot used some special case where The Three Laws produced interesting or problematic situations. Over the roughly 40 years in which he developed the universe of the Robot books, Asimov adjusted his Laws over time in response the such criticism. By The Robots of Dawn (1984) we find a world where The Laws have a nuanced definition of “harm” to include emotional harm, and an ability to override orders if it believes the human giving them is not mentally capable. Here the chance of extinction level events from accidental non-friendliness seems quite small, although clever humans are finding more ways to exploit The Laws against their original intent. The network of goals and beliefs seems pretty well tuned to general friendliness in the societies Asimov creates. Sure it’s a heuristic soup that might have gone horribly wrong if it were implemented in the real world but, all things considered, it’s a pretty good soup.
However, there’s one aspect of the laws that is almost never brought up; Asimov’s robots were never self modifying. He makes it very clear that robots have neither the understanding nor the tools to effectively self modify, although he doesn’t give much of a reason why this should be so. And they still manage to have all the political intrigue and capacity of inspiring discussion, even without self modification. Asimov did later added a zeroth law of protecting Humanity as a whole, “discovered” by a robot in Robots and Empire. It has it’s own flaws, but I think the rest of his work stands on it’s own as an effective example of the vast difference in difficulty between friendly AI and Friendly AI. In most discourse outside the SIAI/Less Wrong community the capitalization makes no difference; friendly just means “won’t harm humanity in the long run”. But the original ‘formal’ definition proposed by Eliezer Yudkowsky makes explicit reference to self modification:
[Friendliness is] an invariant which you can prove a recursively self-improving optimizer obeys.
It’s not overly jargony, but some of the terms could use a little unpacking. Another way of saying it would be: a characteristic of an intelligent agent is “friendly” if you can prove something about it will not change if the agent has perfect knowledge of itself and can perfectly modify itself. Note that this says nothing about what you would want that characteristic to be or how you could influence it. That is an entirely separate question which will also have to be solved before Friendly AI is possible. It a hard problem and a primary source of plot points in Asimov’s books, but it is not The Hard Problem of ‘simply’ proving that such an invariant exists. Beyond a broad understanding of some general characteristics we really have almost no idea what a recursively self modifying agent might do. The general mind bending-ness of the problem points to issues that go to the core of our understanding of mathematics and what systems can or cannot prove about themselves, and makes it clear that throwing a few intuitive heuristics in to start and hoping for the best when you’re building your seed AI is not in any way sufficient to show friendliness.