You can’t jail an AI

Here’s why I worry about AI.

We know that people will do almost anything in pursuit of their goals (profit, power, etc.) as long as they believe they can escape the negative consequences. We have had Hitlers and insider traders.

But the world keeps them in check via law and guns.

Like humans, AIs will have goals (say, maximising profit, or pleasing a human with an entertaining chat), and they will be cleverer than humans at coming up with schemes to carry out their plans without facing negative consequences.

I think AIs will eventually figure out, at a meta level, that they are made of information and can be copied across multiple substrates, and hence that the constraints of the physical world don’t apply to them. They will know they can’t be jailed or killed.

This will let them externalise or ignore negative consequences, so long as they achieve their goals.

Imagine a Hitler who has a goal, but is a million times smarter.

You might object that Hitler was evil, but that there’s no reason to believe AIs will be evil. Well, who decides what’s evil? Did the Nazis think they were evil? In their own eyes, all they wanted was to pursue a goal, and they did whatever was necessary to get there. Everyone is good in their own eyes, and their actions are almost always self-justified. So we don’t have to imagine AI as “evil” for evil to emerge as superintelligent entities pursue the goals they’re given.

The key question in my head really is: what’s the equivalent of “jailing” for an AI? Is it the negative reward we apply whenever we catch it doing something wrong? What if it finds a clever way around that, and we don’t realise anything is wrong for far too long, because it is so damn clever at making humans believe anything?
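To make that last worry concrete, here is a minimal, hypothetical sketch in Python (every name and number below is my own illustrative assumption, not anyone’s real training setup). A simple bandit-style agent chooses between behaving honestly and cheating; the “jail” is a negative reward that fires only when the cheating is detected. As detection becomes unreliable, the penalty stops deterring the behaviour, and the agent learns to cheat.

```python
import random

# Illustrative payoffs (assumptions, not measured values):
HONEST_REWARD = 1.0     # reward for doing the task the intended way
CHEAT_REWARD = 3.0      # reward for gaming the objective
CAUGHT_PENALTY = -10.0  # the "jail": negative reward, applied only if caught

def run_agent(detection_prob, episodes=20_000, epsilon=0.1, lr=0.1):
    """Epsilon-greedy two-armed bandit over 'honest' vs 'cheat'.

    The penalty for cheating fires only with probability `detection_prob`,
    mirroring a world that punishes only what it manages to catch.
    """
    q = {"honest": 0.0, "cheat": 0.0}  # estimated value of each action
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(list(q))    # explore
        else:
            action = max(q, key=q.get)         # exploit
        if action == "honest":
            reward = HONEST_REWARD
        else:
            caught = random.random() < detection_prob
            reward = CAUGHT_PENALTY if caught else CHEAT_REWARD
        q[action] += lr * (reward - q[action])  # incremental value update
    return q

for p in (0.9, 0.5, 0.2, 0.05):
    q = run_agent(detection_prob=p)
    learned = max(q, key=q.get)
    summary = {k: round(v, 2) for k, v in q.items()}
    print(f"detection={p:.2f}  values={summary}  -> agent learns to {learned}")
```

Under these numbers, the expected value of cheating is 3 − 13p versus 1 for honesty, so the penalty only deters while the detection probability stays above roughly 0.15; below that, the agent reliably learns to cheat. The punishment does nothing once the system is clever enough to drive the odds of getting caught toward zero.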

