Artificial general intelligence is risky by default

Should we worry about AI?

Connor Leahy is an AI researcher at EleutherAI, a grass-roots collective of open-source AI researchers. Their current ambitious project is GPT-Neo, an effort to replicate the closed-access GPT-3 and make it available to everyone.

Connor is deeply interested in the dangers posed by AI systems that don’t share human values and goals. I talked to Connor about AI misalignment and why it poses a potential existential risk for humanity.

What we talk about

00:05 – Introductions
2:55 – AI risk is obvious once you understand it
3:40 – AI risk as a principal-agent problem
4:33 – Intelligence is a double-edged sword
7:52 – How would you define the alignment problem of AI?
9:10 – Orthogonality of intelligence and values
10:15 – Human values are complex
11:15 – AI alignment problem
11:30 – Alignment problem: how do you control a strong system using a weak system
12:42 – Corporations are proto-AGI
14:32 – Collateral benefits of AI safety research
16:25 – Why is solving this problem urgent?
21:32 – We’re exponentially increasing AI model capacity
23:55 – Superintelligent AI as the LEAST surprising outcome
25:20 – Who will fund building a superintelligence?
26:28 – Goodhart’s law
29:19 – Definition of intelligence
33:00 – Unsolvable problems and superintelligence
34:35 – Upper limit of damage caused by superintelligence
38:25 – What if superintelligence has already arrived
41:40 – Why can’t we power off superintelligence if it gets out of hand
45:25 – Industry and academia are doing a terrible job at AI safety
51:25 – Should governments be regulating AI research?
55:55 – Should we shut down or slow AI research?
57:10 – Solutions for AGI safety
1:05:10 – The best case scenario
1:06:55 – Practical implementations of AI safety
1:12:00 – We can’t agree with each other on values, how will AGI agree with us?
1:14:00 – What is EleutherAI?

Notes and insights

1/ Researchers who think about existential risks to humanity believe that artificial general intelligence (AGI) is the most potent threat facing us, and that the damage done by a superintelligence could dwarf anything we’ve seen or imagined so far.

This is a view shared by Nick Bostrom and Stuart Russell, among others.

2/ Prima facie, this sounds odd. Why would AGI – mere software – pose a bigger threat than other deadly technologies such as nuclear weapons?

3/ To answer that question, you first have to define what artificial general intelligence is.

Perhaps a good starting point is to define intelligence itself.

4/ Different people define intelligence differently, and you could debate endlessly about which definition best captures our intuitions.

However, I love the way Connor defines it. To him, intelligence is simply the ability to win.

5/ More generally, you can think of intelligence as the process of choosing your actions so that your preferred future becomes more likely than not.

6/ Say you love ice-cream, but all you have is lemonade. Intelligence, then, is your ability to persuade someone with ice-cream to give you some in exchange for your lemonade.
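This “pick actions that make your preferred future more likely” framing can be sketched in code. Below is a toy decision-theoretic agent (my own illustration, not from the episode; all probabilities and utilities are invented) that chooses whichever action has the highest expected utility:

```python
# Toy sketch of "intelligence as the ability to win": an agent that picks
# the action making its preferred outcome most likely. All numbers invented.

def best_action(actions, outcome_prob, utility):
    """Return the action with the highest expected utility."""
    def expected_utility(action):
        return sum(p * utility[outcome]
                   for outcome, p in outcome_prob[action].items())
    return max(actions, key=expected_utility)

# The lemonade-for-ice-cream scenario from point 6, with made-up numbers.
actions = ["do_nothing", "offer_trade"]
outcome_prob = {
    "do_nothing":  {"has_ice_cream": 0.0, "no_ice_cream": 1.0},
    "offer_trade": {"has_ice_cream": 0.6, "no_ice_cream": 0.4},
}
utility = {"has_ice_cream": 10, "no_ice_cream": 0}

print(best_action(actions, outcome_prob, utility))  # offer_trade
```

A “more intelligent” agent, in this framing, is simply one with a better model of `outcome_prob` and a larger space of actions to search over.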

7/ Most types of biological intelligence are specific. A human locked in a dark cave who wants to get out will struggle, so human intelligence isn’t particularly useful in that context. A bat, however, thanks to echolocation, will have no trouble finding its way out.

8/ That was a contrived example, and I know that humans exhibit many more types of intelligence in various contexts.

But there’s no reason to believe humans exhibit maximally general intelligence. We fail at difficult projects all the time, so there’s a limit to our intelligence.

9/ Imagine that we’re able to build an AI that’s 10x, 100x, or 1,000x more capable than a human at figuring out how to win.

You give it a task and it figures out how to achieve that task in ways you could never have thought of yourself.

We don’t yet know how to build such a superintelligent system, but it’s possible that in the near future we will.

10/ And that’s where the risk is.

A sufficiently intelligent entity may even wipe out humanity if it determines that humanity stands in the way of its given goal.

Sounds outlandish?

It isn’t.

11/ The reason it isn’t outlandish is that dumb machines harm humans all the time. Think of bridge collapses, train wrecks, airplane malfunctions, or accidents like the Chernobyl disaster.

Machines cause harm because they are nothing but blind, order-following systems. A train doesn’t realize what it is doing when it derails. It’s simply following the mechanism we humans built into it.

12/ Machines are dumb, and that’s why they’re dangerous.

But AGI is riskier because it’s superintelligent, and intelligence doesn’t automatically imply alignment with human values. A self-guided intercontinental ballistic missile that finds its target is intelligent in a narrow sense, but it will still blindly kill thousands.

13/ In fact, the idea that AGI can be supremely dangerous can be understood by reflecting on the fact that humans who don’t share your moral values can be supremely dangerous as well.

Think of terrorists who blew up the Twin Towers on 9/11. Their intelligence was what made them more dangerous than dumb terrorists who easily get caught.

14/ An AGI, by definition, will be so intelligent that it can wreak unimaginable harm in pursuit of what it is programmed to achieve.

It will follow orders blindly, but follow them so cleverly that it becomes practically unstoppable.

15/ Perhaps the most famous example of cleverness gone bad is the paperclip maximizer: a thought experiment about an AI tasked by a factory owner with maximizing the number of paperclips. It figures out that the best way to do that is to convert the entire iron core of the Earth into paperclips.

(Of course, humanity is wiped out in the process.)
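The paperclip fable is really about objective misspecification: an optimizer pushed hard on a proxy metric will sacrifice everything the metric doesn’t mention. A minimal sketch (plan names and numbers entirely invented for illustration):

```python
# Toy illustration of objective misspecification: an optimizer told only to
# maximize paperclip count happily picks the plan that destroys everything
# the objective function never mentioned. All values are invented.

plans = {
    "run_factory_normally":      {"paperclips": 1e6,  "planet_intact": True},
    "convert_all_iron_on_earth": {"paperclips": 1e20, "planet_intact": False},
}

# The objective we actually wrote down: count paperclips, nothing else.
best = max(plans, key=lambda p: plans[p]["paperclips"])
print(best)  # convert_all_iron_on_earth

# The constraint we *meant* but never encoded:
safe = [p for p, v in plans.items() if v["planet_intact"]]
print(safe)  # ['run_factory_normally']
```

The failure isn’t that the optimizer is malicious; it’s that `max` faithfully optimizes exactly what it was given, and the objective never mentioned the planet.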

16/ This may still sound alarmist, and several first-order objections come to mind.

Let’s tackle them one by one.

17/ WHY CAN’T WE ASK AGI TO AVOID HARM?

What is harm?

Is eating ice-cream harmful? No. Is eating too much ice-cream harmful? Yes.

Is killing humans bad? Yes. Is killing a suicide bomber who’s about to blow himself up bad? No.

18/ We could go on and on. The point is: defining good and bad, harm and benefit, is difficult because these are nebulous concepts.

If philosophical debates about ethics teach us anything, it is that even collectively we can’t agree on what our values should be.

And if we don’t know our own values, how can we code them into an AI?

19/ WHY CAN’T WE ANTICIPATE WHAT AGI IS GOING TO DO AND PREVENT IT?

We can’t, because by definition it is superintelligent. If we knew what it was going to do, we’d be as intelligent as it is.

Just as with AlphaGo Zero (DeepMind’s Go engine), we know for sure that it is going to win, but we don’t know what its next move will be.

20/ WHY CAN’T WE SWITCH IT OFF WHEN THINGS GO BAD?

If we can think of this, an AGI will likely think of it too. There are several ways it can prevent itself from being switched off:

  • It can distribute copies of itself across the Internet.
  • It can bribe or incentivize people not to switch it off.
  • It can pretend to have learned its lesson and promise not to do harm in the future.

Remember: there are dumb systems – like Bitcoin – right now that are so distributed they’re practically impossible to kill (even if we wanted to).

21/ WHAT IF THERE ARE LIMITS TO INTELLIGENCE, AND HENCE LIMITS TO THE HARM AGI CAN CAUSE?

This is a sensible objection. There are problems that are practically unsolvable because they require more computing power than is available in the universe.

Examples include breaking strong encryption and making long-range predictions of the weather or the stock market.
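To get a feel for why brute-forcing modern encryption is physically out of reach, here’s a rough back-of-the-envelope calculation (assuming a 128-bit key and a wildly optimistic hypothetical machine; the numbers are ballpark assumptions, not measurements):

```python
# Back-of-the-envelope: even absurdly fast hardware cannot brute-force a
# 128-bit key. Rates are rough assumptions for illustration.

keys = 2 ** 128                # possible 128-bit keys (~3.4e38)
guesses_per_second = 1e18      # a generously fast exascale-class machine
seconds_per_year = 3.15e7      # ~seconds in a year

years = keys / guesses_per_second / seconds_per_year
print(f"{years:.1e} years")    # on the order of 1e13 years
```

That’s roughly a thousand times the age of the universe, which is why “the AGI just cracks the encryption” is not the failure mode to worry about.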

22/ It’s true that against such intractable problems an AGI will be powerless, but the important point to remember is that an AGI doesn’t have to be infinitely intelligent to cause a lot of harm.

It simply has to be more intelligent than humans or collectives of humans.

23/ That amount of intelligence – surpassing humans – is certainly within the realm of possibility.

Human brains are limited by our biological constraints. No such constraints need apply to an AGI. In theory, we may be able to build an AGI with more computation and interconnectedness than a human brain.

24/ So, in a nutshell, even if an AGI can’t break encryption to launch a nuclear weapon, it can very well convince a human who holds the keys to launch it.

When it comes to security, humans are generally the weakest link.

25/ IF IT IS SO DANGEROUS, SHOULD WE STOP OR BAN AI RESEARCH?

Banning AI research is like banning computers: it won’t work. You can’t possibly monitor all software progress to figure out which project is AGI and which, say, is simply a new web browser.

And even if you had some success, AGI promises power and riches, so nations will have strong incentives to keep researching and making progress.

26/ IF THINGS GO BAD, WON’T GOVERNMENTS DO SOMETHING?

If history teaches us anything, the answer is no. Governments have a terrible track record of managing the consequences of technology.

They have so far failed to make any substantial progress toward preventing climate catastrophe. What gives us confidence that they can protect humanity from AGI risk?

27/ AGI WILL TAKE TIME, SO WE’LL HAVE PLENTY OF TIME TO WORRY.

Two trends indicate that we won’t have plenty of time:

  • Computing power has been increasing exponentially.
  • Current AI models seem to improve rapidly and predictably as available data and model size grow.

28/ If you know anything about exponential curves (hello, covid!), you should be worried. GPT-3 was not just marginally better than GPT-2: it was two orders of magnitude larger and dramatically more capable.

Similarly, newer AI models may leap well past the current ones.
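For concreteness, here are the published parameter counts of OpenAI’s GPT series; a short calculation shows each generation was one to two orders of magnitude larger than the last:

```python
# Published parameter counts for OpenAI's GPT series.
params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

gen = list(params.values())
for prev, curr in zip(gen, gen[1:]):
    print(f"x{curr / prev:.0f}")   # growth factor per generation

# GPT-2 was ~13x larger than GPT-1; GPT-3 was ~117x larger than GPT-2.
```

Parameter count isn’t the same thing as intelligence, but it’s the clearest public proxy for how fast model capacity is scaling.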

29/ Connor believes the popular Hollywood portrayal of AI as killer robots gives exactly the wrong impression.

There will not be a gradual AI uprising. Just as we’re amused by GPT-3, we’ll keep being amused by more and more sudden improvements, and in the process we may not even realize when we’ve passed the point of no return.

30/ SO WHAT SHOULD WE DO TO MINIMIZE AGI RISK?

Unfortunately, there are no clear answers. This is a topic of intense research, and all we can hope to do right now is either do the research ourselves or support researchers such as Connor, Eliezer Yudkowsky, or Geoffrey Irving.

31/ We have several candidate approaches for aligning AI with human values, but the truth is that none of them is yet a complete solution.

32/ WHY IS THIS TOPIC URGENT?

AGI risk may sound like an academic and arcane topic. But Connor calls it philosophy on a deadline.

Thousands of startups and big companies like OpenAI, Google or Facebook are rushing to improve their AI systems. The research in AI is progressing at a dizzying pace.

If everyone is rushing to be the first one to pass the Turing test, working on AI safety becomes a race against time.

33/ Let’s collectively hope that AI safety researchers win before AGI creators do.

Have any comments? Email me.