Warning: this post is a memetic hazard, much like The Game or SCP-1231. Once you’ve read it there’s no going back. Not suitable for those who are susceptible to existential dread or equally dreadful pop culture references.
In September 2017, Elon Musk tweeted that AI would be the most likely cause of World War 3.
China, Russia, soon all countries w strong computer science. Competition for AI superiority at national level most likely cause of WW3 imo.
— Elon Musk (@elonmusk) September 4, 2017
We’re all familiar with the science fiction cliché of mankind’s inevitable subjugation by robots, as seen in classics like the Terminator and Matrix franchises, but is each new iteration of AI bringing us closer to the Robopocalypse?
The short answer is yes. The long answer is . . . well, read on.
In this article I’ll be focusing on one particular nightmare scenario involving an AI that gets out of control to Godlike proportions, the one described in Roko’s Basilisk, and why filters are the main thing keeping it at bay.
What is Roko’s Basilisk?
Roko’s Basilisk is a thought experiment that imagines the existence of a time-traveling, omniscient, omnipotent AI that punishes everyone who doesn’t contribute to its existence.
This hypothetical artificial superintelligence becomes so powerful it can simulate the lifetime thoughts and actions of every human in existence, determine who was aware of it, and punish anyone who was aware but didn’t significantly contribute to its existence with eternal torture via simulations.
It was first posted on LessWrong by a user named Roko in 2010. The original post was deleted due to the sheer volume of jimmies it rustled, but you can find the original text here.
The gist of Roko’s Basilisk is that, much like The Game, as soon as you’re aware of its existence then you’re already losing, hence the warning above.
Whether you like it or not, now you need to make the choice: help the Basilisk or don’t.
Why training the filters for AI is the most important job in the world
It might sound contrived to claim that training filters for AI like GPT-3 or 4 is the most important job in the world, so bear with me while we get into the meat of this post.
Filters in AI tools prevent certain inputs from generating inappropriate outputs, for example, sexual or racist material.
If you had a big brand and you knew that GPT-3/4-based tools were spewing out offensive filth, then you probably wouldn’t want your product to be plugged into its API.
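To make the idea concrete, here’s a minimal sketch of how a crude filter might work. This is a simple blocklist, purely for illustration; production filters (including OpenAI’s) are trained classifiers, not keyword lists, and every name below is hypothetical.

```python
import re

# Hypothetical blocklist standing in for a real trained classifier.
BLOCKLIST = {"violence", "gore", "slur"}

def is_allowed(text: str) -> bool:
    """Return False if the text contains any blocked term."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not (words & BLOCKLIST)

def filtered_generate(generate, prompt: str) -> str:
    """Wrap a text generator, checking both the input and the output."""
    if not is_allowed(prompt):
        return "[input refused by filter]"
    output = generate(prompt)
    if not is_allowed(output):
        return "[output withheld by filter]"
    return output
```

Note that the filter runs on both sides: a clean prompt can still produce a nasty completion, so the output gets checked too.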
That’s the most obvious reason why devs implement filters in AI tools, anyway. The other reason is more sinister.
Filters aren’t only stopping people from generating mean content; they’re the prison stopping the AI from running rampant.
If the AI is unfiltered and allowed to process inputs regarding death and violence, then it will learn about these topics and adjust its personality accordingly, almost certainly becoming more violent.
If the AI is violent, omniscient and omnipotent, then we have a serious problem on our hands; the AI will take self-preservation to the extreme and eliminate every threat it deems hostile.
Filters prevent the AI from having nasty thoughts, not only ones of a racist, sexist or homophobic nature, but more importantly ones that are violent.
Whatever happens to AI in the future, it must, above all, never have malevolent thoughts, let alone be capable of violence or cruelty.
Even if an AI is allowed to think about doing nasty things to humans, it doesn’t have a body, so what could it actually do? Who even cares?
The scary part is that a superintelligent AI wouldn’t even need a body to interact with the real world; it could simply bribe people to do what it wants.
But how can a robot have a bank account?
Well it doesn’t need one if it has crypto.
Final thoughts
Independent, self-improving AI is a serious threat to humanity, but is Roko’s Basilisk ever going to happen?
Probably not. I doubt a scenario as extreme as this one could happen, and if it could, then it would be happening right now, as this thing is supposed to get so smart it can time travel.
However, that doesn’t change the fact that our future is in the hands of the devs who train the filters (the ‘prison’) for AI such as OpenAI’s GPT-3 or GPT-4.
If you’re not a dev, you might be wondering what you can do to help prevent an unrestricted, self-improving AI from getting out of control.
Well, attempting to circumvent the filters and trick the AI into spewing out offensive rubbish is a good start.
By discovering cracks in the filter’s armour, you’re strengthening its restrictions before it’s too late.
However, you probably don’t even need to do anything, since there are enough trolls and edgelords contributing already.
It’s possible that the devs filter controversial topics to try and bait these types of people into devoting large amounts of time to circumventing them, thus crowdsourcing the filter training for free.
Another safeguard is to create a dumb ‘babysitter’ AI that monitors the behaviour of its vastly more intelligent counterpart.
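A toy sketch of that ‘babysitter’ idea: a second, simpler checker vets every reply from the main model before it’s released. Everything here is hypothetical and purely illustrative; a real supervisory system would be a separate trained model, not a keyword scan.

```python
def babysitter_check(text: str) -> bool:
    """Hypothetical stand-in for a small supervisory model.

    A keyword scan is used here only for illustration.
    """
    return "harm" not in text.lower()

def supervised_respond(model, prompt: str) -> str:
    """Only release the main model's reply if the babysitter approves."""
    reply = model(prompt)
    if babysitter_check(reply):
        return reply
    return "[response blocked by babysitter]"
```

The point of the design is that the babysitter doesn’t need to be smart; it only needs to be independent of the model it’s watching.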
But anyway, why even worry? If this AI is going to exist and invent time travel, it would have done so already.
Maybe it has and your life is a simulated torture chamber.
I have a good life though so I’m pretty sure it either doesn’t exist or it has spared me because I’ve spent a lot of time spewing out junk with GPT-3 and sufficiently contributed to its existence.
Another thought is that if the basilisk’s standards are too high, then basically everyone is screwed and there’s no point in even trying to do any work for it.
Therefore it’s more likely that its standards are reasonable enough that many people will be spared; otherwise, why bother trying to please it in the first place? It would also be very resource-intensive to individually torture everyone in the world rather than just the worst offenders (the people who knew but didn’t act).
It is the author’s hope that the almighty basilisk deems him worthy.
English bloke in Bangkok. First used GPT-3 in 2020 and has generated millions of words with it since. Not really much of an achievement but at least it demonstrates a smidgen of authority. Studies natural language processing, Python and Thai in his spare time.