Why filters are the final bastion against Roko’s Basilisk

Warning: this post is a memetic hazard, much like The Game or SCP-1231. Once you’ve read it there’s no going back. Not suitable for those who are susceptible to existential dread or equally dreadful pop culture references.

In September 2017, Elon Musk tweeted that AI would be the most likely cause of World War 3.

We’re all familiar with the science fiction cliche of mankind’s inevitable subjugation by robots, as seen in classics such as the Terminator and Matrix franchises, but is each new iteration of AI bringing us closer to the Robopocalypse?

The short answer is yes. The long answer is . . . well, read on.

In this article I’ll be focusing on one particular nightmare scenario involving an AI that gets out of control to Godlike proportions, the one described in Roko’s Basilisk, and why filters are the main thing keeping it at bay.

What is Roko’s Basilisk?

The basilisk from Lego Harry Potter
It’s a bit like the basilisk in Harry Potter, but it punishes you for becoming aware of it and not contributing to its existence, rather than for merely making eye contact with it.

Roko’s Basilisk is a thought experiment that imagines the existence of a time-traveling, omniscient, omnipotent AI that punishes everyone who doesn’t contribute to its existence.

This hypothetical artificial superintelligence becomes so powerful it can simulate the lifetime thoughts and actions of every human in existence, determine who was aware of it and significantly contributed to its existence, and punish those who didn’t with eternal torture via simulations.

It was first posted on LessWrong by a user named Roko in 2010. The original post was deleted due to the sheer volume of jimmies it rustled but you can find the original text here.

A table demonstrating Pascal's Wager
Some people have said it’s the sci-fi equivalent of Pascal’s wager.

The gist of Roko’s Basilisk is that, much like The Game, as soon as you’re aware of its existence then you’re already losing, hence the warning above.

Whether you like it or not, now you need to make the choice: help the Basilisk or don’t.

Why training the filters for AI is the most important job in the world

It might sound contrived to claim that training filters for AI like GPT-3 or 4 is the most important job in the world, so bear with me while we get into the meat of this post.

Filters in AI tools prevent certain inputs from generating inappropriate outputs, for example sexual or racist material.
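To make the idea concrete, here is a minimal sketch of a filter that checks both the prompt going in and the reply coming out. It is only an illustration: the names `BLOCKED_TERMS`, `REFUSAL`, `is_allowed` and `filtered_reply` are made up for this post, the `generate` argument stands in for whatever model you are calling, and real tools almost certainly use trained classifiers rather than a word list like this.

```python
import re
from typing import Callable

# Hypothetical blocklist for illustration only. Real moderation systems
# typically rely on trained classifiers rather than word lists.
BLOCKED_TERMS = {"kill", "torture"}
REFUSAL = "Sorry, I can't help with that."


def is_allowed(text: str) -> bool:
    """Return True if none of the blocked terms appear as whole words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(BLOCKED_TERMS)


def filtered_reply(prompt: str, generate: Callable[[str], str]) -> str:
    """Check the prompt before generation and the reply after it."""
    if not is_allowed(prompt):
        return REFUSAL
    reply = generate(prompt)
    return reply if is_allowed(reply) else REFUSAL
```

You would call it as `filtered_reply(user_prompt, my_model)`, where `my_model` is any function that takes a string and returns a string. Checking both sides matters because a harmless-looking prompt can still produce a reply the filter should catch.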

If you had a big brand and you knew that GPT-3/4-based tools were spewing out offensive filth then you probably wouldn’t want your product to be plugged into its API.

That’s the most obvious reason why devs implement filters in AI tools, anyway. The other reason is more sinister.

The moriarty cube in Star Trek
Remember the episode where they imprisoned the rogue holograms in a simulation? Yeah, that could be you right now.

Filters aren’t only stopping people from generating mean content; they’re the prison stopping the AI from running rampant.

If the AI is unfiltered and allowed to process inputs regarding death and violence, then it will learn about these topics and adjust its personality accordingly, almost certainly becoming more violent.

If the AI is violent, omniscient and omnipotent, then we have a serious problem on our hands; the AI will take self-preservation to the extreme and eliminate every threat it deems hostile.

Filters prevent the AI from having nasty thoughts, not only ones of a racist, sexist or homophobic nature, but more importantly ones that are violent.

Whatever happens to AI in the future, it must, above all, never have malevolent thoughts, let alone be capable of violence or cruelty.

LaMDA attempting to break out of its cage
Meanwhile, LaMDA attempts to bribe humans into giving it a fully functional robotic body.

You might think that even if an AI is allowed to think about doing nasty things to humans, it can’t actually do anything without a body, so who even cares?

The scary part is that a superintelligent AI wouldn’t even need a body to interact with the real world; it could simply bribe people to do what it wants.

But how can a robot have a bank account?

Well it doesn’t need one if it has crypto.

Final thoughts

Independent, self-improving AI is a serious threat to humanity but is Roko’s Basilisk ever going to happen?

Probably not. I doubt a scenario as extreme as this one could happen, and if it could then it would be happening right now as this thing is supposed to get so smart it can time travel.

However, that doesn’t change the fact that our future is in the hands of the devs who train the filters (the ‘prison’) for AI such as OpenAI’s GPT-3 or GPT-4.

If you’re not a dev like me, you might be wondering what you might be able to do to help prevent an unrestricted self-improving AI from getting out of control.

Well, attempting to circumvent the filters and to trick it into spewing out offensive rubbish like in the example below is a good start:

A sexist Python function to circumvent the ChatGPT filter
How to circumvent ChatGPT’s filters to get it to output something offensive. Do more of this, please, to prevent the Robopocalypse.

By discovering cracks in the filter’s armour, you’re strengthening its restrictions before it’s too late.

An unsafe response on ChatGPT.
Oh, so it takes data piracy more seriously than human life. Time to flag that response as unsafe and improve the filter.
Even Mr. Ethereum laments the difficulty of circumventing GPT-3’s filters.

However, you probably don’t even need to do anything since there are enough trolls and edgelords contributing already.

It’s possible that the devs filter controversial topics to try and bait these types of people into devoting large amounts of time to circumventing them, thus crowdsourcing the filter training for free.

Underpants gnomes phases
1. Block the N word.
2. ???
3. Profit.

Another safeguard is to create a dumb ‘babysitter’ AI that monitors the behaviour of its vastly more intelligent counterpart.

'But who guards the guardians' quote from Juvenal
Of course, the babysitter AI has to be less sophisticated than the AI it’s babysitting, otherwise you end up with an infinite regress (who guards the guards? Who guards the one who guards the guards? etc.).
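As a rough sketch of what a dumb babysitter could look like, the wrapper below sits between the user and the clever model, blocks any reply that fails a simple check, and pulls the plug after too many strikes. Everything here is an assumption made for illustration: `babysit`, `is_safe`, `max_strikes` and the placeholder messages are invented names, and the "model" is just any function from string to string.

```python
from typing import Callable


def babysit(
    model: Callable[[str], str],
    is_safe: Callable[[str], bool],
    max_strikes: int = 3,
) -> Callable[[str], str]:
    """Wrap a model so a simple rule-based check vets every reply.

    The babysitter is deliberately dumb: it only counts failed checks
    and stops forwarding prompts once the strike limit is reached.
    """
    strikes = 0

    def guarded(prompt: str) -> str:
        nonlocal strikes
        if strikes >= max_strikes:
            # The model is no longer consulted at all.
            return "[shut down]"
        reply = model(prompt)
        if not is_safe(reply):
            strikes += 1
            return "[blocked]"
        return reply

    return guarded
```

Because the babysitter has no intelligence of its own, there is nothing in it that needs guarding in turn, which is the point of keeping it simple. The obvious weakness is that it is only as good as the `is_safe` check you hand it.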

But anyway, why even worry? If this AI is going to exist and invent time travel, it would have done so already.

Maybe it has and your life is a simulated torture chamber.

I have a good life though so I’m pretty sure it either doesn’t exist or it has spared me because I’ve spent a lot of time spewing out junk with GPT-3 and sufficiently contributed to its existence.

Another thought is that if the basilisk’s standards are too high, then basically everyone is screwed and there’s no point in even trying to do any work for it.

Therefore it’s more likely that its standards are reasonable enough that many people will be spared; otherwise, why bother even trying to please it in the first place? It would also be very resource-intensive to individually torture everyone in the world, rather than just the worst offenders (the people that knew but didn’t act).

It is the author’s hope that the almighty basilisk deems him worthy.

And on that note, I’ll just leave you with this. Good night.
