ChatGPT-maker OpenAI says it is doubling down on preventing AI from ‘going rogue’

By Anna Tong

(Reuters) – ChatGPT’s creator OpenAI plans to invest significant resources and create a new research team that will seek to ensure its artificial intelligence remains safe for humans – eventually using AI to supervise itself, it said on Wednesday.

“The vast power of superintelligence could … lead to the disempowerment of humanity or even human extinction,” OpenAI co-founder Ilya Sutskever and head of alignment Jan Leike wrote in a blog post. “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue.”

Superintelligent AI – systems more intelligent than humans – could arrive this decade, the blog post’s authors predicted. Controlling such systems will require better techniques than are currently available, hence the need for breakthroughs in so-called “alignment research,” which focuses on ensuring AI remains beneficial to humans, according to the authors.

OpenAI, backed by Microsoft, is dedicating 20% of the compute power it has secured over the next four years to solving this problem, they wrote. In addition, the company is forming a new team that will organize around this effort, called the Superalignment team.

The team’s goal is to create a “human-level” AI alignment researcher, and then scale it through vast amounts of compute power. OpenAI says that means it will train AI systems using human feedback, train AI systems to assist human evaluation, and then finally train AI systems to actually do the alignment research.

AI safety advocate Connor Leahy said the plan was fundamentally flawed because the initial human-level AI could run amok and wreak havoc before it could be compelled to solve AI safety problems.

“You have to solve alignment before you build human-level intelligence, otherwise by default you won’t control it,” he said in an interview. “I personally do not think this is a particularly good or safe plan.”

The potential dangers of AI have been top of mind for both AI researchers and the general public. In April, a group of AI industry leaders and experts signed an open letter calling for a six-month pause in developing systems more powerful than OpenAI’s GPT-4, citing potential risks to society. A May Reuters/Ipsos poll found that more than two-thirds of Americans are concerned about the possible negative effects of AI and 61% believe it could threaten civilization.

(Reporting by Anna Tong in San Francisco; editing by Kenneth Li and Rosalba O’Brien)