New version of GPT-3 less toxic and better at understanding instructions, says OpenAI

A new version of GPT-3, OpenAI’s groundbreaking language model, addresses some of the worst toxicity problems that plagued its predecessor. According to the San Francisco-based lab, the new model, called InstructGPT, is better at following instructions given by users – referred to as “alignment” in AI jargon – and thus produces less offensive language, less misinformation, and fewer errors overall, unless explicitly instructed otherwise.

Because the GPT-3 language models powering the OpenAI API were not aligned with their users, OpenAI said it decided to use reinforcement learning from human feedback (RLHF) to make the models safer, more helpful, and more aligned.
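A core step in the RLHF recipe is training a reward model on human preference comparisons: annotators pick which of two model outputs they prefer, and the reward model learns to score the preferred one higher. The toy sketch below (pure Python; random feature vectors stand in for text embeddings, and all names are hypothetical, not OpenAI's actual implementation) illustrates that pairwise-preference training step:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

dim = 8
# Hidden "human preference" direction used only to label the synthetic data.
true_w = [random.gauss(0, 1) for _ in range(dim)]

# Synthetic preference pairs: (chosen, rejected) feature vectors,
# ordered by the hidden preference direction, mimicking annotator rankings.
pairs = []
for _ in range(200):
    a = [random.gauss(0, 1) for _ in range(dim)]
    b = [random.gauss(0, 1) for _ in range(dim)]
    pairs.append((a, b) if dot(a, true_w) >= dot(b, true_w) else (b, a))

# Linear reward model r(x) = w . x, trained with the pairwise
# Bradley-Terry loss  L = -log(sigmoid(r(chosen) - r(rejected))).
w = [0.0] * dim
lr = 0.1
for _ in range(50):
    for chosen, rejected in pairs:
        p = sigmoid(dot(w, chosen) - dot(w, rejected))
        # Gradient descent step on L with respect to w.
        for i in range(dim):
            w[i] += lr * (1.0 - p) * (chosen[i] - rejected[i])

# After training, the reward model should rank chosen above rejected.
accuracy = sum(dot(w, c) > dot(w, r) for c, r in pairs) / len(pairs)
```

In the full RLHF pipeline, the trained reward model then supplies the reward signal for fine-tuning the language model itself with a policy-gradient method; the sketch above covers only the reward-modelling stage.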

We only use prompts submitted through the Playground to an earlier version of the InstructGPT models that was deployed in January 2021. Our human annotators remove personally identifiable information from all prompts before adding them to the training set.

– OpenAI

The resulting InstructGPT models have turned out to be better at following instructions than GPT-3, said the OpenAI team. They also make up facts less often, and show small decreases in toxic output generation.

These InstructGPT models, which have been in beta on the API for more than a year, are now the default language models accessible via the API. The InstructGPT models deployed in the API are updated versions trained using the same human feedback data, said OpenAI.
