OpenAI team dialogue record: ChatGPT is cool, but there are many problems exposed

When OpenAI quietly launches ChatGPT in late November 2022, the San Francisco-based artificial intelligence company has few expectations. Admittedly, no one inside OpenAI could have predicted that this would lead to a massive viral frenzy.

Since then, the company has been catching up — and trying to cash in.

As a result, MIT Technology Review found the team behind ChatGPT and conducted an in-depth interview.

According to Sandhini Agarwal of OpenAI’s policy department , ChatGPT was initially conceived as a “research preview”: a preview of a more mature version of the technology from two years ago, and, more importantly, an attempt to correct some of it through public feedback. defect.

“We don’t want to overhype it and say it’s a major fundamental advance,” said Liam Fedus, an OpenAI researcher who worked on ChatGPT.

To gain insight into this chatbot – how it was developed, how OpenAI has continued to update it since its release, and how its developers view its success.

I interviewed four people who helped build what is already one of the most popular Internet applications.

In addition to Agarwal and Fedus, I interviewed OpenAI co-founder John Schulman and OpenAI calibration team lead Jan Leike .

The Calibration team works on solving the problem of how AI can achieve user-desired behavior (and nothing more).

My sense is that OpenAI is still confused by its success, but has seized the opportunity to advance the technology, watch how millions of users use it, and try to fix the most urgent problems that arise.

OpenAI has made several updates to ChatGPT since November. Researchers are using adversarial training techniques to prevent ChatGPT from being induced by users to behave badly (also known as jailbreaking).

This work pits multiple chatbots against each other: one chatbot plays the role of an adversary, attacking another chatbot by generating text that forces it to violate usual constraints and generate unwanted responses. Successful attacks are added to ChatGPT’s training data. Hopefully it learns to ignore these attacks.

OpenAI has also inked a multibillion-dollar deal with Microsoft and an alliance with Bain, which plans to use OpenAI’s generative AI models.

Outside of OpenAI, the buzz around ChatGPT has sparked another wave of hype around large-scale language models, with companies and investors around the world jumping into the frenzy.

There’s been so much hype in just three months. What is the source of ChatGPT? What steps is OpenAI taking to ensure it is ready for the public? What will they do next?

Selected content

Jan Leike : Frankly, it was overwhelming. We were very surprised and have been trying to catch up.

John Schulman : I was checking Twitter in the days after the release, and my feed was full of screenshots of ChatGPT during that time.

I expected it to be intuitive to people and have a certain following, but I didn’t expect it to be this popular.

Sandhini Agarwal : It was a surprise to all of us to see people starting to use it so widely. We spend so much time on these models that we often forget how amazing they are to the outside world.

Liam Fedus : We didn’t expect this product to be so popular. After all, so many people have tried to develop a general-purpose chatbot before, and I know the chances of success are slim. However, our private testing has convinced us that we have something that people will actually love.

Jan Leike : I’d love to understand better what’s behind this – what’s driving all this viral behavior. Seriously, we don’t quite get it.

Part of the team’s confusion stems from the fact that much of ChatGPT’s technology isn’t new. ChatGPT is a “polished version” of GPT-3.5, a family of large-scale language models released by OpenAI a few months ago. And GPT-3.5 itself is an updated version of GPT-3, which appeared in 2020. The company makes the models’ application programming interfaces (APIs) available on its website, making it easy for other software developers to integrate the models into their own code. OpenAI has also released the GPT-3.5 “Advance Preview”, which will be released on InstructGPT in January 2022. But none of these previous versions of the technology rolled out to the public like ChatGPT did.

Liam Fedus : The ChatGPT model was fine-tuned from the same language model as InstructGPT, which we fine-tuned using a similar approach. We added some dialogue data and tweaked the training process slightly. So we don’t want to get too hyped about it, claiming it’s a major fundamental advance. But it turns out that conversation data has a huge positive impact on ChatGPT.

John Schulman : From the standard benchmark evaluation, the underlying technical strength between these models is actually not much different, but ChatGPT is easier to access and use.

Jan Leike : In a sense, you can understand ChatGPT as one of the versions of our AI system that has been released for some time.

Under the hood, it’s not much better than the previous model. The same underlying model provided an API almost a year before ChatGPT was released.

On the other hand, we made it more in line with what people want to do. It communicates with you in a conversation, the chat interface is easy to use, and it tries to be a useful tool. This is amazing progress, and I think this is where people are realizing.

John Schulman : It’s easier to infer intent, and users can communicate repeatedly to achieve what they want.

ChatGPT is trained in a very similar way to InstructGPT, using a technique called Reinforced Learning with Human Feedback (RLHF). This is the killer feature of ChatGPT. The basic idea is to take a large-scale language model that tends to spit out anything at will — in this case GPT-3.5 — and teach it to respond by learning the preferences of human users, allowing for fine-tuning.

Jan Leike : We have a large team that reads ChatGPT prompts and responses to see if one response is better than the other.

All this data is then combined into one training step. Most of these are things we do in InstructGPT.

You want it to actually work, you want it to tell the truth, you want it to be harmless.

Then it also has some traits dedicated to generating dialogue and as assistants.

For example, if the user’s query is not clear enough, it should follow up with a question. It should also reveal its identity as an AI system, and should not assume that it does not have an identity, let alone show that it has abilities that it does not have.

When the user asks it to perform a task it shouldn’t, it must explicitly say no.

One sentence that came up in this training was “As a language model trained by OpenAI…” This reminder is not a hard and fast rule, but it became a point that human reviewers rated it highly.

Sandhini Agarwal : Exactly. Human reviewers must rate the models based on a range of criteria, such as authenticity. But they start leaning toward what they think is the right thing to do, like not pretending to know.

Since ChatGPT uses technology used by OpenAI, the team did not make special preparations when releasing this model to the public. They thought they had set the bar high enough for previous models.

Sandhini Agarwal : We did not consider this model to be a new threat when preparing for release. GPT-3.5 has already existed in the world, and we know it is safe enough. Moreover, ChatGPT has learned to reject by itself through training on human preferences, rejecting many requests.

Jan Leike : For ChatGPT, we did do some additional “red team testing” (Translator’s Note: A full range of attack simulations to find system vulnerabilities), and everyone at OpenAI sat down and tried to “break” the model . We have foreign players doing the same. We did an Early-Access test with old users who gave us feedback.

Sandhini Agarwa l: We did find that it produced some (people) unwanted output, but GPT-3.5 produces these things as well. In terms of risk, it’s a research preview, that’s why it was [released] in the first place, so it’s not really a big deal.

John Schulman : You can’t wait until the system is perfect before releasing it. We’ve been testing an early version for several months, and participants have been very impressed with the product.

Our biggest concern is its accuracy, as this model loves to falsify facts. But InstructGPT and other large-scale language models are already available, so we think that as long as ChatGPT is better than the former in terms of accuracy and other security issues, it should be fine to roll out.

Before release, we were convinced that these models appeared to be better than other models in terms of accuracy and safety, and based on our limited evaluation, we made the decision to release.

Since its release, OpenAI has been watching how people use it, seeing for the first time how well a large language model performs when it is put in the hands of tens of millions of users who might want to test its limits and discover its flaws. The team attempted to capture the most problematic examples of ChatGPT and use them to optimize future versions of the model.

Sandhini Agarwal : We have many next steps. I firmly believe that the virality of ChatGPT will bring to the surface and become more urgent many problems that we know and desperately want to solve.

For example, we know that the model is still biased. Yes, ChatGPT is very good at rejecting bad requests, but it can also be easily influenced by prompt words to only accept those requests.

Liam Fedus : It’s exciting to watch the rich and innovative use cases that users provide, but we’re always looking at areas for improvement. We believe that through an iterative process of deploying, getting feedback, and improving, we can produce the most desirable and functional technology. As our technology continues to evolve, new problems are always inevitable.

Sandhini Agarwal : In the weeks since ChatGPT was released, we looked at several of the worst cases that users found, and I mean the worst that people can see. We initially assessed each case and discussed how to fix it.

Jan Leike : (Those cases) were sometimes events that were widely circulated on Twitter, and some people chose to contact us privately.

Sandhini Agarwal : We found that many problems are actually the above-mentioned jailbreak behavior, which we urgently need to solve. However, since users have gone to great lengths to get ChatGPT to say dirty words, it’s not that we’ve ignored it before, and we’re not too surprised.

Nonetheless, this is something we are currently actively addressing. As we discover jailbreaks, we add them to our training and test data. All the data we see becomes part of future models.

Jan Leike : Whenever we have a better model, we want to put it out for testing. We confidently believe that with some targeted adversarial training, the jailbreaking situation can be greatly improved.

It’s not clear if these issues will go away completely, but we think we can make the jailbreak more difficult.

Again, the possibility of a jailbreak wasn’t unknown to us prior to release.

It’s just that I think once you deploy it, it’s hard to predict which behaviors will become security risks. So we’re focusing on monitoring what people are using the system for, seeing what happens and then responding to it.

It’s not that we don’t take the initiative to solve problems. But when a system is connected to the real world, we cannot foresee all possible situations.

In January of this year, Microsoft announced Bing Chat, a search chatbot that many believe to be OpenAI’s unannounced GPT-4 version (OpenAI says Bing is powered by our next-generation model, Microsoft specifically for search scenarios for customization. It combines the advantages of ChatGPT and GPT-3.5).

The text and pictures in this article are from APPSO

loading.gif

This article is transferred from https://www.techug.com/post/openai-team-conversation-record-chatgpt-is-cool-but-there-are-many-problems-exposed204cc31689337f4b0492/
This site is only for collection, and the copyright belongs to the original author.