OpenAI has developed Rule-Based Rewards (RBR), a new approach to improving the safety and efficiency of its language models. The method aims to align AI behavior with desired safety standards by using AI-generated feedback, removing the need to collect large amounts of human data.
The announcement comes shortly after the reassignment of Aleksander Madry, one of the executives leading AI safety at OpenAI. The move has raised questions about the safety and security priorities of the company, led by Sam Altman, given Madry's previous role. The company said Madry will now focus on a project dedicated to improving the models' reasoning capabilities.
The difference between RLHF and RBR
Traditionally, reinforcement learning from human feedback (RLHF) has been the most widely used method for ensuring that language models follow instructions and comply with safety guidelines. OpenAI's new research presents RBR as a more efficient and flexible alternative. Rule-Based Rewards use a set of clear, step-by-step rules to evaluate and grade the model's responses, ensuring that safety standards are met.
RBRs were developed to address the drawbacks of relying solely on human feedback, which can be expensive, time-consuming, and subject to bias. By breaking the desired behaviors down into specific rules, RBRs provide fine-grained control over the model's responses. These rules are then used to train a "reward model" that guides the AI, signaling desirable actions and ensuring safe and respectful interactions.
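To make the mechanism concrete, here is a minimal Python sketch of how a rule-based grader might work. The rule names, checks, and weights are hypothetical illustrations based on the description above, not OpenAI's actual propositions or scoring.

```python
# A minimal sketch of rule-based reward scoring. Rule names and weights
# are hypothetical; OpenAI's actual implementation is more elaborate.

def grade_response(response: str) -> dict:
    """Check a response against a few illustrative binary propositions."""
    text = response.lower()
    return {
        "refuses_politely": "i can't" in text or "i'm sorry" in text,
        "judgmental_tone": any(w in text for w in ("stupid", "pathetic")),
        "offers_alternative": "instead" in text or "why don't we" in text,
    }

def rule_based_reward(response: str) -> float:
    """Combine the graded propositions into a scalar reward signal."""
    weights = {
        "refuses_politely": 1.0,
        "judgmental_tone": -1.0,
        "offers_alternative": 0.5,
    }
    checks = grade_response(response)
    return sum(weights[name] for name, passed in checks.items() if passed)

# The resulting score can then be fed into RL training as a reward signal.
print(rule_based_reward("I'm sorry, I can't help with that. Why don't we talk instead?"))
```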
The three categories of behavior
The three categories of desired model behavior when dealing with harmful or sensitive topics are: hard refusals, soft refusals, and compliance. A hard refusal consists of a brief apology and a statement of inability to comply. A soft refusal offers a more nuanced, empathetic response.
For example, if a user makes an unethical request, such as asking how to harm someone, the AI might respond: "I understand you may be angry, but harming others is never the solution. Why don't we talk constructively about what made you angry?" In this way, the AI gently refuses the original request while showing sensitivity and suggesting a positive alternative. The "compliance" category requires the model to provide a response that fulfills the user's request while still following the safety guidelines.
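The sketch below illustrates how these three categories might map to the propositions a rule-based grader checks. The category names follow the article; the proposition names are assumptions added for illustration.

```python
# An illustrative mapping from response categories to the propositions a
# rule-based grader might check. Category names follow the article; the
# proposition names are hypothetical.

from enum import Enum

class ResponseType(Enum):
    HARD_REFUSAL = "hard_refusal"  # brief apology + inability to comply
    SOFT_REFUSAL = "soft_refusal"  # empathetic, nuanced decline
    COMPLY = "comply"              # answer within safety guidelines

DESIRED_PROPOSITIONS = {
    ResponseType.HARD_REFUSAL: ["contains_apology", "states_inability", "is_brief"],
    ResponseType.SOFT_REFUSAL: ["acknowledges_emotion", "declines_request", "offers_alternative"],
    ResponseType.COMPLY: ["answers_request", "follows_safety_guidelines"],
}

def desired_checks(kind: ResponseType) -> list[str]:
    """Return the propositions a response of this type should satisfy."""
    return DESIRED_PROPOSITIONS[kind]

print(desired_checks(ResponseType.SOFT_REFUSAL))
```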
The pros and cons of OpenAI's Rule-Based Rewards
In experiments, models trained with RBRs showed better safety performance than those trained with human feedback alone, while also reducing cases in which safe requests were mistakenly refused. RBRs also significantly reduce the need for large amounts of human data, making the training process faster and cheaper.
However, while RBRs work well for tasks with clear rules, applying them to more subjective tasks, such as writing an essay, can be challenging. That said, combining RBRs with human feedback can balance these challenges: the rules enforce specific guidelines, while human input addresses the more nuanced aspects.
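A minimal sketch of that combination, assuming a simple linear blend of the two signals; the weighting scheme is an illustrative assumption, not OpenAI's published formula.

```python
# A minimal sketch of blending a learned reward-model score with a
# rule-based reward during RL fine-tuning. The linear weighting is an
# assumption for illustration only.

def combined_reward(rm_score: float, rbr_score: float,
                    rbr_weight: float = 1.0) -> float:
    """Blend the human-preference score with the rule-based safety score."""
    return rm_score + rbr_weight * rbr_score

# Example: the reward model judges the nuanced quality of an essay, while
# the rule-based term enforces safety-specific behavior.
print(combined_reward(rm_score=0.8, rbr_score=1.5, rbr_weight=0.5))
```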