OpenAI, ChatGPT’s parent company, has unveiled a new model customization technique: Reinforcement Fine-Tuning. The technique lets users build expert models from their own data sets for complex, domain-specific tasks. The OpenAI team unveiled it during its ongoing 12 Days, 12 Livestreams.
It aims to make personal, domain-specific models, tailored to the requirements of a particular profession, a reality. The development is a significant stride toward the artificial general intelligence (AGI) that the industry currently aspires to.
Today we previewed Reinforcement Fine-Tuning, a new model customization technique that enables organizations to build expert models for specific, complex tasks in domains such as coding, scientific research, or finance.
— OpenAI (@OpenAI) December 6, 2024
How does it work?
This new model customization approach lets developers tailor OpenAI's models by training them on a set of high-quality tasks, ranging from dozens to thousands of examples. By providing reference answers, developers can grade the model's responses and guide its reasoning process.
This method enhances the model's ability to tackle similar problems, improving its precision and performance within specific domains.
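The grading step described above can be sketched in a few lines. This is only an illustration, not OpenAI's actual grader or API: the function name `grade`, the ranked-list response format, and the placeholder task data are all assumptions made for the example.

```python
def grade(ranked_answers, reference):
    """Score a response against a reference answer (hypothetical scheme).

    The response is assumed to be a ranked list of candidate answers.
    Full credit (1.0) if the reference is ranked first, partial credit
    (1 / rank) if it appears lower, and 0.0 if it is missing.
    """
    target = reference.strip().lower()
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer.strip().lower() == target:
            return 1.0 / rank
    return 0.0


# Placeholder tasks: each pairs a prompt with a developer-supplied reference answer.
tasks = [
    {"prompt": "task 1", "reference": "answer A"},
    {"prompt": "task 2", "reference": "answer B"},
]

# Placeholder model outputs, standing in for real model responses.
responses = [
    ["answer A", "answer C"],   # reference ranked first
    ["answer C", "answer B"],   # reference ranked second
]

scores = [grade(r, t["reference"]) for r, t in zip(responses, tasks)]
average_score = sum(scores) / len(scores)
```

A score between 0 and 1, rather than a plain right-or-wrong flag, gives the training process a signal even when the model is only partly correct.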
Learn and not mimic
The technique aims to teach the AI model to reason, building reasoning skills suited to domain-specific expertise rather than simply imitating its training examples. This is a milestone in AI research because it moves LLMs beyond mechanical repetition and trains them for what really matters: learning and reasoning.
How is reinforcement important?
Reinforcement plays a pivotal role here because it lets the model develop the ability to reason from what it has learned from the data set and then act accordingly. Lines of reasoning that lead to the right answer are reinforced and those that lead to a wrong answer are disincentivized, creating a clear track to follow in future responses.
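The reward-and-penalty idea above can be illustrated with a toy example. This is a minimal sketch using a textbook policy-gradient (REINFORCE) update over three made-up "reasoning paths". It is an assumption chosen for illustration, not a description of how OpenAI trains its models, and the names `train`, `correct_path`, and the learning rate are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw preference scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(correct_path, num_paths=3, steps=500, lr=0.5, seed=0):
    """Toy REINFORCE loop: reward the correct path, penalize the others."""
    rng = random.Random(seed)
    logits = [0.0] * num_paths  # start with no preference among paths
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one reasoning path according to the current policy.
        path = rng.choices(range(num_paths), weights=probs)[0]
        # Hypothetical grader: +1 for the right path, -1 for a wrong one.
        reward = 1.0 if path == correct_path else -1.0
        # Gradient of log pi(path) with respect to each logit.
        for i in range(num_paths):
            grad = (1.0 if i == path else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


final_probs = train(correct_path=0)
```

After training, `final_probs[0]` is much larger than the other entries: the path that earned rewards has become the model's preferred one, while the penalized paths have been pushed down.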
RFT in research
“Assessing rare diseases is kind of hard because you kind of have to have two things. You have to have a sort of expert domain knowledge about the medical side of things, and you also have to have sort of systematic reasoning over the biomedical data, and this is an area where we think that the o1 model can help us out with its ...,” said Justin, a computational biologist at Berkeley Lab.
Real-world usage
The domain-specific approach makes the technique highly appealing to sectors such as law, finance, insurance, and engineering. One real-world example the company cited in its announcement is its experiment with Thomson Reuters on a model that assists with legal work.