dpo (optional): Configuration for the DPO fine-tuning method.
reinforcement (optional): Configuration for the reinforcement fine-tuning method.
supervised (optional): Configuration for the supervised fine-tuning method.
The type of method. Must be one of supervised, dpo, or reinforcement.
The method used for fine-tuning.
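As a rough illustration of how these fields might fit together, the sketch below builds a method object as a plain Python dictionary. It assumes the type field is keyed as "type" and that the optional configuration sits under a key named after the chosen type; build_method, METHOD_TYPES, and the "hyperparameters"/"beta" contents are hypothetical names used only for this example, not part of any SDK.

```python
from typing import Any, Dict, Optional

# The three method types named in the field descriptions.
METHOD_TYPES = ("supervised", "dpo", "reinforcement")


def build_method(method_type: str,
                 config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a method object (hypothetical helper, not part of any SDK)."""
    if method_type not in METHOD_TYPES:
        raise ValueError(
            f"type must be one of {METHOD_TYPES}, got {method_type!r}"
        )
    # Assumption: the type field is keyed as "type".
    method: Dict[str, Any] = {"type": method_type}
    if config is not None:
        # Assumption: the optional configuration lives under a key
        # named after the method type.
        method[method_type] = config
    return method


# The configuration contents below are placeholders for illustration only.
dpo_method = build_method("dpo", {"hyperparameters": {"beta": 0.1}})
supervised_method = build_method("supervised")
```

Validating the type before building the dictionary keeps a mismatched type and configuration key from being constructed in the first place, which is one plausible way to mirror the "one of" constraint described above.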