dpo (optional): Configuration for the DPO fine-tuning method.
reinforcement (optional): Configuration for the reinforcement fine-tuning method.
supervised (optional): Configuration for the supervised fine-tuning method.
The type of method: either supervised, dpo, or reinforcement.
The method used for fine-tuning.
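As a rough illustration of how these fields fit together, the sketch below builds a method object in Python. It rests on several assumptions not stated above: that the type field is named `type`, that the configuration key matches the chosen type value, and that an empty dict is an acceptable placeholder configuration. The helper `build_method` is hypothetical and not part of any library.

```python
from typing import Any, Dict, Optional

# The three method types named in the field description.
METHOD_TYPES = ("supervised", "dpo", "reinforcement")


def build_method(method_type: str,
                 config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a method object (hypothetical helper).

    Assumption: the type field is called "type", and the optional
    configuration is stored under a key equal to the type value.
    """
    if method_type not in METHOD_TYPES:
        raise ValueError(f"unknown method type: {method_type!r}")
    method: Dict[str, Any] = {"type": method_type}
    if config is not None:
        # Configuration contents are not described in the text,
        # so callers pass whatever the real API expects.
        method[method_type] = config
    return method
```

For example, `build_method("dpo", {})` yields a dict with the type set to dpo and an empty dpo configuration, while `build_method("supervised")` omits the optional configuration entirely.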