Configuration for the reinforcement fine-tuning method.

interface ReinforcementMethod {
    grader:
        | StringCheckGrader
        | TextSimilarityGrader
        | PythonGrader
        | ScoreModelGrader
        | MultiGrader;
    hyperparameters?: ReinforcementHyperparameters;
}

Properties

grader:
    | StringCheckGrader
    | TextSimilarityGrader
    | PythonGrader
    | ScoreModelGrader
    | MultiGrader

The grader used for the fine-tuning job. Required; exactly one of the grader types listed above.

hyperparameters?: ReinforcementHyperparameters

The hyperparameters used for the reinforcement fine-tuning job. Optional.
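As a rough illustration of how the two properties fit together, the sketch below builds a `ReinforcementMethod` value with a required grader and no hyperparameters. The types are simplified local stand-ins, not the SDK's definitions: `grader` is narrowed to a single union member, the `StringCheckGrader` fields and the `n_epochs` hyperparameter are assumptions about shapes not shown in this section, and `describeMethod` and the template strings are hypothetical.

```typescript
// Simplified local stand-ins (assumptions); the real types carry more fields.
interface StringCheckGrader {
    type: "string_check";
    name: string;
    input: string;      // assumed: template for the model output being graded
    reference: string;  // assumed: template for the expected value
    operation: "eq" | "ne" | "like" | "ilike";
}

interface ReinforcementHyperparameters {
    n_epochs?: number | "auto"; // assumed field, shown only for illustration
}

interface ReinforcementMethod {
    // Narrowed to one member of the grader union to keep the sketch short.
    grader: StringCheckGrader;
    hyperparameters?: ReinforcementHyperparameters;
}

// `grader` is required; `hyperparameters` may be omitted entirely.
const method: ReinforcementMethod = {
    grader: {
        type: "string_check",
        name: "exact_match",                 // hypothetical grader name
        input: "{{sample.output_text}}",     // hypothetical template
        reference: "{{item.answer}}",        // hypothetical template
        operation: "eq",
    },
};

// Hypothetical helper: summarizes a method, falling back when the optional
// hyperparameters (or n_epochs within them) are absent.
function describeMethod(m: ReinforcementMethod): string {
    const epochs = m.hyperparameters?.n_epochs ?? "default";
    return `${m.grader.type} grader, n_epochs=${epochs}`;
}
```

Because `hyperparameters` is optional, code that reads it should guard against `undefined`, as `describeMethod` does with optional chaining.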