The hyperparameters used for the reinforcement fine-tuning job.

interface ReinforcementHyperparameters {
    batch_size?: number | "auto";
    compute_multiplier?: number | "auto";
    eval_interval?: number | "auto";
    eval_samples?: number | "auto";
    learning_rate_multiplier?: number | "auto";
    n_epochs?: number | "auto";
    reasoning_effort?:
        | "low"
        | "medium"
        | "high"
        | "default";
}
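
A minimal sketch of values that satisfy this type. The interface is repeated locally so the snippet compiles on its own, and every number is a hypothetical value chosen for illustration, not a recommended setting.

```typescript
// Local copy of the interface shown above, so this sketch is self-contained.
interface ReinforcementHyperparameters {
    batch_size?: number | "auto";
    compute_multiplier?: number | "auto";
    eval_interval?: number | "auto";
    eval_samples?: number | "auto";
    learning_rate_multiplier?: number | "auto";
    n_epochs?: number | "auto";
    reasoning_effort?: "low" | "medium" | "high" | "default";
}

// Explicit values (hypothetical numbers, for illustration only).
const explicit: ReinforcementHyperparameters = {
    batch_size: 8,
    eval_interval: 10,
    n_epochs: 2,
    reasoning_effort: "medium",
};

// Every property is optional, so an empty object also type-checks.
const omitted: ReinforcementHyperparameters = {};

// The numeric properties also accept the literal "auto".
const automatic: ReinforcementHyperparameters = {
    batch_size: "auto",
    learning_rate_multiplier: "auto",
};
```

Note that reasoning_effort does not accept "auto"; its type allows only the four string literals listed in the interface.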

Properties

batch_size?: number | "auto"

Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.

compute_multiplier?: number | "auto"

Multiplier on the amount of compute used for exploring the search space during training.

eval_interval?: number | "auto"

The number of training steps between evaluation runs.

eval_samples?: number | "auto"

Number of evaluation samples to generate per training step.

learning_rate_multiplier?: number | "auto"

Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.

n_epochs?: number | "auto"

The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.

reasoning_effort?:
    | "low"
    | "medium"
    | "high"
    | "default"

Level of reasoning effort. One of "low", "medium", "high", or "default".
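
To see how batch_size, n_epochs, and eval_interval relate, the sketch below estimates step and evaluation counts. It assumes the conventional arithmetic of one training step per batch, which this reference does not confirm for reinforcement fine-tuning, and it applies only when the properties are set to numbers rather than "auto". The helper estimateSchedule and the dataset size are hypothetical.

```typescript
// Hypothetical helper, not part of any SDK. Assumes one training step per
// batch, so an epoch takes ceil(datasetSize / batchSize) steps.
function estimateSchedule(
    datasetSize: number,
    batchSize: number,
    nEpochs: number,
    evalInterval: number,
): { stepsPerEpoch: number; totalSteps: number; evalRuns: number } {
    // One full cycle through the training dataset, in steps.
    const stepsPerEpoch = Math.ceil(datasetSize / batchSize);
    // Steps across all epochs.
    const totalSteps = stepsPerEpoch * nEpochs;
    // One evaluation run every evalInterval training steps.
    const evalRuns = Math.floor(totalSteps / evalInterval);
    return { stepsPerEpoch, totalSteps, evalRuns };
}

// 100 examples, batch_size 8, n_epochs 2, eval_interval 5.
const schedule = estimateSchedule(100, 8, 2, 5);
// schedule is { stepsPerEpoch: 13, totalSteps: 26, evalRuns: 5 }
```

Under these assumptions, doubling batch_size roughly halves the number of steps per epoch, which is the sense in which a larger batch updates model parameters less frequently.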