Optional batch_: Number of examples in each batch. A larger batch size means that model parameters are updated less frequently, but with lower variance.
Optional compute_: Multiplier on the amount of compute used for exploring the search space during training.
Optional eval_: The number of training steps between evaluation runs.
Optional eval_: Number of evaluation samples to generate per training step.
Optional learning_: Scaling factor for the learning rate. A smaller learning rate may be useful to avoid overfitting.
Optional n_: The number of epochs to train the model for. An epoch refers to one full cycle through the training dataset.
Optional reasoning_: Level of reasoning effort.
The hyperparameters used for the reinforcement fine-tuning job.
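
As a rough illustration of how batch size, epoch count, and evaluation interval interact, the sketch below estimates how many parameter updates and evaluation runs a job would perform. It is not the library's own type or logic. The property names are truncated in the listing above, so the full field names used here (`batch_size`, `n_epochs`, `eval_interval`, and the rest) are assumed completions. The interface `HyperparametersSketch`, the function `estimateSchedule`, the fallback value of 1 for unset fields, and the assumption that one training step equals one batch update are all hypothetical.

```typescript
// Hypothetical local type. Field names are ASSUMED completions of the
// truncated names in the listing; they are not confirmed by this section.
interface HyperparametersSketch {
  batch_size?: number;               // examples per batch
  compute_multiplier?: number;       // multiplier on exploration compute
  eval_interval?: number;            // training steps between evaluation runs
  eval_samples?: number;             // evaluation samples per training step
  learning_rate_multiplier?: number; // scaling factor for the learning rate
  n_epochs?: number;                 // full passes over the training dataset
  reasoning_effort?: string;         // level of reasoning effort
}

// Estimate total training steps and evaluation runs.
// ASSUMPTIONS: one step is one batch update, and unset fields fall back
// to 1 purely for illustration (the real defaults are not stated here).
function estimateSchedule(
  datasetSize: number,
  h: HyperparametersSketch
): { stepsPerEpoch: number; totalSteps: number; evalRuns: number } {
  const batchSize = h.batch_size ?? 1;
  const epochs = h.n_epochs ?? 1;
  const evalInterval = h.eval_interval ?? 1;

  // A partial final batch still costs one update, hence the ceiling.
  const stepsPerEpoch = Math.ceil(datasetSize / batchSize);
  const totalSteps = stepsPerEpoch * epochs;
  // An evaluation runs once every evalInterval steps.
  const evalRuns = Math.floor(totalSteps / evalInterval);

  return { stepsPerEpoch, totalSteps, evalRuns };
}

// Same dataset and epochs, different batch sizes.
const small = estimateSchedule(1000, { batch_size: 16, n_epochs: 2, eval_interval: 10 });
const large = estimateSchedule(1000, { batch_size: 64, n_epochs: 2, eval_interval: 10 });
console.log(small.totalSteps, large.totalSteps);
```

Under these assumptions, quadrupling the batch size cuts the number of updates to roughly a quarter, which matches the description of larger batches updating parameters less frequently.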