Optional
base
Base endpoint URL.
Optional
cache
Optional
callback
Optional
callbacks
Optional
disable
Whether to disable streaming. If streaming is bypassed, then stream() will defer to invoke().
Optional
frequency
Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim.
Optional
friendli
Friendli team ID to run as.
Optional
friendli
Friendli personal access token to run as.
Optional
max
The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
Optional
max
The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
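As a rough illustration of what "exponential backoff between each attempt" can look like, the sketch below computes a delay schedule for a given retry count. The helper name `backoffDelays`, the 1000 ms base delay, and the doubling factor are assumptions chosen for illustration; the source does not state the timing the library actually uses.

```typescript
// Hypothetical helper: base delay and growth factor are illustrative
// assumptions, not values taken from the library.
function backoffDelays(
  maxRetries: number,
  baseMs: number = 1000,
  factor: number = 2
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Each retry waits `factor` times longer than the previous one.
    delays.push(baseMs * Math.pow(factor, attempt));
  }
  return delays;
}

// With the documented default of 6 retries:
const schedule = backoffDelays(6);
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 32000]
```

Under these assumed values, six retries would add up to roughly a minute of waiting in the worst case, which is one reason to lower the retry count for latency-sensitive calls.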
Optional
max
Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled at least once in the existing text.
presence_penalty: Optional[float] = None
The maximum number of tokens to generate. The length of your input tokens plus max_tokens should not exceed the model's maximum length (e.g., 2048 for OpenAI GPT-3).
Optional
metadata
Optional
model
Model name to use.
Optional
model
Additional kwargs to pass to the model.
Optional
on
Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
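A minimal sketch of the handler contract described above: the handler receives the thrown error and rethrows it when retrying would be pointless, otherwise it returns normally. The `status` field on the error, the set of non-retryable status codes, and the names `failedAttemptHandler` and `wouldRetry` are all assumptions for illustration, not part of the documented interface.

```typescript
// Hypothetical error shape: a numeric `status` field is an assumption.
interface HttpLikeError extends Error {
  status?: number;
}

// Illustrative choice of client errors that a retry cannot fix.
const NON_RETRYABLE_STATUSES = new Set<number>([400, 401, 403, 404]);

// Throws when the error is not retryable; returns normally otherwise.
function failedAttemptHandler(error: HttpLikeError): void {
  if (error.status !== undefined && NON_RETRYABLE_STATUSES.has(error.status)) {
    throw error;
  }
}

// Small driver showing how a retry loop could consult the handler.
function wouldRetry(error: HttpLikeError): boolean {
  try {
    failedAttemptHandler(error);
    return true;
  } catch {
    return false;
  }
}

const unauthorized: HttpLikeError = Object.assign(new Error("unauthorized"), { status: 401 });
const unavailable: HttpLikeError = Object.assign(new Error("unavailable"), { status: 503 });
console.log(wouldRetry(unauthorized), wouldRetry(unavailable));
```

Rethrowing the original error, rather than a new one, keeps the stack trace and message of the first failure intact for the caller.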
Optional
stop
When one of the stop phrases appears in the generation result, the API will stop generation. The phrase is included in the generated result. If you are using beam search, all of the active beams should contain the stop phrase to terminate generation. Before checking whether a stop phrase is included in the result, the phrase is converted into tokens.
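The stop-phrase behavior can be approximated on plain strings as follows. This is only a sketch: the description above says the API compares tokens, whereas this hypothetical `truncateAtStop` helper compares characters, and it ignores beam search entirely. It does reproduce the documented detail that the stop phrase itself stays in the result.

```typescript
// Hypothetical string-level approximation of stop-phrase handling.
// The real API matches on tokens; this sketch matches on characters.
function truncateAtStop(text: string, stopPhrases: string[]): string {
  let cut = text.length;
  for (const phrase of stopPhrases) {
    if (phrase.length === 0) continue; // an empty phrase would match everywhere
    const index = text.indexOf(phrase);
    if (index !== -1) {
      // Keep the phrase itself, and stop at whichever phrase completes first.
      cut = Math.min(cut, index + phrase.length);
    }
  }
  return text.slice(0, cut);
}

console.log(truncateAtStop("Hello world. END more text", ["END"])); // "Hello world. END"
```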
Optional
tags
Optional
temperature
Sampling temperature. A smaller temperature makes the generation result closer to greedy, argmax (i.e., top_k = 1) sampling. If it is None, then 1.0 is used.
Optional
topP
Tokens comprising the top top_p probability mass are kept for sampling. Numbers between 0.0 (exclusive) and 1.0 (inclusive) are allowed. If it is None, then 1.0 is used by default.
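The defaults and range described for temperature and topP can be sketched as a small normalization step. The `SamplingOptions` type and `resolveSampling` function are hypothetical names introduced here, and treating both `undefined` and `null` as the "None" case is an assumption about how the Python-style wording maps onto TypeScript.

```typescript
// Hypothetical option shape covering only the two sampling fields above.
interface SamplingOptions {
  temperature?: number | null;
  topP?: number | null;
}

// Applies the documented defaults (1.0 for both) and the documented
// topP range: greater than 0.0 and at most 1.0.
function resolveSampling(options: SamplingOptions): { temperature: number; topP: number } {
  const temperature = options.temperature ?? 1.0;
  const topP = options.topP ?? 1.0;
  if (!(topP > 0.0 && topP <= 1.0)) {
    throw new RangeError(`topP must be in (0.0, 1.0], got ${topP}`);
  }
  return { temperature, topP };
}

console.log(resolveSampling({}));                   // both fields default to 1
console.log(resolveSampling({ temperature: 0.2 })); // temperature 0.2, topP 1
```

Validating on the client side like this is a design choice rather than a requirement: it surfaces an out-of-range value before a request is sent instead of relying on the server to reject it.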
Optional
verbose
The ChatFriendliParams interface defines the input parameters for the ChatFriendli class.