The FriendliParams interface defines the input parameters for the Friendli class.

interface FriendliParams {
    baseUrl?: string;
    cache?: boolean | BaseCache<Generation[]>;
    callbackManager?: CallbackManager;
    callbacks?: Callbacks;
    concurrency?: number;
    frequencyPenalty?: number;
    friendliTeam?: string;
    friendliToken?: string;
    maxConcurrency?: number;
    maxRetries?: number;
    maxTokens?: number;
    metadata?: Record<string, unknown>;
    model?: string;
    modelKwargs?: Record<string, unknown>;
    onFailedAttempt?: FailedAttemptHandler;
    stop?: string[];
    tags?: string[];
    temperature?: number;
    topP?: number;
    verbose?: boolean;
}
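
For example, here is a minimal usage sketch. It assumes the Friendli class is exported from @langchain/community/llms/friendli; the model name and environment variable are placeholders, so substitute your own values.

import { Friendli } from "@langchain/community/llms/friendli";

// Placeholder model name and token; replace with your own.
const llm = new Friendli({
    model: "meta-llama-3-8b-instruct",
    friendliToken: process.env.FRIENDLI_TOKEN,
    maxTokens: 256,
    temperature: 0.7,
    topP: 0.9,
});

const response = await llm.invoke("Tell me a joke.");
console.log(response);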

Hierarchy

  • BaseLLMParams
    • FriendliParams

Properties

baseUrl?: string

Base endpoint URL.

cache?: boolean | BaseCache<Generation[]>
callbackManager?: CallbackManager

Deprecated: use callbacks instead.

callbacks?: Callbacks
concurrency?: number

Deprecated: use maxConcurrency instead.

frequencyPenalty?: number

Number between -2.0 and 2.0. Positive values penalize tokens that have already been sampled, taking into account their frequency in the preceding text. This penalty reduces the model's tendency to repeat identical lines verbatim.

friendliTeam?: string

Friendli team ID to run as.

friendliToken?: string

Friendli personal access token used to authenticate requests.

maxConcurrency?: number

The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.

maxRetries?: number

The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
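
To illustrate maxConcurrency and maxRetries together, a hedged sketch (the model name is a placeholder):

import { Friendli } from "@langchain/community/llms/friendli";

const llm = new Friendli({
    model: "meta-llama-3-8b-instruct", // placeholder model name
    friendliToken: process.env.FRIENDLI_TOKEN,
    maxConcurrency: 2, // at most two requests in flight at once
    maxRetries: 3, // retry each failed call up to three times with backoff
});

// All three calls are scheduled immediately, but the internal queue
// keeps only two running concurrently.
const results = await Promise.all(
    ["red", "green", "blue"].map((color) => llm.invoke(`Describe the color ${color}.`))
);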

maxTokens?: number

The maximum number of tokens to generate. The length of your input tokens plus maxTokens should not exceed the model's maximum length (e.g., 2048 for OpenAI GPT-3).

metadata?: Record<string, unknown>
model?: string

Model name to use.

modelKwargs?: Record<string, unknown>

Additional kwargs to pass to the model.

onFailedAttempt?: FailedAttemptHandler

Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
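
Per that contract, a handler returns normally to let the retry loop continue and rethrows to abort. A hedged sketch follows; the 401 check is an illustrative heuristic, not part of the library:

import { Friendli } from "@langchain/community/llms/friendli";

const llm = new Friendli({
    friendliToken: process.env.FRIENDLI_TOKEN,
    onFailedAttempt: (error) => {
        // Illustrative heuristic: treat authentication failures as not
        // retryable and rethrow; returning normally lets the retry loop
        // continue with its exponential backoff.
        if (String(error?.message ?? "").includes("401")) {
            throw error;
        }
    },
});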

stop?: string[]

When one of the stop phrases appears in the generation result, the API stops generating. The phrase itself is included in the generated result. When using beam search, all active beams must contain the stop phrase for generation to terminate. Each phrase is converted into tokens before the check is performed.

tags?: string[]
temperature?: number

Sampling temperature. A smaller temperature makes generation closer to greedy, argmax (i.e., top_k = 1) sampling. Defaults to 1.0 when not provided.

topP?: number

Tokens comprising the top topP probability mass are kept for sampling. Values between 0.0 (exclusive) and 1.0 (inclusive) are allowed. Defaults to 1.0 when not provided.
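
Tying the sampling-related parameters together, a hedged sketch of a low-randomness configuration (the model name and stop phrase are illustrative):

import { Friendli } from "@langchain/community/llms/friendli";

const precise = new Friendli({
    model: "meta-llama-3-8b-instruct", // placeholder model name
    friendliToken: process.env.FRIENDLI_TOKEN,
    temperature: 0.2, // closer to greedy sampling
    topP: 0.1, // keep only the top 10% of probability mass
    frequencyPenalty: 0.5, // discourage verbatim repetition
    stop: ["\n\n"], // stop at the first blank line; the phrase stays in the output
    maxTokens: 128,
});

const answer = await precise.invoke("List three prime numbers:");
console.log(answer);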

verbose?: boolean