Token sampling operations for language model generation. These methods select the next token from a probability distribution, implementing strategies that balance diversity against quality by filtering and then sampling from the model's output probabilities. Variants:
- Top-k sampling: Keeps only the k highest probability tokens, renormalizes the distribution, then samples. Controls diversity by limiting the vocabulary size to the most likely tokens
- Top-p sampling: Filters tokens using cumulative probability threshold (nucleus sampling). Dynamically adjusts vocabulary size based on probability mass, maintaining diversity while avoiding low-probability tokens
- Top-k + Top-p sampling: Combines both filtering methods for fine-grained control over generation quality and diversity
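The two filtering strategies above can be sketched in plain NumPy. This is an illustrative reference implementation only; the helper names `top_k_filter` and `top_p_filter` are hypothetical and not part of this library's API:

```python
import numpy as np

def top_k_filter(probs, k):
    """Zero out all but the k highest-probability tokens, then renormalize."""
    filtered = np.zeros_like(probs)
    top_idx = np.argsort(probs)[-k:]      # indices of the k largest probabilities
    filtered[top_idx] = probs[top_idx]
    return filtered / filtered.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative mass reaches p, renormalize."""
    order = np.argsort(probs)[::-1]       # tokens sorted by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # number of tokens needed to reach mass p
    filtered = np.zeros_like(probs)
    keep = order[:cutoff]
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.1, 0.07, 0.03])
sample = rng.choice(len(probs), p=top_k_filter(probs, 2))  # draws token 0 or 1
```

Both filters return a renormalized distribution over the surviving tokens; the actual draw is then an ordinary categorical sample from that filtered distribution.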
Shapes:
- batch_size: variable
- vocab_size: constant

Inputs:
- probs: probability distributions after softmax [batch_size, vocab_size]
- Sampling-specific parameters:
  - top_k: for top-k sampling [batch_size]
  - top_p: for top-p/nucleus sampling [batch_size]

Outputs:
- samples: sampled token indices [batch_size]
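A batched combined top-k + top-p pass with these shapes might look like the following NumPy sketch. The function name `sample_top_k_top_p` is hypothetical, and it assumes one common convention (top-k first, then top-p applied to the renormalized kept mass); the library's kernels may order or fuse the filters differently:

```python
import numpy as np

def sample_top_k_top_p(probs, top_k, top_p, rng):
    """Per-row top-k then top-p filtering, sampling one token per row.

    probs:  [batch_size, vocab_size] post-softmax probabilities
    top_k:  [batch_size] integer k per row
    top_p:  [batch_size] float p per row
    """
    batch_size, _ = probs.shape
    samples = np.empty(batch_size, dtype=np.int64)
    for i in range(batch_size):
        order = np.argsort(probs[i])[::-1]     # descending by probability
        keep = order[: top_k[i]]               # top-k: at most k tokens survive
        # top-p relative to the kept mass: smallest prefix reaching fraction p
        cum = np.cumsum(probs[i][keep])
        cutoff = np.searchsorted(cum, top_p[i] * cum[-1]) + 1
        keep = keep[:cutoff]
        p = probs[i][keep] / probs[i][keep].sum()
        samples[i] = rng.choice(keep, p=p)
    return samples
```

Each row uses its own k and p, matching the per-batch [batch_size] parameter tensors documented above; a fused GPU kernel would perform the same filtering without the Python loop.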

