Entry Points
FlashInfer-Bench provides two equivalent command-line entry points:
flashinfer-bench --help
python -m flashinfer_bench --help
Use --help on any subcommand to inspect all available flags:
flashinfer-bench run --help
flashinfer-bench report --help
flashinfer-bench report summary --help
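Because the two entry points behave the same, a script can fall back from one to the other. The sketch below is a hypothetical wrapper (the function name `fib_bench` is made up for illustration) that prefers the console script when it is on `PATH` and otherwise goes through `python -m`. It assumes `python` resolves to the interpreter where the package is installed.

```shell
# Hypothetical wrapper: use the console script if available,
# otherwise invoke the module through the Python interpreter.
fib_bench() {
  if command -v flashinfer-bench >/dev/null 2>&1; then
    flashinfer-bench "$@"
  else
    python -m flashinfer_bench "$@"
  fi
}

# Usage (same arguments as either entry point):
#   fib_bench run --help
```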
Run Benchmarks
Run benchmarks against a local FlashInfer-Trace dataset:
flashinfer-bench run --local /path/to/flashinfer-trace
This is equivalent to:
python -m flashinfer_bench run --local /path/to/flashinfer-trace
Useful options:
flashinfer-bench run --local /path/to/flashinfer-trace \
--warmup-runs 10 \
--iterations 100 \
--num-trials 5 \
--rtol 1e-3 \
--atol 1e-3 \
--timeout 300
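To make the two tolerance flags concrete, the sketch below shows how a relative and an absolute tolerance are commonly combined. It assumes the convention used by NumPy and PyTorch `allclose`, where a value passes if |actual - expected| <= atol + rtol * |expected|. Whether FlashInfer-Bench applies exactly this rule is an assumption, and the helper name `within_tolerance` is hypothetical.

```shell
# Hypothetical helper illustrating the allclose-style rule (an assumption,
# not confirmed as the tool's own check):
#   |actual - expected| <= atol + rtol * |expected|
# Exits 0 when the value is within tolerance, 1 otherwise.
within_tolerance() {
  awk -v actual="$1" -v expected="$2" -v rtol="$3" -v atol="$4" 'BEGIN {
    diff = actual - expected; if (diff < 0) diff = -diff
    mag = expected + 0;       if (mag < 0)  mag = -mag
    exit !(diff <= atol + rtol * mag)
  }'
}

# With rtol = atol = 1e-3 and expected = 1.0 the allowed error is 0.002.
within_tolerance 1.0005 1.0 1e-3 1e-3 && echo "1.0005: within tolerance"
within_tolerance 1.01   1.0 1e-3 1e-3 || echo "1.01: outside tolerance"
```

Under this rule, `--atol` dominates for expected values near zero and `--rtol` dominates for large magnitudes.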
Run only selected definitions or solutions:
flashinfer-bench run --local /path/to/flashinfer-trace \
--definitions gemm_n5120_k2048 rmsnorm_h128 \
--solutions solution_name_1 solution_name_2
Resume an interrupted run:
flashinfer-bench run --local /path/to/flashinfer-trace --resume
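For long unattended runs, `--resume` can be combined with a small retry loop. The sketch below is one possible approach, not part of the tool: the first attempt starts normally and later attempts add `--resume`. It assumes that an interrupted run exits with a non-zero status. The function name `run_with_resume` and the `BENCH_CMD` override are hypothetical.

```shell
# Command to invoke; override BENCH_CMD for a dry run or a custom wrapper.
BENCH_CMD="${BENCH_CMD:-flashinfer-bench}"

# Hypothetical helper: run_with_resume DATASET [MAX_ATTEMPTS]
# Returns 0 as soon as one attempt succeeds, 1 if all attempts fail.
run_with_resume() {
  dataset="$1"
  max_attempts="${2:-3}"
  attempt=1

  # First attempt: a normal run.
  if "$BENCH_CMD" run --local "$dataset"; then
    return 0
  fi

  # Later attempts: continue from where the previous one stopped.
  while [ "$attempt" -lt "$max_attempts" ]; do
    attempt=$((attempt + 1))
    echo "attempt $attempt of $max_attempts: resuming" >&2
    if "$BENCH_CMD" run --local "$dataset" --resume; then
      return 0
    fi
  done
  return 1
}

# Usage:
#   run_with_resume /path/to/flashinfer-trace 5
```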
Use a YAML config file to set per-op-type or per-definition eval parameters:
flashinfer-bench run --local /path/to/flashinfer-trace --config my_config.yaml
Use the isolated runner instead of the default persistent runner:
flashinfer-bench run --local /path/to/flashinfer-trace --use-isolated-runner
Run the Benchmark Server
Start an HTTP benchmark server against a local trace dataset:
flashinfer-bench serve \
--local /path/to/flashinfer-trace \
--host 0.0.0.0 \
--port 8000
Use --devices to pin specific CUDA devices, or omit it to use all available CUDA devices.
For endpoint details and request/response examples, see Benchmark Server API.
Inspect Results
Summarize pass/fail counts and author rankings by average speedup:
flashinfer-bench report summary --local /path/to/flashinfer-trace
Show the best solution for each definition:
flashinfer-bench report best --local /path/to/flashinfer-trace
Merge multiple local datasets into one output directory:
flashinfer-bench report merge \
--local /path/to/trace-a \
--local /path/to/trace-b \
--output /path/to/merged-trace
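When the list of datasets is longer or generated by another script, the repeated `--local` flags can be built in a loop. This is a minimal sketch that assumes `--local` may be repeated more than twice, which the two-dataset example suggests but does not confirm. The third path is a placeholder, and the paths must not contain spaces because the list is split on whitespace.

```shell
# Hypothetical input directories; replace with real dataset paths.
datasets="/path/to/trace-a /path/to/trace-b /path/to/trace-c"

# Collect one "--local DIR" pair per dataset in the positional parameters.
set --
for dir in $datasets; do
  set -- "$@" --local "$dir"
done

# Print the command for review; remove "echo" to actually run the merge.
echo flashinfer-bench report merge "$@" --output /path/to/merged-trace
```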
Render a console-oriented visualization of results:
flashinfer-bench report visualize --local /path/to/flashinfer-trace
Notes
- The CLI supports local datasets via --local.
- Log verbosity is controlled with --log-level {DEBUG,INFO,WARNING,ERROR} on supported commands.
- The flashinfer-bench console script and python -m flashinfer_bench share the same implementation and behavior.