General Matrix Multiplication (GEMM) computes C = A × B^T. This is a fundamental linear algebra operation used in neural networks for layer computations, attention mechanisms, and other matrix transformations.

Variants:
- FP16 GEMM: Uses 16-bit floating point (FP16) inputs for A and B matrices
- FP8 GEMM: Uses 8-bit floating point (FP8) inputs for A and B matrices, with scaling factors to maintain numerical stability
Shapes:
- M: variable; N, K: constant
- A: [M, K], B: [N, K]
- Scaling factors for FP8 GEMM: A_scale: [M], B_scale: [N]
- C: [M, N]
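The shapes above can be checked against a NumPy reference. This is a minimal sketch, not the actual kernel: the function name `gemm_ref` and the way the scales are applied are illustrative assumptions (real FP8 kernels multiply quantized values and fold the per-row scales into the accumulator).

```python
import numpy as np

def gemm_ref(A, B, A_scale=None, B_scale=None):
    """Reference C = A @ B^T with optional per-row scaling (FP8 variant).

    A: [M, K], B: [N, K], A_scale: [M], B_scale: [N] -> C: [M, N].
    Hypothetical helper for shape checking, not the production kernel.
    """
    # Accumulate in FP32, as mixed-precision GEMMs typically do.
    C = A.astype(np.float32) @ B.astype(np.float32).T  # [M, N]
    if A_scale is not None:
        C *= A_scale[:, None]  # scale each row of C by A's row scale
    if B_scale is not None:
        C *= B_scale[None, :]  # scale each column of C by B's row scale
    return C

M, N, K = 4, 3, 8
A = np.random.randn(M, K).astype(np.float16)
B = np.random.randn(N, K).astype(np.float16)
C = gemm_ref(A, B)
print(C.shape)  # (4, 3)
```

Note that B is stored as [N, K], so the transpose in C = A × B^T makes the inner dimensions (K) line up; only M varies across calls while N and K are fixed by the layer's weights.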

