ONNX Runtime
ONNX Runtime is a cross-platform engine for running ONNX models, with quantization, pruning support, and hardware acceleration for deployment.
Tags: ai, machinelearning, optimization, deployment, models
Similar Tools in Model Compression
Qualcomm's Neural Processing SDK provides tools for model compression through quantization and pruning, optimized for...
TensorFlow's Model Optimization Toolkit offers APIs for pruning, quantization, and clustering to reduce model size an...
PyTorch's built-in quantization module enables post-training quantization and quantization-aware training for INT8 and FP16 to com...
NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime supporting quantization, pruning,...