Hi! Great to see the progress in Sentis since the last time I used it around a year ago. I’ve now started work on a large project using it.
However, I am unable to import my uint8 pre-quantised ONNX models, as the DequantizeLinear, DynamicQuantizeLinear and QuantizeLinear operators are no longer supported by Sentis! I DEFINITELY used models with these operators in an earlier version (1.3.0-pre.2), but at some point support was removed (indeed, even the docs for 1.3.0-pre.3 list these operators as unsupported). Why is this?
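For reference, here's roughly how I confirmed which operators my models actually contain, using the plain onnx Python API (the model path is a placeholder for one of my own files):

```python
# Count the operator types in an ONNX model, to confirm it really
# does contain QuantizeLinear / DequantizeLinear nodes.
# "model_uint8.onnx" is a placeholder for one of my own models.
from collections import Counter

import onnx

model = onnx.load("model_uint8.onnx")
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")
```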
I know Sentis has its own model quantisation ability now, and that's great. However, quantisation is a complex thing, and a whole research subfield of its own. I'm not sure which method Sentis uses (it would be nice if the docs mentioned it!), but I suspect it is either dynamic quantisation (since there is no way to provide calibration data) or weights-only quantisation, with the weights simply dequantised when the model is loaded into memory. Either way, it certainly isn't calibrated static quantisation, which is what I want: static is substantially faster than dynamic (33% faster in my experiments, back when Sentis supported it). Static quantisation requires the currently unsupported DequantizeLinear and QuantizeLinear operators (the ONNX "QDQ" format), or the alternative "QOperator" format of QLinearConv etc., which is also unsupported.
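To make that concrete, this is roughly how my pre-quantised models are produced, using onnxruntime's quantisation tooling (the paths, input name/shape and calibration reader are placeholders for my own setup, and the exact API may vary a little between onnxruntime versions):

```python
# Calibrated static quantisation with onnxruntime, producing a QDQ
# model (QuantizeLinear/DequantizeLinear pairs in the graph).
# Paths and the calibration data below are placeholders.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)

class MyCalibrationReader(CalibrationDataReader):
    """Feeds calibration batches; in practice these come from a
    representative sample of the real dataset, not random data."""
    def __init__(self, input_name, shape, n_batches=16):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(n_batches)]
        )

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    model_input="model_fp32.onnx",
    model_output="model_uint8_qdq.onnx",
    calibration_data_reader=MyCalibrationReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,  # emits QuantizeLinear/DequantizeLinear;
                                   # QuantFormat.QOperator emits QLinearConv etc.
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
)
```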
Advantages of supporting pre-quantised ONNX models:
- There are lots of quantised ONNX models available on HuggingFace - it's a lot easier to download one of these than to download the full version and quantise it yourself! They are also likely to perform better than a locally quantised copy.
- Sentis quantisation requires importing the ONNX file and then quantising locally, both very intensive operations (particularly with regard to RAM). For really large models this might be impossible for most folks due to limited RAM - it's more feasible to quantise a big model once on a workstation and download the resulting small ONNX file.
- Quantisation is sort of an art… there really are a lot of papers on how best to quantise neural networks (particularly for static quantisation, which is the most efficient and so very relevant for in-game use). For big models it can make a huge difference. I have a setup on a workstation where I quantise models in different ways and compare their speed/accuracy (roughly the harness sketched after this list), so I can select the one which gives the best speed/accuracy trade-off, and I suspect there are quantised models on HF which are similarly carefully crafted.
- It was previously supported…? So surely it shouldn't be hard to add back?
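For what it's worth, the comparison harness I mentioned above is nothing fancy - roughly along these lines, with the accuracy scoring omitted (paths and input shape are placeholders for my own models):

```python
# Rough per-variant speed comparison using onnxruntime on CPU.
# Paths and input shape are placeholders; my real harness also
# scores accuracy on a held-out set alongside the timings.
import time

import numpy as np
import onnxruntime as ort

variants = {
    "fp32": "model_fp32.onnx",
    "uint8 dynamic": "model_uint8_dynamic.onnx",
    "uint8 static (QDQ)": "model_uint8_qdq.onnx",
}

x = np.random.rand(1, 3, 224, 224).astype(np.float32)

for name, path in variants.items():
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    for _ in range(5):                      # warm-up runs
        session.run(None, {input_name: x})
    start = time.perf_counter()
    for _ in range(50):
        session.run(None, {input_name: x})
    per_run = (time.perf_counter() - start) / 50
    print(f"{name}: {per_run * 1000:.2f} ms/inference")
```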
Please let me know if I've made a mistake, or if there's some workaround which lets me use my uint8 quantised ONNX files in Sentis. At the moment it seems my only option is to import the full-precision version and quantise locally, which does work fairly well, but my pre-quantised ONNX files will be faster and/or more accurate, as they have been statically quantised, calibrated on the dataset and benchmarked!
Supporting DequantizeLinear, QuantizeLinear and DynamicQuantizeLinear is really important in my opinion.
Cheers,
Z
EDIT: I am using Unity 6.0 and Sentis 2.1.1, but I guess it doesn't matter much, as the operators are listed as unsupported in the docs for every version.