I have some image classification (IC) models I fine-tuned in TF 2.x.
I have exported them to ONNX format (tried both opset=13 and 17).
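For reference, the export step looks roughly like this (a minimal sketch assuming the tf2onnx Python API; the model path and input shape are placeholders for my actual models):

```python
import tensorflow as tf
import tf2onnx

# Placeholder: load one of my fine-tuned Keras classifiers.
model = tf.keras.models.load_model("resnet50v1_finetuned")

# Fixed NHWC input signature; 224x224x3 is assumed here for illustration.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)

# Export to ONNX (I tried both opset 13 and 17).
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    opset=13,
    output_path="resnet50v1_finetuned.onnx",
)
```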
I understand Quark is now the recommended flow, as the Vitis AI Quantizer is being deprecated?
So I used Quark (0.8rce3) to quantize the models with different configurations (XINT8, INT8_CNN_ACCURATE, XINT8_ADAROUND, XINT8_ADAQUANT), with and without CLE (which only works with optimize_model = False).
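My quantization script is roughly the following (a sketch from memory; the exact import paths and config/option names may differ between Quark releases, and the calibration data below is a stand-in for my real pre-processed images):

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader
from quark.onnx import ModelQuantizer
from quark.onnx.quantization.config import Config, get_default_config

class MyCalibReader(CalibrationDataReader):
    """Feeds the same pre-processed calibration batches to every model."""
    def __init__(self, batches):
        self._iter = iter(batches)

    def get_next(self):
        return next(self._iter, None)

# Stand-in calibration data; in my real flow these are pre-processed validation images.
calib_batches = [{"input": np.random.rand(1, 224, 224, 3).astype(np.float32)}
                 for _ in range(16)]

# I swap "XINT8" for INT8_CNN_ACCURATE / XINT8_ADAROUND / XINT8_ADAQUANT,
# and toggle CLE / optimize_model on the config (exact option names omitted here).
quant_config = get_default_config("XINT8")
config = Config(global_quant_config=quant_config)

quantizer = ModelQuantizer(config)
quantizer.quantize_model(
    model_input="resnet50v1_finetuned.onnx",         # placeholder paths
    model_output="resnet50v1_finetuned_int8.onnx",
    calibration_data_reader=MyCalibReader(calib_batches),
)
```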
The 1st issue is accuracy:
Testing the quantized ONNX models on CPU/iGPU, in the best case across all configurations my models showed up to a 5% accuracy loss, but ResNet50V1 dropped from 89% to 30-40% in every case; I tried different configurations, but nothing helped.
Deploying on the NPU, the results were inconsistent:
DenseNet121/DenseNet169 showed a loss of less than 20%.
DenseNet201/MobileNet showed no loss (very odd, since DenseNet121/169 have a similar architecture).
MobileNetV2/Xception showed a 1 to 4% loss.
ResNet50V1 dropped further, from 40% to 25%.
Everything uses the same calibration data and pre-processing, so I am not sure where this inconsistency comes from.
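For context, the accuracy numbers above come from a loop like this, with identical pre-processing on every target (dataset loading is omitted; only the execution provider changes between CPU/iGPU/NPU):

```python
import numpy as np
import onnxruntime as ort

def top1_accuracy(model_path, images, labels, providers):
    """images: pre-processed NHWC float32 arrays; labels: matching class ids."""
    session = ort.InferenceSession(model_path, providers=providers)
    input_name = session.get_inputs()[0].name
    correct = 0
    for img, label in zip(images, labels):
        logits = session.run(None, {input_name: img[None, ...]})[0]
        correct += int(np.argmax(logits) == label)
    return correct / len(labels)

# Same images/labels for every model and every target, e.g.:
# acc = top1_accuracy("resnet50v1_finetuned_int8.onnx", val_images, val_labels,
#                     ["CPUExecutionProvider"])
```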
The 2nd issue is inference performance:
For the iGPU, inference time was the same as on the CPU and I saw no activity on the iGPU.
For the NPU, I only saw improvements on the MobileNetV2, ResNet50V1 and Xception models; for the remaining models, NPU performance was the same as the CPU, and I saw messages that the model fell back to the CPU.
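This is roughly how I create the sessions for each target (the vaip_config.json path is a placeholder for the one shipped with my Ryzen AI install, and the provider option key is written from memory):

```python
import onnxruntime as ort

# NPU: VitisAI execution provider; this is where I see the CPU-fallback
# messages for the DenseNet models.
npu_session = ort.InferenceSession(
    "resnet50v1_finetuned_int8.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "C:/path/to/vaip_config.json"}],  # placeholder path
)

# iGPU: DirectML execution provider; runs, but I see no iGPU activity and
# only CPU-level latency.
igpu_session = ort.InferenceSession(
    "resnet50v1_finetuned_int8.onnx",
    providers=["DmlExecutionProvider"],
)
```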
I tried the YOLOv8 tutorial at https://github.com/amd/RyzenAI-SW/blob/main/tutorial/yolov8/yolov8_python/README.md, whose model can be deployed on CPU, iGPU or NPU with improvements on each target, but I can't find how that model was quantized and compiled.
I am not sure whether I am missing an important step or have done something wrong.
Would appreciate some feedback.