Unlocking Peak Performance on Qualcomm NPU with LiteRT



This content originally appeared on Google Developers Blog and was authored by Google Developers Blog

LiteRT’s new Qualcomm AI Engine Direct (QNN) Accelerator gives on-device GenAI workloads on Android access to the dedicated NPU. It offers a unified mobile deployment workflow, state-of-the-art performance (up to a 100x speedup over CPU), and full model delegation to the NPU. Together these enable smooth, real-time AI experiences: FastVLM-0.5B achieves a prefill rate of over 11,000 tokens/sec on the Snapdragon 8 Elite Gen 5 NPU.
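To put the prefill figure in perspective, here is a rough back-of-the-envelope sketch in Python that converts a prefill throughput into the time needed to ingest a prompt. The function name and the 1,000-token prompt length are illustrative assumptions, not part of LiteRT or the original post, and the calculation assumes throughput stays constant across the whole prompt.

```python
def prefill_latency_ms(prompt_tokens: int, tokens_per_sec: float) -> float:
    """Estimate the time (in milliseconds) to prefill a prompt.

    Assumes a constant prefill throughput, which is a simplification:
    real throughput can vary with prompt length and device state.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return prompt_tokens / tokens_per_sec * 1000.0


if __name__ == "__main__":
    # 11,000 tokens/sec is the lower bound quoted for FastVLM-0.5B;
    # the 1,000-token prompt is a hypothetical example.
    latency = prefill_latency_ms(prompt_tokens=1_000, tokens_per_sec=11_000)
    print(f"Estimated prefill time: {latency:.1f} ms")
```

Under these assumptions, a 1,000-token prompt would take roughly 91 ms to prefill. Because the quoted rate is a floor ("over 11,000 tokens/sec"), this estimate is an upper bound on the prefill time for that prompt length.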

