AI Performance for Consumers with Large Language Models (LLMs) in Ryzen™ AI

AMD_AI · ‎04-01-2024

Large Language Models (LLMs) are for everyone, not just those who know how to write and run code. AMD recently rolled out easy-to-follow guides on how to run state-of-the-art large language models on AI PCs featuring AMD Ryzen™ AI or AMD Radeon™ 7000 Series graphics cards using LM Studio, with no coding skills required. Today, we are going to go over some x86 platform choices for AMD Ryzen™ AI versus our competitor and look at how performance compares between the two in real-world workloads.

AMD Ryzen™ Mobile 7040 Series and AMD Ryzen™ Mobile 8040 Series processors feature a Neural Processing Unit (NPU) that is explicitly designed to handle emerging AI workloads. Featuring up to 16 TOPS, the NPU allows the user to execute AI workloads with maximum power efficiency. Check out this video to learn more about AMD Ryzen™ AI PCs.

[Image: Ryzen AI LLM Disclosure_Edit_FINAL 13.jpg]

Entry-level x86-based AI PCs are available under the $1,000 price point from both AMD and its competitor.

An AMD Ryzen™ AI-equipped laptop, for example, costs $899*, while a comparable x86 solution from the competition costs $999*. The AMD AI PC is more cost-effective and has an IMAX-enhanced OLED screen with 2.8K resolution and a 120Hz refresh rate. The competitor SKU only has a standard IPS panel with 1.2K resolution and a 60Hz refresh rate. The AMD laptop also features twice the SSD capacity and a lower TDP of 15W, while the competition has a much higher TDP of 28W.

So, what about performance? LM Studio is one of the most popular applications consumers use to deploy and run large language models, and the AMD AI PC achieved higher performance* in our testing.

Mistral 7b is a very popular model, and the AMD Ryzen™ 7 7840U 15W processor achieves up to 17% faster tokens per second than the competition with a sample prompt[1]. The AMD Ryzen™ AI chip also achieves 79% faster time-to-first-token in Llama v2 Chat 7b on average[1]. AMD recommends 4-bit K M quantization (Q4_K_M in llama.cpp naming) for running LLMs in everyday use and 5-bit K M (Q5_K_M) for tasks that need the utmost accuracy, like coding.
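For readers who want to run a similar comparison on their own machines, here is a minimal Python sketch of how "percent faster" figures like the ones above can be derived from raw timings. The function names and every number in it are hypothetical placeholders, not AMD's measurements, and the latency formula is only one common convention; the footnotes do not state the exact formula used in our testing.

```python
from statistics import mean


def tokens_per_second(num_tokens: int, generation_seconds: float) -> float:
    """Throughput: generated tokens divided by the time spent generating them."""
    return num_tokens / generation_seconds


def percent_faster_throughput(ours: float, baseline: float) -> float:
    """Higher-is-better metric (tokens/s): relative gain over the baseline."""
    return (ours / baseline - 1.0) * 100.0


def percent_faster_latency(ours: float, baseline: float) -> float:
    """Lower-is-better metric (time-to-first-token, in seconds).

    Assumed convention: baseline latency divided by ours, minus one.
    """
    return (baseline / ours - 1.0) * 100.0


# Hypothetical placeholder timings for three runs of the same prompt.
system_a = [tokens_per_second(200, secs) for secs in (10.0, 10.0, 10.0)]
system_b = [tokens_per_second(200, secs) for secs in (12.5, 12.5, 12.5)]

# Average the runs first, then compare the averages.
gain = percent_faster_throughput(mean(system_a), mean(system_b))
print(f"System A: {gain:.1f}% faster tokens per second")
```

Averaging several runs before comparing, as in the sketch, smooths out run-to-run variation in a sustained-performance test.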

[Image: Ryzen AI LLM Disclosure_Edit_FINAL 7.jpg]

The breakdown of how our performance (tokens per second and time-to-first-token) compares to the competitor's at various quantization levels is also given. Note that AMD does not recommend Q8 or Q2 quantization: the former is very slow and the latter has a large perplexity loss. This is in line with recommendations by industry peers (you can see quantization recommendations against perplexity loss with the command quantize --help in llama.cpp)[2].
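To get a feel for what the quantization levels mean in practice, the following Python sketch estimates how much memory the weights of a 7-billion-parameter model need at nominal bit widths. It is a rough back-of-the-envelope calculation, not a measurement: the parameter count is nominal, the helper name is hypothetical, and real K-quant files are somewhat larger than these figures because they mix precisions and store scaling data alongside the weights.

```python
def approx_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a nominal bit width (assumption)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


PARAMS_7B = 7e9  # nominal parameter count for a "7b" model

for name, bits in [("Q2", 2), ("Q4", 4), ("Q5", 5), ("Q8", 8)]:
    size = approx_weight_gib(PARAMS_7B, bits)
    print(f"{name}: roughly {size:.1f} GiB of weights")
```

Since an 8-bit model holds about twice as many bytes as a 4-bit one, each generated token requires reading about twice as much weight data from memory, which is one plausible reason Q8 runs noticeably slower on a laptop.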

We also tested the Llama v2 Chat 7b model and found similar results in both time-to-first-token and tokens-per-second:

Large language models can be incredibly helpful for increasing productivity, and with Ryzen™ AI, you can now run them completely locally.

AMD is committed to advancing AI and making its benefits pervasive. AI PCs from AMD enable everyone to benefit from the growth in AI consumer applications. Users have different x86 platform choices. However, AMD Ryzen™ AI laptops are not only cost-effective, with next-level performance in consumer LLM applications like LM Studio; they also run at roughly half the TDP (15W versus 28W) and can have significantly better platform specifications, resulting in a leadership value proposition for consumers.

Click here to learn more about AMD Ryzen™ AI.

Footnotes:

[1] Testing as of Feb 2024 by AMD. Sustained performance average of multiple runs with specimen prompt "Write me a story about an orange cat called mr whiskers". All tests conducted on LM Studio 0.2.16. Performance may vary. Market price retrieved on 3/4/2024 (Amazon, US). Phoenix: HP Pavilion Plus Laptop 14-ey0xxx, Ryzen 7 7840U 15W TDP, 16GB LPDDR5 6400, Windows 23H2 22631.3155, Adrenalin Driver 24.2.1. Meteor Lake: Acer Swift SFG14-72T, Intel Core Ultra 7 155H 28W TDP, 16GB LPDDR5 6400, Windows 23H2 22631.3155, Driver 31.0.101.5333. PHX-59.
[2] Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied. GD-97.

* All tests conducted on LM Studio 0.2.16. Performance may vary. Market price retrieved on 3/4/2024 (Amazon, US).

Pricing (MTL) 999 USD and SKU features (Retrieved on 3/4/2024): https://www.amazon.com/Display-Experiences-Processor-LPDDR5X-SFG14-72T-718K/dp/B0CNDTGC77/

Pricing (PHX) 899 USD and SKU features (Retrieved on 3/4/2024): https://www.amazon.com/HP-Pavilion-120Hz-500Nits-Laptop/dp/B0CQ75B2MY/?th=1