Is it too much to ask AMD to put RAM slots on their GPUs? I need to be able to extend the RAM to something like a terabyte to run the DeepSeek R1 model. I know it will be slow, but not as slow as CPU processing. On the CPU it runs at about 4 tokens a second; I am hoping that on a GPU it could reach 20 tokens a second.
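For context, a rough way to sanity-check those numbers: decode speed on large models is usually limited by memory bandwidth, not compute. A minimal back-of-envelope sketch, assuming DeepSeek R1's published ~37B active parameters per token (it's a mixture-of-experts model) and 8-bit weights; the bandwidth figures are ballpark assumptions, not measurements:

```python
# Back-of-envelope decode speed for a memory-bandwidth-bound LLM.
# Assumptions (not from this thread): DeepSeek R1 activates ~37B
# parameters per token; 8-bit weights; the bandwidth numbers below
# are rough, typical values for illustration only.

active_params = 37e9    # active parameters read per token (MoE)
bytes_per_param = 1.0   # 8-bit quantization

def tokens_per_second(bandwidth_gbs):
    """Decode speed if every active weight is read once per token."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

print(f"CPU, ~100 GB/s DDR5:       {tokens_per_second(100):.1f} tok/s")
print(f"Six-channel DDR4 on card:  {tokens_per_second(154):.1f} tok/s")
print(f"HBM-class, ~3 TB/s:        {tokens_per_second(3000):.1f} tok/s")
```

By this estimate, the reported ~4 tokens a second on CPU is roughly what bandwidth alone predicts, and whatever memory feeds the GPU, not the slots themselves, would set the ceiling.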
Please consider this. Adding the slots on the back of the GPU would not take up space when unused and would not interfere with the cooling on the other side of the card. But for people like myself this would make it very valuable, as I could run very large models locally at an acceptable price and token speed for my personal use.
Adding more vRAM to a GPU card isn't going to do much if the actual GPU processor isn't very fast. It might help in some games played in Ultra mode, as an example.
Plus, if you add more vRAM, you might need to modify your GPU vBIOS so it can access and recognize the extra memory.
Also, added RAM sticks might interfere with other hardware in a tight PC case or with the CPU cooler. The best way, in my opinion, is a GPU card where the vRAM chips are removable and upgradeable, like a CPU processor.
You also need to consider whether the GPU processor can handle the added vRAM.
I did read once about a user who upgraded his GPU card's vRAM by desoldering the existing vRAM chips and soldering on higher-density ones. But I don't remember how that turned out: whether the card worked afterwards and whether there was any improvement.
@Abdulrahman392011 wrote: Is it too much to ask AMD to put RAM slots on their GPUs? I need to be able to extend the RAM to something like a terabyte to run the DeepSeek R1 model. But for people like myself this would make it very valuable, as I could run very large models locally at an acceptable price and token speed for my personal use.
Commercially available GDDR uses a BGA (ball grid array) interface, which is a JEDEC standard specification for soldered memory chips. There is absolutely NO standard for socketable GDDR memory, and there never has been.
You're asking AMD to manufacture their own unique memory chip design and make up some arbitrary socket interface to use on their graphics cards, not to mention the GPU core memory bus width required to address a terabyte of GDDR, all so you can run an LLM on your "personal, acceptably-priced" desktop GPU. ROFL
Well, it will eventually get done, and it's better if AMD holds the patents for something like that. It would balance the competition. Nvidia is kind of overcharging everyone because of its proprietary CUDA ecosystem.
All I am saying is, if AMD got on stage with a card in hand saying it can run a 600-billion-parameter model with no quantization at 20 tokens a second, people are gonna flip out. And it's better to have AMD than Nvidia on that stage, for us consumers, I mean. 20 tokens a second is faster than you can read; this basically means you can buy your very own OpenAI. And in a couple of years, when a new GPU comes out, just move the memory sticks from the old GPU into the new one, getting better performance without actually needing to buy new memory. RAM doesn't get new generations as fast as GPUs, so upgrade the RAM once every three GPU upgrades. I've done the math, and extrapolating the numbers, the RAM should be about $3,500 and the GPU about $1,200 to get 20 tokens a second or a bit more.
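A quick check of what 20 tokens a second would demand, assuming decode stays bandwidth-bound; FP16 (2 bytes per parameter) stands in for "no quantization", and the ~37B active-parameter figure is borrowed from DeepSeek R1's published specs rather than from this thread:

```python
# What "20 tok/s on a 600B model, no quantization" implies for memory
# bandwidth, assuming decode is bandwidth-bound. The ~37B active-parameter
# MoE case is an assumption based on DeepSeek R1's published specs.

target_tps = 20
bytes_per_param = 2  # FP16, i.e. no quantization

for label, params in [("dense 600B", 600e9), ("MoE, ~37B active", 37e9)]:
    required = params * bytes_per_param * target_tps / 1e12
    print(f"{label}: ~{required:.1f} TB/s of effective bandwidth")
```

Either way, the required bandwidth lands well above what commodity DIMM slots deliver, so hitting that target would hinge on much faster socketed memory than today's server DDR4.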
Just consider it, guys.
Price estimates are a bit low.
The AMD MI300X has 192GB of memory and costs around $20,000 (with discounts for volume).
Typically run eight to a server, which would give you 1.5TB of memory at $160,000 plus the cost of the server.
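Spelling out that math, using only the figures quoted above:

```python
# The server math from the post above, spelled out.
cards = 8
gb_per_card = 192
price_per_card = 20_000  # approximate price cited above

total_memory_tb = cards * gb_per_card / 1024   # ~1.5 TB
total_cost = cards * price_per_card            # $160,000
print(f"{total_memory_tb:.1f} TB of HBM for ${total_cost:,}")
```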
A 256 GB DDR4 server RAM stick is about $1,250. If we put six slots on the GPU, that's about $7,500 of memory, but if AMD ordered in massive numbers for all the people interested in this, it could get it for half that. Normal GPUs without soldered RAM should also be priced lower: a $2,000 card could be $1,500.
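The same math for the six-slot proposal; the per-DIMM price and the 50% volume discount are the poster's assumptions, not quotes:

```python
# The six-slot pricing sketch from the post above, in numbers.
slots = 6
gb_per_dimm = 256
price_per_dimm = 1_250

capacity_gb = slots * gb_per_dimm   # 1,536 GB, ~1.5 TB
retail = slots * price_per_dimm     # $7,500
volume = retail / 2                 # assumed bulk discount
print(f"{capacity_gb} GB for ~${retail:,} retail, ~${volume:,.0f} at volume")
```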
As for your example, you didn't consider that the GPUs you're talking about are meant for server use. Server use typically means serving multiple users at the same time, so their aggregate token output is significantly higher than 20 tokens a second. That is why they're so darn expensive.
@Abdulrahman392011 wrote: A 256 GB DDR4 server RAM stick is about $1,250. If we put six slots on the GPU,...
No company is going to put DDR4 memory slots on a graphics card.
If you want the graphics card to use slow DDR4, just have it access shared system memory. Graphics cards and operating systems already do this when the card's dedicated graphics memory is exceeded; Windows, for example, "reserves" up to half of system memory for shared graphics use.
If GPUs already access the CPU's RAM, then this actually makes it easier, as the protocol is already in place. However, having the RAM sticks on the graphics card would increase bandwidth, since the GPU chip would not need to go over PCIe to reach the RAM.
That being said, your point is actually very useful: they could use that protocol to connect the GPU chip to the RAM as a prototype, then compare it against the CPU running the same workload from the same RAM to see how much performance, if any, is gained.
If it turns out to be beneficial, then bypassing the PCIe bottleneck by putting the RAM on the GPU would allow a large amount, like 1.5 terabytes, to be managed by the GPU without the bandwidth limits of the PCIe connection.
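Rough numbers behind that PCIe argument, assuming PCIe 4.0 x16 (~32 GB/s peak per direction) and DDR4-3200 (~25.6 GB/s per channel); one channel per hypothetical DIMM slot is an assumption:

```python
# Why on-card DIMM slots would beat spilling over PCIe, in rough numbers.
pcie4_x16 = 32.0          # GB/s, theoretical one-direction peak
ddr4_3200_channel = 25.6  # GB/s per channel
channels = 6              # one per hypothetical DIMM slot (assumed)

on_card = ddr4_3200_channel * channels
print(f"PCIe 4.0 x16 to system RAM: ~{pcie4_x16} GB/s")
print(f"Six on-card DDR4 channels:  ~{on_card:.0f} GB/s  (~{on_card/pcie4_x16:.0f}x)")
```

On-card slots would plausibly give roughly five times the bandwidth of going over the bus, though still far below soldered GDDR or HBM.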
Also, I think the point here is that a memory generation lasts about six years on average, while a GPU generation lasts about two.
This means that users who want to upgrade their GPU every generation have to throw away the memory and buy the same memory again with each new GPU.
This coupling of memory and GPU is a bottleneck that keeps GPUs from having a lot of memory on them. Letting the user pay for the two separately would let them invest in a large amount of memory, as the sketch below shows.
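A sketch of that upgrade-economics argument, reusing the rough prices floated earlier in the thread; the three-GPUs-per-memory-generation cadence is the poster's estimate:

```python
# The upgrade-economics argument, in numbers. Prices are the rough
# figures floated earlier in the thread ($1,200 GPU, $3,500 of DIMMs).
gpu_price, ram_price, gpu_upgrades = 1_200, 3_500, 3

socketed = gpu_upgrades * gpu_price + ram_price    # buy RAM once
soldered = gpu_upgrades * (gpu_price + ram_price)  # re-buy RAM every card
print(f"Socketed RAM over 3 GPUs: ${socketed:,}")
print(f"Soldered RAM over 3 GPUs: ${soldered:,}")
```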
The implications of this are too big to overlook.
Again, all I am saying here is to consider it.
It's interesting to see the replies to your 'too much to ask' question, @Abdulrahman392011. We have been getting more and more VRAM in our video cards, and that will likely continue as the need for it becomes a reality. I agree that having slots for plug-in memory would create all sorts of compatibility problems, but adding more RAM chips and widening the bus should be possible as the process sizes continue to shrink.
Didn't we start off the personal computer craze (late 1970s / early 1980s) with soldered-in memory chips? I can't remember when motherboards finally got sockets for plug-in memory.
We might be overthinking this whole thing. I mean, it can't be that hard; they've already done it for the CPU, so it's not like I am asking for something that's never been done before, "technically". Just look at how it was done for the CPU and repeat the same process for the GPU.
The CPU has access to an entire motherboard that's full of components - chipset, DDR RAM slots and voltage regulators. Oh, there's the storage slots and audio driver chips too. And let's see, graphics chips for the onboard HDMI and DP ports, USB ports and well....even ARGB interfaces. The GPU has memory and fans, along with a massive heatsink assembly.
And you know what, AMD makes CPUs, so all they really need to do is have the CPU and GPU departments talk to each other. I think it will be easier than expected because AMD already has experience making CPUs, unlike Nvidia, which only makes GPUs.
I believe that this would be a violation of the US technology ban that applies to US companies.
Lol, they're doing their best, but it's really up to us to create the demand and clarify our needs.