I was desperately looking to contact some AMD people about my idea, as I didn't want it to be overlooked: if I am right, it could give APUs some interesting performance gains.
Some TechTubers pointed me to the AMD Red Team Discord, and from there to here. So here I am.
I will try to keep my post organized where I can below:
RAM bandwidth increases have been limited, which hurts APUs pretty hard these days, while other interconnects are reaching or even exceeding current RAM bandwidth. As a result, APUs don't benefit as much anymore from using system RAM for GPU video buffers at higher resolutions or higher graphics fidelity.
PCIe 4 specs compared to DDR4 in bandwidth:
So I was thinking aloud about what the PCIe 4.0 standard really brings to the table in performance.
In short, a 16-lane configuration offers up to 64 GB/s of bandwidth (both directions combined, roughly 32 GB/s each way), which on its own roughly equals dual-channel DDR4 memory at 4000 MHz.
So in theory, if you use dual-channel DDR4 memory at 4000 MHz and some other use case can handle a different memory pool over the PCIe 4.0 interface, that would be very nice, right?
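As a rough sanity check on those numbers, here is a minimal sketch of the usual peak-bandwidth arithmetic. It assumes 64-bit (8-byte) DDR4 channels and 16 GT/s per PCIe 4.0 lane with 128b/130b encoding, and it ignores protocol overhead and latency. The function names are made up for illustration.

```python
def ddr_peak_gbs(mega_transfers: float, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9


def pcie_peak_gbs(gigatransfers: float, lanes: int = 16, encoding: float = 128 / 130) -> float:
    """Theoretical peak PCIe bandwidth in GB/s for ONE direction."""
    bits_per_second = gigatransfers * 1e9 * encoding * lanes
    return bits_per_second / 8 / 1e9


ddr4_4000 = ddr_peak_gbs(4000)        # dual-channel DDR4 at 4000 MT/s
ddr4_3200 = ddr_peak_gbs(3200)        # dual-channel DDR4 at 3200 MT/s
pcie4_one_way = pcie_peak_gbs(16.0)   # PCIe 4.0 x16, one direction
pcie4_both_ways = 2 * pcie4_one_way   # PCIe 4.0 x16, both directions added up

print(f"DDR4-4000 dual channel : {ddr4_4000:.1f} GB/s")
print(f"DDR4-3200 dual channel : {ddr4_3200:.1f} GB/s")
print(f"PCIe 4.0 x16, one way  : {pcie4_one_way:.1f} GB/s")
print(f"PCIe 4.0 x16, both ways: {pcie4_both_ways:.1f} GB/s")
```

Under these assumptions the 64 GB/s figure only holds when both directions are counted together; traffic that mostly flows one way is limited to about 31.5 GB/s.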
My rough idea:
Then I realized the possibility of a PCIe 4.0 enabled, AMD Zen based APU whose graphics part uses the PCIe 4.0 interface to connect to a (maybe low-profile) memory pool/buffer (or both?), using a modified High Bandwidth Cache Controller (HBCC for short) to offload the system RAM in memory-sensitive tasks like games.
This essentially means an add-in PCIe 4.0 memory card could easily be put into a small form factor PC, giving a (customizable?) video buffer interface of max 64 GB/s in addition to, say, DDR4 running at 3200 MHz (51 GB/s). This would offload the RAM controller a lot, especially in RAM- and VRAM-heavy tasks (full HD gaming and beyond).
It could be another good marketing angle to push the PCIe 4.0 standard, making it easier to justify the increase in mainboard cost, and giving APU gamers a more realistic use for PCIe 4.0 at higher fidelity/resolutions.
So imagine a scenario like this:
A 65W TDP AMD APU with:
6 to 8 Zen 2 cores (1 chiplet) at 3.2 GHz (consuming around 35 watts at those clocks)
A Navi-based graphics chiplet consuming around 35 watts max.
With another 12 watts for a slightly new IO chip to allow the main 16 PCIe 4.0 lanes to be used as video memory if present in the PC.
2x 16 GB DDR4-3200 in dual channel (= 51 GB/s for game/driver/other memory use)
A card with, say, 8 GB of video memory in the main PCIe 4.0 x16 slot for just video memory (64 GB/s; pixel buffers, textures, and so on)
This would allow consumers to pop in a differently sized video buffer (example: 4 GB to 8 GB) at 64 GB/s max transfer speed, freeing up system RAM capacity, system RAM bandwidth, and the like.
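Tallying the figures from the scenario above, purely as quoted in this post (the dictionary keys are just labels for illustration). As written, the three power figures add up to more than the 65 W target, so presumably the parts could not all draw their maximum at the same time.

```python
# All numbers are the ones quoted in the scenario; nothing here is measured.
tdp_watts = 65
power_watts = {
    "Zen 2 cores": 35,
    "Navi graphics chiplet": 35,
    "IO chip": 12,
}
total_power = sum(power_watts.values())
over_budget = total_power - tdp_watts

# Peak bandwidth of the two separate memory pools, in GB/s, as quoted.
bandwidth_gbs = {
    "system RAM (DDR4-3200 dual channel)": 51,
    "PCIe 4.0 x16 video memory card": 64,
}
total_bandwidth = sum(bandwidth_gbs.values())

print(f"Total power: {total_power} W ({over_budget} W over a {tdp_watts} W TDP)")
print(f"Combined peak bandwidth across both pools: {total_bandwidth} GB/s")
```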
I know this might sound somewhat like science fiction for now, but it should not be impossible at all.
Now for PCIe 5.0: 128 GB/s VRAM speeds? That would easily enable medium-graphics 1440p gaming at more than 60 fps on a single desktop APU in the future, all in an APU package of around 65 W.
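To see how the x16 numbers scale between PCIe generations, here is a small sketch. It assumes the per-lane signalling rates of 8, 16 and 32 GT/s for PCIe 3.0, 4.0 and 5.0 with 128b/130b encoding, and ignores protocol overhead; the names are hypothetical.

```python
from typing import Tuple

# Per-lane signalling rate in GT/s for each generation (128b/130b encoding).
PCIE_GT_PER_LANE = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}


def x16_bandwidth_gbs(generation: str, lanes: int = 16) -> Tuple[float, float]:
    """Return (per-direction, both-directions) peak bandwidth in GB/s."""
    per_direction = PCIE_GT_PER_LANE[generation] * (128 / 130) * lanes / 8
    return per_direction, 2 * per_direction


for gen in PCIE_GT_PER_LANE:
    one_way, both_ways = x16_bandwidth_gbs(gen)
    print(f"PCIe {gen} x16: {one_way:.1f} GB/s one way, {both_ways:.1f} GB/s both ways")
```

Each generation doubles the previous one, and the roughly 128 GB/s figure for PCIe 5.0 x16 again counts both directions together.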
I know it is unlikely to happen soon, and I do not even know if there will be too much latency.
I also do not know the overhead of translating PCIe comms to a memory interface, signal delay and such, as I am by no means an electrical engineer.
And finally, as a one-liner, assuming this is a doable idea:
Creating the ability to use the main 16 PCIe 4.0 lanes as a video buffer interface for APUs.
Tell me what your opinions are on this: is it feasible, is it hard to achieve, or do you have a different view/take on such a thing? Post below.
I thought about this some time ago when Vega HBCC was announced.
It would be better if the PCIe interface to the additional memory were on the GPU itself.
PCIe 4.0 does offer more bandwidth, but this has already been done, using HBCC with PCIe 3.0 on professional GPUs.
The current HBCC use case uses the main DDR RAM as an extra video buffer if the dedicated GPU buffer is full.
What I am suggesting is a different approach:
Trying to use a PCIe 4.0 VRAM buffer to offload the main DDR RAM interface, which is currently the only (and limiting) option for APUs (GPUs of any kind love bandwidth, and offloading video data from main DDR4 RAM could help a lot, especially with larger screen sizes).
Right now the DDR4 RAM bus is used for all of these in APUs: OS & device drivers, running programs, graphics buffer, background services.
Imagine running a dual-channel configuration at, say, 2800 MHz main DDR4 speed for an APU: you have roughly 45 GB/s max for all those items.
Using a separate PCIe 4.0 card with another 32 GB/s in each direction for the graphics buffer alone would offload the system RAM a lot, and could in theory raise the total memory bandwidth to roughly 77 GB/s (45 GB/s + 32 GB/s), albeit split across 2 different purposes by the APU.
HBCC could still be implemented for cases where someone puts too little VRAM in the PCIe 4.0 slot for their games, so that DDR4 is used as well when needed.
This option really gives more flexibility in setup choice:
A dedicated PCIe VRAM buffer for more performance, or system RAM as the GPU buffer as well for a more cost-effective option.
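The fallback behaviour described above can be sketched as a simple priority list of memory pools. This is only a toy model under stated assumptions, not how AMD's HBCC actually works: the `MemoryPool` class, the `place_buffer` function and the pool sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryPool:
    name: str
    capacity_mb: int
    used_mb: int = 0

    def try_alloc(self, size_mb: int) -> bool:
        """Reserve size_mb in this pool if it still fits; report success."""
        if self.used_mb + size_mb > self.capacity_mb:
            return False
        self.used_mb += size_mb
        return True


def place_buffer(size_mb: int, pools: List[MemoryPool]) -> str:
    """Place a buffer in the first pool (in priority order) that has room."""
    for pool in pools:
        if pool.try_alloc(size_mb):
            return pool.name
    raise MemoryError(f"no pool can hold a {size_mb} MB buffer")


# Hypothetical setup: a 4 GB PCIe VRAM card preferred over 32 GB of system RAM.
pcie_vram = MemoryPool("pcie_vram", 4096)
system_ram = MemoryPool("system_ram", 32768)

# The third buffer no longer fits on the card and spills into system RAM.
placements = [place_buffer(size, [pcie_vram, system_ram]) for size in (2048, 1536, 1024)]
print(placements)

# Without a card in the slot, only system RAM is in the list.
no_card = place_buffer(1024, [MemoryPool("system_ram", 32768)])
print(no_card)
```

With no card installed the priority list simply starts at system RAM, which matches how APUs already work today.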
Those Radeon Pro SSG cards have two PCIe 3.0 M.2 slots allowing you to add up to 1TB of NAND flash memory directly to the graphics card.
"This will allow large datasets to be worked on locally without having to use system memory, which should provide a much faster experience during heavy rendering work."
I think it is essentially the same idea: if that PCIe 3.0 interface is upgraded to PCIe 4.0, you just get more bandwidth.
The only difference is you want the memory pool on the PC motherboard.
The GPU would have to access the additional PCIe SSD memory pool on the PC motherboard, which is likely to be slower than having the interface directly on the GPU.
Let me phrase what I mean differently:
Take a PCIe GPU and essentially just move the GPU die onto the CPU package, leaving the GPU memory connections running over the PCIe bus (with fallback to DDR RAM if no such PCIe card is present).
That is essentially what I mean.