I wonder if anybody could tell me whether they've had any success allocating local resources larger than 255MB on a 64-bit Linux system? I don't seem to be able to do this.
My system has a 512MB 3870, 4GB of system memory, Phenom 9850, 790FX motherboard, and is running 64-bit Scientific Linux 5.1 (clone of RHEL5.1) "out of the box" along with the 64-bit CAL 1.01 beta and provided driver.
I don't know if it's relevant, but poking around a bit I noticed from the Xorg.0.log that fglrx reports 262144 kB of videoram and an ATI GART size of 255 MB. I tried changing the latter with aticonfig, but it wouldn't let me raise it (fglrx reports 255 is the max despite what aticonfig -h says), and lowering it didn't seem to lower the max resource I could allocate with CAL (though it did change some of the numbers returned by calDeviceGetStatus() and calDeviceGetAttribs()).
Steven, it seems that the 255MB limit is the problem: you need 2k (width) * 8k (height) * 16 bytes (float4) to get that many floats, which is 256MB. As for partitioning the data, performance-wise it is probably smarter to do so. If you look at the throughput, inputspeed and outputspeed tests in the runtime folder of CAL, you will see that the best performance is reached not with a single input and output but with multiple inputs and outputs.
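As a quick check of the arithmetic above, here is a minimal Python sketch. The helper name `resource_bytes` is made up for illustration and is not part of CAL; it assumes 4-byte float components and ignores any padding or alignment the driver may add.

```python
BYTES_PER_FLOAT = 4  # assumption: 32-bit float components
MIB = 1024 * 1024


def resource_bytes(width, height, components):
    """Unpadded size in bytes of a width x height resource of floatN elements."""
    return width * height * components * BYTES_PER_FLOAT


# 2k x 8k float4, the case discussed above
size = resource_bytes(2048, 8192, 4)
print(size, "bytes =", size / MIB, "MiB")
```

By this count the buffer is 1MB over a 255MB ceiling, which would be consistent with the allocation failing.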
You are right that for performance reasons partitions seem to be the way to go.
However, I do see this restriction as a significant problem. First, it leads to bizarre failures that are at odds with the documentation on resource allocation and with the output from calDeviceGetInfo(). Second, one might simply want to check for correctness using a single buffer before partitioning. Third, multiple global buffers are not allowed, so it looks like scatter really is limited to 255MB.
I am not an expert on drivers but if the limit can be raised it would be very helpful, particularly as cards become available with more and more memory.
Steven, I just tried a test that might help you get around this issue. At least on my card, the problem isn't with the allocation but with the mapping of the memory. I have a 512MB card and can allocate 312MB (4096*5000 float4) in a single chunk, but I cannot map that much because my GART space is not large enough. So what you could try is to allocate the resource locally and, instead of mapping it and writing to it, use a copy shader to copy chunks at a time to this memory location. As for the global buffers, they are restricted by the GPU memory size; you should verify this on your own card, but you should be able to allocate a chunk larger than 256MB. The problem is then going to be reading the data back: since you cannot map it, you will have to use copy shaders to copy the data or write straight to remote memory. This is a known issue, but I'm not sure when it will get fixed since it's a GART/driver issue and not a Stream SDK issue per se.
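To make the "copy chunks at a time" idea concrete, here is a rough Python sketch of how the row ranges for such a chunked copy could be planned. It only does the bookkeeping; the actual copy shader and CAL calls are not shown. The function name `plan_row_chunks` and the 128MB staging size are assumptions chosen for illustration, not values taken from the SDK.

```python
MIB = 1024 * 1024


def plan_row_chunks(width, height, bytes_per_element, max_chunk_bytes):
    """Split a 2D resource into (first_row, row_count) pieces that each fit in max_chunk_bytes."""
    row_bytes = width * bytes_per_element
    rows_per_chunk = max_chunk_bytes // row_bytes
    if rows_per_chunk == 0:
        raise ValueError("a single row does not fit in the chunk limit")
    chunks = []
    first_row = 0
    while first_row < height:
        row_count = min(rows_per_chunk, height - first_row)
        chunks.append((first_row, row_count))
        first_row += row_count
    return chunks


# 4096 x 5000 float4 (16 bytes per element), staged through a hypothetical
# 128MB mappable buffer, one copy pass per chunk.
for first_row, row_count in plan_row_chunks(4096, 5000, 16, 128 * MIB):
    print("copy rows", first_row, "to", first_row + row_count - 1)
```

Each chunk would then correspond to one mappable staging resource and one copy pass into the large local resource. In practice the staging size would need to stay below whatever part of the GART is actually free.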
sgratton, I talked with one of the guys here who knows Linux better than I do, and he looked at the information you gave us. The problem is that your local memory is being split up into two 256MB chunks, with some of the first being used by the X framebuffer and the second being given to something else, possibly the GART, he thinks. So there is no block larger than 256MB to allocate.
(--) fglrx(0): VideoRAM: 262144 kByte, Type: DDR4
(II) fglrx(0): PCIE card detected
(--) fglrx(0): Using per-process page tables (PPPT) as GART.
...
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated
(II) fglrx(0): [drm] DRM buffer queue setup: nbufs = 100 bufsize = 65536
Also, maybe try changing Option "MaxGARTSize" "255" in your xorg config to something larger and see if that helps.
Thanks for checking this out for me. I did try playing with maxgartsize before, but have just tried again, with the same results, as follows:
I tried lowering maxgartsize to 128 using the proper aticonfig tool as root, and this hasn't changed the problem: I can still allocate an 8192x8160 resource of float1's, but not an 8192x8161 one. Note that I can allocate two 8192x7000 resources of float1's, so I can use most of the memory of the card, if I use multiple resources.
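For what it's worth, the 8160-row limit lines up exactly with the "[pcie] 261120 kB allocated" figure in the log, as this small Python check shows. It assumes 4-byte floats, no padding, and that "kB" in the log means 1024 bytes; the helper `max_rows` is made up for illustration. The match suggests, but does not prove, that the two numbers are related.

```python
KIB = 1024
MIB = 1024 * KIB
BYTES_PER_FLOAT = 4  # assumption: 32-bit floats


def max_rows(width, components, limit_bytes):
    """Largest height whose unpadded width x height floatN resource fits in limit_bytes."""
    return limit_bytes // (width * components * BYTES_PER_FLOAT)


pcie_bytes = 261120 * KIB  # "[pcie] 261120 kB allocated" from Xorg.0.log
print("max rows for 8192-wide float1:", max_rows(8192, 1, pcie_bytes))

# Two 8192 x 7000 float1 resources together
two_buffers = 2 * 8192 * 7000 * BYTES_PER_FLOAT
print("two buffers total:", two_buffers / MIB, "MiB")
```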
I also tried raising maxgartsize to 512, given as the upper limit in "aticonfig -h". However, when I did this I got the following in /var/log/Xorg.0.log:
(**) fglrx(0): Illegal ATI GART size: 512 MB, acceptable range: 64 MB - 255 MB.
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated
and, not surprisingly, got exactly the same results as before.
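In case anyone wants to check the same thing on their own machine, here is a small Python sketch that pulls the GART numbers out of log text in the format quoted above. The function name `parse_gart_info` and the dictionary keys are my own choices, and the patterns are based only on the three lines shown here, so other fglrx versions may word things differently.

```python
import re

# The three lines quoted above, used as sample input.
SAMPLE_LOG = """\
(**) fglrx(0): Illegal ATI GART size: 512 MB, acceptable range: 64 MB - 255 MB.
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated
"""


def parse_gart_info(log_text):
    """Extract the rejected, applied and allocated GART sizes from fglrx log text."""
    info = {}
    rejected = re.search(
        r"Illegal ATI GART size: (\d+) MB, acceptable range: (\d+) MB - (\d+) MB",
        log_text,
    )
    if rejected:
        info["rejected_mb"] = int(rejected.group(1))
        info["allowed_range_mb"] = (int(rejected.group(2)), int(rejected.group(3)))
    # The lookbehind skips the "Illegal ATI GART size" line.
    applied = re.search(r"(?<!Illegal )ATI GART size: (\d+) MB", log_text)
    if applied:
        info["applied_mb"] = int(applied.group(1))
    pcie = re.search(r"\[pcie\] (\d+) kB allocated", log_text)
    if pcie:
        info["pcie_kb"] = int(pcie.group(1))
    return info


print(parse_gart_info(SAMPLE_LOG))
```

To run it against a real log, read the contents of /var/log/Xorg.0.log into a string and pass that in instead of `SAMPLE_LOG`.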
Perhaps you could ask one of the fglrx people to take a look sometime in case they can suggest a fix?