
sgratton
Adept I

255MB resource limit on 64-bit Linux?


Hi there,

I wonder if anybody could tell me whether they've had any success allocating local resources larger than 255MB on a 64-bit Linux system? I don't seem to be able to do this.

My system has a 512MB 3870, 4GB of system memory, Phenom 9850, 790FX motherboard, and is running 64-bit Scientific Linux 5.1 (clone of RHEL5.1) "out of the box" along with the 64-bit CAL 1.01 beta and provided driver.

I don't know if it's relevant, but poking around a bit I noticed from the Xorg.0.log that fglrx reports 262144 kB of videoram and an ATI GART size of 255 MB. I tried changing the latter with aticonfig, but it wouldn't let me raise it (fglrx reports 255 as the maximum, despite what aticonfig -h says), and lowering it didn't seem to lower the maximum resource size I could allocate with CAL (though it did change some of the numbers returned by calDeviceGetStatus() and calDeviceGetAttribs()).

Any suggestions much appreciated!

Thanks a lot,
Steven.
0 Likes
10 Replies

Steven,
It is possible to allocate more than 255MB, but not in one chunk, because of GART memory restrictions. Multiple allocations need to be made to use all the space on the card.

0 Likes


Hi Micah,

Sorry, I should have mentioned that. I wanted to allocate a big matrix (8192x8192 floats say) in one resource without having to partition it.

Is this a problem that is here to stay or will this restriction be lifted in the future? And does it affect all platforms?

Thanks,
Steven.
0 Likes

Steven,
It seems that the 255MB limit is the problem: an 8192x8192 matrix of floats needs 2k (width) * 8k (height) * 16 bytes (float4), which is 256MB. As for partitioning the data, it is probably smarter performance-wise to do so anyway. If you look at the throughput, inputspeed and outputspeed tests in the runtime folder of CAL, you will see that the best performance comes not from a single input and output but from multiple inputs and outputs.
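
To make that arithmetic concrete, here is a quick sketch in plain Python (not CAL code). The helper name `resource_bytes` is made up for illustration, and it assumes "MB" in this thread means 2**20 bytes.

```python
MIB = 1024 * 1024  # assumption: "MB" here means 2**20 bytes


def resource_bytes(width, height, components, bytes_per_component=4):
    """Size in bytes of a 2D resource of 32-bit float elements."""
    return width * height * components * bytes_per_component


# 8192x8192 single floats, and the same data packed as 2048x8192 float4
as_float1 = resource_bytes(8192, 8192, 1)
as_float4 = resource_bytes(2048, 8192, 4)

print(as_float1 // MIB, as_float4 // MIB)  # 256 256
print(as_float1 > 255 * MIB)               # True: 1 MB over the 255MB limit
```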

Also, some information on doing optimizations can be found here:
http://coachk.cs.ucf.edu/cours...6938/PerfModeling.pdf

It is a presentation given at a university by two of our guys here. At the end they show how using smaller output sizes actually improves performance for certain cases/algorithms.
0 Likes


Hi Micah,

You are right that for performance reasons partitions seem to be the way to go.

However, I do see this restriction as a significant problem. First, it leads to bizarre failures that are at odds with the documentation on resource allocation and with the output from calDeviceGetInfo(). Second, one might simply want to check that things work correctly with a single buffer before partitioning. Third, multiple global buffers are not allowed, so it looks like scatter really is limited to 255MB.

I am not an expert on drivers but if the limit can be raised it would be very helpful, particularly as cards become available with more and more memory.

Best,
Steven.
0 Likes

Steven,
I just tried a test that might help you get around this issue. At least on my card, the problem isn't with the allocation but with mapping the memory. I have a 512MB card and can allocate 312MB (4096*5000*float4) in a single chunk; however, I cannot map that much because my GART space is not large enough. So what you could try is to allocate the resource locally and, instead of mapping it and writing to it, use a copy shader to copy one chunk at a time into it.
As for global buffers, they are restricted by the GPU memory size. You should verify this on your own card, but you should be able to allocate a chunk larger than 256MB. The problem is then reading the data back: since you cannot map the resource, you will have to use copy shaders to copy the data or write straight to remote memory. This is a known issue; however, I'm not sure when it will get fixed, since it's a GART/driver issue and not a Stream SDK issue per se.
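
The chunk-at-a-time idea can be planned on the host side roughly as follows. This is only a sketch in plain Python, with no CAL calls: `row_bands` is a hypothetical helper, and the 64MB staging size is an arbitrary assumption chosen to stay well under the mappable limit. Each band would then be uploaded through a small mappable resource and moved into place by the copy shader.

```python
MIB = 1024 * 1024  # assumption: "MB" here means 2**20 bytes


def row_bands(width, height, bytes_per_element, max_chunk_bytes):
    """Yield (first_row, row_count) bands, each at most max_chunk_bytes."""
    row_bytes = width * bytes_per_element
    rows_per_band = max_chunk_bytes // row_bytes
    if rows_per_band == 0:
        raise ValueError("a single row does not fit in one chunk")
    first = 0
    while first < height:
        count = min(rows_per_band, height - first)
        yield first, count
        first += count


# 4096x5000 float4 (16 bytes per element), staged through 64MB buffers
bands = list(row_bands(4096, 5000, 16, 64 * MIB))
print(bands)
# [(0, 1024), (1024, 1024), (2048, 1024), (3072, 1024), (4096, 904)]
```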
0 Likes


sgratton,
I talked with one of the guys here who knows Linux better than I do, and he looked at the information you gave us. The problem is that your local memory is being split into two 256MB chunks, with some being used by the X framebuffer and the second chunk being given to something else (maybe the GART, he thinks). So there is no block larger than 256MB to allocate.

"(--) fglrx(0): VideoRAM: 262144 kByte, Type: DDR4
(II) fglrx(0): PCIE card detected
(--) fglrx(0): Using per-process page tables (PPPT) as GART.
...
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated
(II) fglrx(0): [drm] DRM buffer queue setup: nbufs = 100 bufsize = 65536 "

Also, maybe try changing, in your xorg config,
Option "MaxGARTSize" "255" to something larger and see if that helps.
0 Likes


Hi Micah,

Thanks for checking this out for me. I did try playing with maxgartsize before, but have just tried again, with the same results, as follows:

I tried lowering maxgartsize to 128 using the proper aticonfig tool as root, and this hasn't changed the problem: I can still allocate an 8192x8160 resource of float1's, but not an 8192x8161 one. Note that I can allocate two 8192x7000 resources of float1's, so I can use most of the memory of the card, if I use multiple resources.
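
For what it's worth, the 8160-row cutoff lines up exactly with the 261120 kB figure in the fglrx log. The sketch below (plain Python, with a made-up helper `max_rows`) only shows that the numbers coincide; it does not prove that this is where the limit comes from.

```python
KIB = 1024


def max_rows(limit_bytes, width, bytes_per_element):
    """Largest number of full rows that fits in limit_bytes."""
    return limit_bytes // (width * bytes_per_element)


limit = 261120 * KIB              # "[pcie] 261120 kB allocated", i.e. 255 MB
rows = max_rows(limit, 8192, 4)   # width 8192, float1 = 4 bytes per element
print(rows)                       # 8160
print(rows * 8192 * 4 == limit)   # True: 8192x8160 fills the 255 MB exactly

# Two 8192x7000 float1 resources together, in units of 2**20 bytes
two_resources_mib = 2 * 8192 * 7000 * 4 / 2**20
print(two_resources_mib)          # 437.5
```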

I also tried raising maxgartsize to 512, given as the upper limit in "aticonfig -h". However, when I did this I got the following in /var/log/Xorg.0.log:

(**) fglrx(0): Illegal ATI GART size: 512 MB, acceptable range: 64 MB - 255 MB.
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated

and, not surprisingly, got exactly the same results as before.
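
If anyone wants to check which GART size the driver actually settled on, a small script can pull it out of the log. This is a sketch that assumes the log lines look exactly like the ones quoted above; the function name `gart_size_mb` is made up, and the sample text is just those quoted lines.

```python
import re

# Matches the "effective size" line but not the "Illegal ATI GART size" warning
GART_RE = re.compile(r"fglrx\(\d+\): ATI GART size: (\d+) MB")


def gart_size_mb(log_text):
    """Return the last GART size reported in the log text, or None."""
    matches = GART_RE.findall(log_text)
    return int(matches[-1]) if matches else None


sample = """\
(**) fglrx(0): Illegal ATI GART size: 512 MB, acceptable range: 64 MB - 255 MB.
(**) fglrx(0): ATI GART size: 255 MB
(II) fglrx(0): [pcie] 261120 kB allocated
"""
print(gart_size_mb(sample))  # 255
```

To run it against a real log, read the text first, e.g. `gart_size_mb(open("/var/log/Xorg.0.log").read())`.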

Perhaps you could ask one of the fglrx people to take a look sometime in case they can suggest a fix?

Thanks again,
Steven.

0 Likes
rahulgarg
Adept II

Any updates on the issue? It's still a problem with Catalyst 8.12 and CAL 1.3.
The problem does not exist on Vista64, at least, so it appears to be Linux-specific.
0 Likes


Hi there,

I'd be interested in knowing what the status here is too, and it's helpful to hear that things are better with Vista. I hope to try CAL 1.3 some time soon and, in light of this, might well try the Vista version as well as the Linux one.

To clarify what is mentioned above: a while back I did a bit more experimenting with CAL 1.2, after I had got a 1GB HD4870 card in addition to the 512MB 3870 card discussed in the older posts. This gave me a clearer idea of the restrictions.

I think the situation is:

1. The maximum size single resource you can allocate is the memory of the card minus 256MB (so 256MB for a 512MB card, 768MB for a 1GB card).

2. You cannot map a resource larger than 255MB. (You can still allocate it, but you have to initialize it some other way, say via a shader, as I think Micah suggested.)

All my attempts at playing with driver settings were unsuccessful in lifting either of these restrictions.
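
Written out as code, the two rules look like this. It is only a sketch of what was observed here with CAL 1.2 on Linux on these two cards, not documented behaviour, and both function names are made up.

```python
def max_single_resource_mb(card_mb):
    """Rule 1 (observed, not documented): card memory minus 256MB."""
    return max(card_mb - 256, 0)


def can_map(resource_mb):
    """Rule 2 (observed, not documented): at most 255MB can be mapped."""
    return resource_mb <= 255


for card_mb in (512, 1024):
    biggest = max_single_resource_mb(card_mb)
    print(card_mb, biggest, can_map(biggest))
# 512 256 False
# 1024 768 False
```

So on both cards the largest resource that can be allocated is already too big to map, and has to be filled some other way.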

Best,
Steven.
0 Likes