Is the maximum possible size of a 1D stream just 2^23? Well, I need to allocate a 1D linear array of almost 384 MB... any idea how to do this, pls?
thx
so... it's not possible then?
Well, I prefer to use Brook+ rather than CAL because I don't like the idea of working with low-level shaders. Do you know if Brook+ is going to extend 1D arrays to, let's say, the graphics card's full VRAM? In CUDA I can allocate a really big 1D array without problems...
I see in the 1.2 docs that the maximum 1D array has 8192 elements (64M with virtualisation)... but I need to allocate 300 or 400 MB of linear data...
With CUDA I can use all the VRAM if needed (using device memory pointers)... is there any way to allocate, for example, 500 MB of VRAM in Brook+, pls? If not... when is this going to be supported?
You can allocate a 1D array in C/C++, pass the pointer to a 2D stream in Brook+. Inside the kernel, you can reconstruct the 1D index (index = row*width+column). The maximum 2D array allowed in Brook+ is 8192x8192. We are considering increasing this limit in future releases.
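As a minimal host-side sketch of the index arithmetic described above (not Brook+ kernel code), the mapping between a linear index and a row-major 2D position could look like this. The names `kWidth`, `Coord2D`, `toCoord`, `toLinear` and `rowsNeeded` are made up for illustration; only the 8192 width and the formula index = row*width+column come from the post.

```cpp
#include <cassert>
#include <cstddef>

// Width of the 2D layout; 8192 is the per-dimension limit quoted above.
constexpr std::size_t kWidth = 8192;

// Hypothetical helper type: a position inside the 2D stream.
struct Coord2D {
    std::size_t row;
    std::size_t column;
};

// Split a linear index into (row, column) for a row-major 2D layout.
constexpr Coord2D toCoord(std::size_t index, std::size_t width = kWidth) {
    return Coord2D{index / width, index % width};
}

// Rebuild the linear index: index = row*width + column.
constexpr std::size_t toLinear(Coord2D c, std::size_t width = kWidth) {
    return c.row * width + c.column;
}

// Rows needed to hold `count` elements; the last row may be partly unused.
constexpr std::size_t rowsNeeded(std::size_t count, std::size_t width = kWidth) {
    return (count + width - 1) / width;
}
```

With these helpers, `toCoord(8192)` lands on row 1, column 0, and `toLinear(toCoord(i))` gives back `i` for any index that fits in the layout.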
Originally posted by: udeepta@amd We are considering increasing this limit in future releases.
Yes, please... 64 MB is not enough. I just hope it won't turn out to be a hardware limit.
For a texture, 8k x 8k can be enough... but in my case I need to store a really big tree in VRAM.
We hear ya.
In the mean time, will your algorithm work if you break the large linear data into a few smaller segments? The performance impact will not be very high. But I agree the limitation is something we could do without.
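A rough host-side sketch of that segmenting idea, assuming each segment is one full 8192x8192 stream: a global element index is turned into a (segment, offset) pair, and the number of segments follows from the total element count. All names here (`kSegmentElems`, `SegmentRef`, `segmentsNeeded`, `locate`, `elemsInSegment`) are hypothetical, and the kernel-side handling of several streams is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Elements per segment: one full 8192x8192 2D stream (limit quoted in this thread).
constexpr std::size_t kSegmentElems = std::size_t{8192} * 8192;

// Hypothetical helper type: where a global element lives after splitting.
struct SegmentRef {
    std::size_t segment;  // which 2D stream holds the element
    std::size_t offset;   // linear offset inside that stream
};

// Number of segments needed for `totalElems` elements (rounded up).
constexpr std::size_t segmentsNeeded(std::size_t totalElems,
                                     std::size_t segElems = kSegmentElems) {
    return (totalElems + segElems - 1) / segElems;
}

// Map a global linear index to its segment and the offset within it.
constexpr SegmentRef locate(std::size_t globalIndex,
                            std::size_t segElems = kSegmentElems) {
    return SegmentRef{globalIndex / segElems, globalIndex % segElems};
}

// Elements actually stored in a given segment; only the last one can be partial.
constexpr std::size_t elemsInSegment(std::size_t totalElems, std::size_t segment,
                                     std::size_t segElems = kSegmentElems) {
    return (segment + 1) * segElems <= totalElems
               ? segElems
               : totalElems - segment * segElems;
}
```

For instance, if the data were 700 MB of 4-byte elements (the element size is an assumption), that is 183,500,800 elements, and `segmentsNeeded(183500800)` comes out as 3 segments, the last one only partly filled.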
Originally posted by: udeepta@amd
In the mean time, will your algorithm work if you break the large linear data into a few smaller segments?
It's for raytracing a 15M-poly model using a kd-tree on the GPU (I need a total of 700 MB for that... which should be OK on a 1 GB card). I could break the kernels into parts... but it's going to be too difficult, so I think I'll just wait until you remove the 64 MB limitation. I need to polish some things in the meantime.
Btw... another question... can the Stream SDK's arrays be virtualized like system RAM is? For example... imagine the user's graphics card has only 512 MB... Can the other 256 MB I need come from AGP/PCI memory?
thx.
Originally posted by: MicahVillmow The 8192x8192 limitation is hardware and should be different with future hardware.
Would it be possible to emulate it quickly and automatically in the driver? I really need to see all of the graphics card's VRAM as a linear 1D pointer on the current HW generation... seriously.
Once this is done, I would like to see a FireStream with 4 GB, like the NVIDIA Tesla C1060. The mesh data I need to manage can occupy more than 2 GB... that, or use a virtual VRAM system through AGP/PCI like the CPU does... I think Vista can virtualise the VRAM.
Originally posted by: MicahVillmow bubu, It is possible to get access to all the vram inside a kernel, but this requires using CAL, IL, and the global buffer (this gives you access to 32 bits of address space).
But... can I use CAL to allocate the buffer and then pass the pointer to Brook+? I don't like the idea of programming kernels in a pseudo-assembly language. For example, this is what I want:
byte* dataIn = (byte*)calAllocateBuffer(2*1024*1024);
byte* dataOut = (byte*)calAllocateBuffer(1024);
void myBrookPlusKernel ( byte *ptr )
{
    const byte dIn = ptr[1800000000];
    dataOut[threadId] = dIn+10;
}
It is possible to write a kernel in Brook+, convert it to IL using the Brook+ compiler (brcc), and use the generated IL in CAL. Or better yet, put in the Brook+ kernel in ShaderAnalyzer and get the IL from there.
It is not as straightforward as what you asked, but it is a good middle ground where you get all CAL functionality and the ease of Brook+ kernel programming.
Originally posted by: udeepta@amd It is possible to write a kernel in Brook+, convert it to IL using the Brook+ compiler (brcc), and use the generated IL in CAL. Or better yet, put in the Brook+ kernel in ShaderAnalyzer and get the IL from there.
Ok... so imagine I need to access 256 MB of VRAM on a 512 MB Radeon.
For example, let's fill that data with a value of 0.5f ( I know, a dumb kernel ).
I should do this in Brook+ (just pseudocode):
void kernel ( float buff[] )
{
    buff[(blockIdx.x*blockDim.x)+threadId.x] = 0.5f;
}
Then I compile it using ShaderAnalyzer or whatever... so the AMD IL assembly is generated.
Then, in my C++ program I do:
//initialize CAL
...
//Allocate 256Mb using CAL
void *ptr = calMalloc(256*1024*1024);
//Load the kernel compiled and pass the VRAM pointer as IL input.
handle = calLoadKernel("c:\\test\\myKernel.il");
calSetInput(handle,"buff",ptr);
Is that the way, pls?
Hi Bubu,
Your point is a smart one.
Writing the kernel in Brook+ to get an assembly program and then using CAL for overall memory management seems to me the best compromise.
What is needed from AMD is a short application note (less than one page) giving guidance on how to proceed.
Could a support guy from AMD take this as an action?
I'm sure this would be of great (and easy) help to overcome some of the current Brook limits.
Thanks
Jean-Claude
To use IL with CAL, you will need to do some set up and mapping, as Micah explained. The CAL samples in the SDK show examples of that.
And you can go from Brook+ to IL easily. The issue is that brcc output IL is not compact, so the mapping in CAL can get murky, and I would not go that route now.
But we are working on cleaning up the brcc output IL for the next release -- am hoping that will help.
Thanks Micah,
I'll have a look, but honestly this sounds like quite a heavy process!
Jean-Claude