MicahVillmow
Staff

Maximum 1D array

The 8192x8192 limit is imposed by the hardware and should change with future hardware.
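
To make the limit concrete, here is a minimal sketch of how a linear element index could be folded onto a 2D layout that is at most 8192 elements wide. The names `kMaxDim`, `Coord2D`, `toCoord2D`, and `toLinear` are illustrative assumptions and are not part of Brook+ or CAL.

```cpp
#include <cassert>
#include <cstdint>

// Assumed per-dimension limit, taken from the 8192x8192 figure in this thread.
constexpr std::uint32_t kMaxDim = 8192;

// Hypothetical helper type: a 2D coordinate inside an 8192-wide resource.
struct Coord2D {
    std::uint32_t x;
    std::uint32_t y;
};

// Fold a linear element index onto a 2D layout that is kMaxDim elements wide.
inline Coord2D toCoord2D(std::uint32_t linearIndex) {
    assert(linearIndex < kMaxDim * kMaxDim);  // must fit in 8192x8192 elements
    return Coord2D{linearIndex % kMaxDim, linearIndex / kMaxDim};
}

// Inverse mapping: recover the linear index from a 2D coordinate.
inline std::uint32_t toLinear(Coord2D c) {
    return c.y * kMaxDim + c.x;
}
```

Note that 8192 x 8192 is 67,108,864 elements, so a resource of 4-byte floats at the maximum size holds exactly 256 MB.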
bubu
Adept II

Originally posted by: MicahVillmow The 8192x8192 limitation is hardware and should be different with future hardware.


Would it be possible to emulate it quickly and automatically in the driver? I really need to see all of the graphics card's VRAM as a linear 1D pointer on the current hardware generation... seriously.

Once this is done, I would like to see a FireStream with 4 GB, like the NVIDIA Tesla C1060. The mesh data I need to manage can occupy more than 2 GB... that, or use a virtual VRAM system through AGP/PCI like the CPU does... I think Vista can virtualise the VRAM.

MicahVillmow
Staff

bubu,
It is possible to access all of the VRAM inside a kernel, but this requires using CAL, IL, and the global buffer (which gives you a 32-bit address space). There are problems with this, but they are mainly operating system limitations in mapping the graphics VRAM into the system PCI memory space. There is usually a limit of around 200 MB, set by the driver/OS, on what can be mapped directly at any one time.

Memory management of the device's VRAM is left to the application writer, because a single policy enforced by the driver is not ideal for all workloads.
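
As a rough illustration of what application-side memory management might involve, the sketch below splits a large buffer into consecutive pieces that each fit under a mapping limit. It assumes an application that stages its data piece by piece; `Window` and `splitIntoWindows` are hypothetical names, the actual CAL map and copy calls are deliberately left out, and the 200 MB figure is only the approximate value quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one mapped window: byte offset and length.
struct Window {
    std::size_t offset;
    std::size_t size;
};

// Split totalBytes into consecutive windows of at most windowBytes each.
// The limit is a parameter because the real value depends on the driver/OS.
inline std::vector<Window> splitIntoWindows(std::size_t totalBytes,
                                            std::size_t windowBytes) {
    assert(windowBytes > 0);
    std::vector<Window> windows;
    for (std::size_t offset = 0; offset < totalBytes; offset += windowBytes) {
        const std::size_t remaining = totalBytes - offset;
        // The last window may be shorter than the limit.
        windows.push_back({offset, remaining < windowBytes ? remaining : windowBytes});
    }
    return windows;
}
```

For example, a 512 MB buffer with a 200 MB limit yields three windows of 200 MB, 200 MB, and 112 MB.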
bubu
Adept II

Originally posted by: MicahVillmow bubu, It is possible to get access to all the vram inside a kernel, but this requires using CAL, IL, and the global buffer(this gives you access to 32 bits of address space).


 

But... can I use CAL to allocate the buffer and then pass the pointer to Brook+? I don't like the idea of programming kernels in a pseudo-assembly language. For example, this is what I want:

 

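// Pseudocode: calAllocateBuffer is the kind of call I would like to have.
// 2 GB input, so that index 1800000000 below is in range.
byte* dataIn = (byte*)calAllocateBuffer(2u*1024*1024*1024);
byte* dataOut = (byte*)calAllocateBuffer(1024);

void myBrookPlusKernel ( byte *ptr )
{
    const byte dIn = ptr[1800000000];
    dataOut[threadId] = dIn+10;
}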

MicahVillmow
Staff

bubu,
This can be done using the C++ interface, but I'm not sure about the C interface, as I am more of a CAL engineer than a Brook+ one.
udeepta
Staff

It is possible to write a kernel in Brook+, convert it to IL using the Brook+ compiler (brcc), and use the generated IL in CAL. Or better yet, put the Brook+ kernel into ShaderAnalyzer and get the IL from there.

It is not as straightforward as what you asked, but it is a good middle ground where you get all CAL functionality and the ease of Brook+ kernel programming. 

bubu
Adept II

Originally posted by: udeepta@amd It is possible to write a kernel in Brook+, convert it to IL using the Brook+ compiler (brcc), and use the generated IL in CAL. Or better yet, put in the Brook+ kernel in ShaderAnalyzer and get the IL from there.


 

OK... so imagine I need to access 256 MB of VRAM on a 512 MB Radeon.

For example, let's fill that data with a value of 0.5f (I know, a dumb kernel).

I would write this in Brook+ (just pseudocode):

void kernel ( float buff[] )
{
   buff[(blockIdx.x*blockDim.x)+threadId.x] = 0.5f;
}

Then I compile it using ShaderAnalyzer or whatever... so the AMD IL assembly is generated.

Then, in my C++ program, I do:

 

//Initialize CAL
...
//Allocate 256 MB using CAL
void *ptr = calMalloc(256*1024*1024);

//Load the compiled kernel and pass the VRAM pointer as the IL input.
//Backslashes are doubled so that "\t" is not read as a tab escape.
handle = calLoadKernel("c:\\test\\myKernel.il");
calSetInput(handle,"buff",ptr);

 

 

bubu
Adept II

Is that the right way, please?

jean-claude
Journeyman III

Hi Bubu,

Your point is a smart one.

Writing the kernel in Brook+ to get an assembly program, and then using CAL for overall memory management, seems to me the best compromise.

What is needed from AMD is a short application note (less than one page) that provides guidance on how to proceed.

Could a support guy from AMD take this as an action?

I'm sure this would be of great (and easy) help to overcome some of the current Brook limits.

Thanks

Jean-Claude

MicahVillmow
Staff

Jean-Claude,
One problem with that approach is that the code generated from Brook+ requires a lot of setup and mapping that the Brook+ runtime needs but that you would not need if you were writing in CAL. The generated code is also fairly difficult to read, and there is a simpler approach that I used to use here at work.
The approach is this:
1) Break down all math expressions into the simplest form possible. For example, turn x = cos((y * z) / 4) into x = y * z; x = x / 4; x = cos(x).
This allows for an almost 1-to-1 mapping to IL for all math instructions. If something you want to do does not seem possible in IL, write up a quick HLSL shader, paste it into GPU ShaderAnalyzer, and look at the sequence of instructions that is generated.
2) For all flow control statements, break the conditional expressions into simpler statements and store the results in variables.
For example, if (x == 0 || (y * z) > w) becomes x_cond = (x == 0); y = y * z; y_cond = (y > w); if_cond = (x_cond || y_cond); if (if_cond == 0) ...
This is how you would write the code in IL; you can debug it at a higher level, verify correctness, and then do a 1-to-1 mapping to IL instructions.
3) Move simple if statements into cmov_logical instructions. For example, if (conditional) a = b else a = c becomes cmov_logical a, conditional, b, c.
4) Only use while(1) loops instead of more convenient flow control, and place an if (if_cond == 0) break or an if (if_cond == 0) continue inside them. These translate easily to the break_logicalz if_cond and continue_logicalz if_cond IL instructions.
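
Guidelines 1 to 3 can be tried out on the CPU before any IL is written. The sketch below is plain C++, not Brook+ or IL; `cmovLogical` is a hypothetical stand-in for the cmov_logical instruction, and the function names are made up for illustration. Each compact form is paired with its decomposed form so the two can be compared.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for cmov_logical: yields b when cond is non-zero, otherwise c.
inline float cmovLogical(int cond, float b, float c) {
    return cond != 0 ? b : c;
}

// Guideline 1, compact form: x = cos((y * z) / 4)
inline float compactMath(float y, float z) {
    return std::cos((y * z) / 4.0f);
}

// Guideline 1, decomposed: one operation per statement (mul, div, cos).
inline float decomposedMath(float y, float z) {
    float x = y * z;
    x = x / 4.0f;
    x = std::cos(x);
    return x;
}

// Guidelines 2 and 3, compact form:
// if (x == 0 || (y * z) > w) a = b; else a = c;
inline float compactSelect(float x, float y, float z, float w, float b, float c) {
    if (x == 0.0f || (y * z) > w) return b;
    return c;
}

// Guidelines 2 and 3, decomposed: each condition is stored in a variable,
// then the simple if/else is replaced by a single conditional move.
inline float decomposedSelect(float x, float y, float z, float w, float b, float c) {
    const int xCond = (x == 0.0f);
    y = y * z;
    const int yCond = (y > w);
    const int ifCond = (xCond || yCond);
    return cmovLogical(ifCond, b, c);
}
```

Checking that the compact and decomposed versions agree on a few inputs is the "debug it at a higher level" step; only then would each statement be mapped to its IL instruction.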

If you follow these guidelines, you can translate any Brook+ source code that has been tested and debugged into CAL/IL fairly easily, with almost no major issues apart from typos or selecting the wrong instruction.
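
Guideline 4 can be illustrated the same way. The following is a plain C++ sketch with made-up function names: a conventional for loop is rewritten as a while(1) loop whose only exits are an explicit break and continue on a stored condition, with comments marking where break_logicalz and continue_logicalz would go.

```cpp
#include <cassert>

// Conventional loop: sum the even numbers in [0, n).
inline int sumEvenFor(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (i % 2 != 0) continue;
        sum += i;
    }
    return sum;
}

// Same computation in the while(1) style described in guideline 4.
inline int sumEvenWhile(int n) {
    int sum = 0;
    int i = 0;
    while (1) {
        int ifCond = (i < n);
        if (ifCond == 0) break;       // would map to: break_logicalz ifCond
        const int cur = i;
        i = i + 1;                    // advance before any continue
        const int rem = cur % 2;
        ifCond = (rem == 0);
        if (ifCond == 0) continue;    // would map to: continue_logicalz ifCond
        sum = sum + cur;
    }
    return sum;
}
```

The loop counter is incremented before the continue test, because a while(1) loop has no increment clause that would still run after a continue.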