1) Not a double copy. When you map a remote resource, AFAIK the pointer is returned immediately and no copy is done.
2) It's an extension, so you need to get it through calExtGetProc.
(Note : I am not from AMD)
Thanks Rahul for your inputs,
With respect to "calCtxResCreate", I can find it neither in table B1 of AMD's Stream-Computing document nor in the cal_ext.h header file.
Actually the only extensions that are mentioned are:
CAL_EXT_D3D9, CAL_EXT_OPENGL, CAL_EXT_D3D10 and CAL_EXT_COUNTERS...
This is what got me thinking of calCtxResCreate as a ghost API!
Maybe some folks from AMD can clarify the matter.
Could you make sure you are using the 1.3 beta header files? I can see the CAL_EXT_RES_CREATE extension id there.
Did some file cleanup on my PC... which I should have done a long time ago!
Now, ok I found CAL_EXT_RES_CREATE in cal_ext.h.
For sure I'll have a look when I have extra time. BTW, are there a few lines of documentation on the proper use of calCtxResCreate?
You are probably looking for calResCreate2D in cal_ext.h
Yes, that's it.
OK, here is a quick trial to check whether I understood properly:
// First check if extension supported
r = calExtSupported(CAL_EXT_RES_CREATE);
if (r != CAL_RESULT_OK) return false; // too bad it is not!!
// Get pointer to calResCreate extension
r = calExtGetProc((CALextproc*)&calResCreate2D_proc, CAL_EXT_RES_CREATE, "calResCreate2D");
// Now create 2D resource in system memory
(calResCreate2D_proc)(&XMem_Res, device, &p_buffer, 64, 256, CAL_FORMAT_FLOAT_4, size_bytes, 0);
Here I have a question: should size_bytes be 64*256*(4*4), i.e. w*h*sizeof(float4)? In that case, why is this parameter needed?
Or is there any consideration for an optimal pitch?
// Then for instance init DMA transfer to this resource from a local GPU resource
r = calMemCopy(&e,context,local_Mem_M,XMem_Mem,0);
// Data should have been transferred from GPU local memory to system memory buffer p_buffer
// Now create 2D resource in system memory
CALresource XMem_Res=0;
float *p_buffer;
(calResCreate2D_proc)(&XMem_Res, device, &p_buffer, 64, 256, CAL_FORMAT_FLOAT_4, size_bytes, 0);
Some problems in the code:
p_buffer has to be allocated before use. The allocation requirements are:
1. The number of elements in width should satisfy the pitch alignment requirement. It should be an integer multiple of CALdeviceattribs.pitch_alignment (64).
2. p_buffer should be aligned to CALdeviceattribs.surface_alignment bytes (256 bytes).
You are right, size_bytes is unnecessary. It has to match w*h*sizeof(format).
Regarding DMA - Yes, you should expect p_buffer to be updated with new data.
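The allocation requirements listed above can be sketched in plain C as follows. This is only a sketch, not code from the CAL SDK: the helper names (padded_width, pinned_size_bytes, alloc_pinned_candidate) are made up for illustration, and the values 64 and 256 are the ones quoted in this thread. Real code should read them from CALdeviceattribs.pitch_alignment and CALdeviceattribs.surface_alignment.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values, taken from this thread. Real code should query
   CALdeviceattribs.pitch_alignment and .surface_alignment instead. */
#define PITCH_ALIGNMENT_ELEMS   64u
#define SURFACE_ALIGNMENT_BYTES 256u

/* Hypothetical helper: round a width (in elements) up to the next
   multiple of the pitch alignment. */
static unsigned padded_width(unsigned width) {
    assert(width > 0u);
    return (width + PITCH_ALIGNMENT_ELEMS - 1u)
           / PITCH_ALIGNMENT_ELEMS * PITCH_ALIGNMENT_ELEMS;
}

/* Hypothetical helper: the value expected for the size argument,
   i.e. w * h * sizeof(format). */
static size_t pinned_size_bytes(unsigned width, unsigned height,
                                size_t bytes_per_elem) {
    return (size_t)width * height * bytes_per_elem;
}

/* Hypothetical helper: allocate a host buffer that meets both
   requirements. Returns NULL if the width is not pitch-aligned or the
   allocation fails. The caller releases the buffer with free(). */
static void *alloc_pinned_candidate(unsigned width, unsigned height,
                                    size_t bytes_per_elem,
                                    size_t *size_out) {
    void *p = NULL;
    size_t size;
    if (width == 0u || width % PITCH_ALIGNMENT_ELEMS != 0u) return NULL;
    size = pinned_size_bytes(width, height, bytes_per_elem);
    if (posix_memalign(&p, SURFACE_ALIGNMENT_BYTES, size) != 0) return NULL;
    if (size_out != NULL) *size_out = size;
    return p;
}
```

For the 64 x 256 FLOAT_4 case from the trial, alloc_pinned_candidate(64, 256, 4 * sizeof(float), &size_bytes) would give a buffer and a size_bytes value that could then be handed to the calResCreate2D function pointer.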
This clarifies the matter.
I put it aside on my to-do list, and I'll try to use it in the near future.
Have a nice day.
Are there any other requirements? I've found that I get CAL_RESULT_ERROR returned if the height is larger than some size (I have not yet determined the exact value, but it is less than 8192, my card's maximum allowable dimension size).
After getting a function pointer blah blah, I call
err1 = calResCreate2D(&this->resource, this->device->dev,
(CALvoid*)buffer, 640, 4096, type, 640 * 4096 * sizeof(float), 0);
This returns CAL_RESULT_OK and runs fine, but I really want a 640x8192 matrix. A 4097 height also works, so 4096 isn't the limit. 640 is a multiple of 64 and buffer was allocated with 256-byte alignment using posix_memalign(). What's the issue here? type in this case is FLOAT1.
I don't have the exact number. But the amount of memory available for pinned resources is much less than what is allowed for local or remote resources. You could try allocating multiple resources of 64*64 and see how many resources and how much memory you are able to allocate.
I am not very sure myself. There is no documentation about this anywhere and no samples either. I am trying to do some experiments, and once I am clearer about what's happening, I will get back to you.
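The experiment suggested above (create many small resources until one fails) could be structured roughly like this. It is a sketch under assumptions: try_create_fn is a hypothetical callback that, in real code, would wrap the calResCreate2D function pointer and report whether it returned CAL_RESULT_OK; fake_create is only a stand-in for the driver so the loop can be exercised without a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: tries to create one pinned resource of the given
   size and returns nonzero on success. Real code would call the
   calResCreate2D function pointer here and keep the resource alive, since
   freeing it would give the memory back and defeat the measurement. */
typedef int (*try_create_fn)(size_t size_bytes, void *user);

/* Keep creating equally sized resources until one fails or max_tries is
   reached. Returns how many creations succeeded. */
static size_t probe_pinned_capacity(try_create_fn try_create, void *user,
                                    size_t size_bytes, size_t max_tries) {
    size_t count = 0;
    assert(try_create != NULL);
    while (count < max_tries && try_create(size_bytes, user)) {
        ++count;
    }
    return count;
}

/* Stand-in for the driver, for demonstration only: succeeds while a fixed
   byte budget (pointed to by user) lasts. */
static int fake_create(size_t size_bytes, void *user) {
    size_t *remaining = (size_t *)user;
    if (size_bytes > *remaining) return 0;
    *remaining -= size_bytes;
    return 1;
}
```

With 64*64 FLOAT1 resources, each one is 64 * 64 * 4 = 16384 bytes, so the returned count multiplied by 16384 gives a lower bound on the pinned memory that was available.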
The amount of memory is limited to either 16MB or 64MB depending on your operating system. This is a limit that the CAL team is working with the driver teams to increase.
Well, since 8192*640*4 = ~20MB, I think it's safe to assume the limit is 16MB on my OS. Thanks for your help! Is this pinned limit based on a single contiguous block of memory, or on all the memory you can have allocated at a single point in time (i.e., can I have more allocated so long as no single buffer is greater than 16MB)?
It's the total available memory that can be used.
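To make the arithmetic in this exchange concrete, here is a small sketch that checks resource sizes against a total pinned budget. The names (resource_bytes, pinned_budget, budget_reserve) are hypothetical, and treating "16MB" as 16 * 1024 * 1024 bytes is an assumption; the thread does not say exactly how the driver counts.

```c
#include <assert.h>
#include <stddef.h>

#define MIB ((size_t)1024 * 1024)

/* Bytes needed for a width x height resource with the given element size. */
static size_t resource_bytes(size_t width, size_t height,
                             size_t bytes_per_elem) {
    return width * height * bytes_per_elem;
}

/* Hypothetical tracker for the total pinned budget. Per the answer above,
   the limit applies to the sum of all pinned resources, not per buffer. */
typedef struct {
    size_t limit; /* assumed total budget in bytes, e.g. 16 * MIB */
    size_t used;  /* bytes already reserved */
} pinned_budget;

/* Reserve bytes from the budget. Returns 1 on success, 0 if the request
   would exceed the remaining budget (nothing is reserved in that case). */
static int budget_reserve(pinned_budget *b, size_t bytes) {
    assert(b != NULL && b->used <= b->limit);
    if (bytes > b->limit - b->used) return 0;
    b->used += bytes;
    return 1;
}
```

Under these assumptions, a 640x8192 FLOAT1 resource needs 20971520 bytes and does not fit in a 16 MiB budget, while 640x4097 needs 10488320 bytes and does. A second 640x4097 resource would then be rejected, because the limit is on the total.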