output resource created with calResCreate2D and CAL_FORMAT_UBYTE_1
My goal is basically to mix two RGB images into one grayscale image. To speed up memory transfers (this is realtime video processing), I use memory pinning for the input and output buffers. The code is a modified version of the user_memory sample from CAL SDK 1.4.0 beta:
// first: initialize CAL and compile the kernel
// then:
CALuint isize=sizeof(CALuint)*width*height;
CALuint osize=sizeof(CALubyte)*width*height;
idata0 = (CALuint*)_aligned_malloc(isize, 4096);
idata1 = (CALuint*)_aligned_malloc(isize, 4096);
odata0 = (CALubyte*)_aligned_malloc(osize, 4096);
calExtGetProc((CALextproc*)&calResCreate2D, CAL_EXT_RES_CREATE, "calResCreate2D");
calResCreate2D(&inputRes0, device, (CALvoid*)idata0, width, height, CAL_FORMAT_UINT_1, isize, 0);
calResCreate2D(&inputRes1, device, (CALvoid*)idata1, width, height, CAL_FORMAT_UINT_1, isize, 0);
calResCreate2D(&outputRes0, device, (CALvoid*)odata0, width, height, CAL_FORMAT_UBYTE_1, osize, 0);
// then: map and bind the resources, and fill the input buffers with random data
// finally: call calCtxRunProgram
I have the following problems:
1. The required pitch alignment is 64 data elements. With a width of 640, calResCreate2D fails for odata0, but it works for widths of 256, 512, 768, etc. Since 640 is itself a multiple of 64 (10 × 64), is a UBYTE_1 value one data element, or a quarter of one?
2. Every kernel fills the output buffer with zeroes, even this one:
kernel void SomeKernel(unsigned int i0<>, unsigned int i1<>, out unsigned char o0<>)
{
o0 = 0xff;
}
All of this works fine with a CAL_FORMAT_UINT_1 output buffer.