Archives Discussions

sajis997 · ‎03-05-2013

Hi forum,

I am trying to port an existing CUDA application to OpenCL, and I cannot find the OpenCL equivalent of the following CUDA function:

    cudaError_t cudaMallocPitch(void **devPtr, size_t *pitch, size_t width, size_t height)

Allocates at least width (in bytes) * height bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. The pitch returned in *pitch by cudaMallocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:

    T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;

For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cudaMallocPitch(). Due to pitch alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).

Parameters:

	devPtr	- Pointer to allocated pitched device memory
	pitch	- Pitch for allocation
	width	- Requested pitched allocation width (in bytes)
	height	- Requested pitched allocation height

It would be very nice to get some hints on how to derive the OpenCL version.

Thanks

Sajjad

realhet · ‎03-05-2013

Hi,

You can use clCreateBuffer() if you're going to allocate 1D memory.

For 2D memory there's a more complicated one: clCreateImage2D() (or even 3D). With this you can specify element format too (for example xyzw/bytes).

Don't forget to use the CL_MEM_READ_WRITE flag!

himanshu_gautam · ‎03-05-2013

clCreateImage() is the latest API. It can handle 1D, 2D and 3D images.

cl...2D() is deprecated as of OpenCL 1.2.

But NVIDIA seems to have no plans to move to OpenCL 1.2.

So you may actually be safe using the clCreateImage2D() API if you are targeting the NVIDIA platform.

sajis997 · ‎03-05-2013

Hi

I am not sure I explained the issue properly.

I think I need something that basically represents a 2D array in a linear manner. So technically, I do need to use clCreateBuffer(....). What confuses me is how to calculate the pitch value as described for cudaMallocPitch(...).

Any more ideas, folks?

Regards

Sajjad

nou · ‎03-05-2013

AFAIK there is no such API.

himanshu_gautam · ‎03-05-2013

Hi,

I agree with nou. It also looks like a nice feature to offer developers.

But IMHO you can add some padding manually when calculating the buffer size. Generally, power-of-two sizes are preferable, so that wavefronts do not diverge while reading/writing. But a power of two may not always be practical, so I suggest you try rounding up to the next multiple of 16 or 64 as the actual size. It would be nice if you could share your results.
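The manual padding suggested above can be sketched in plain C. This is only an illustration, not an OpenCL API: the helper names `padded_pitch_bytes` and `padded_buffer_bytes` are hypothetical, and the alignment value (for example 16 or 64 bytes) is an assumption you would need to tune for your device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a row width (in bytes) up to the next
   multiple of `alignment`. The result plays the role of the pitch that
   cudaMallocPitch() would return; here the caller picks the alignment. */
size_t padded_pitch_bytes(size_t width_bytes, size_t alignment)
{
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: total size in bytes to request for a padded
   2D array with `height` rows, e.g. as the size passed to clCreateBuffer(). */
size_t padded_buffer_bytes(size_t width_bytes, size_t height, size_t alignment)
{
    return padded_pitch_bytes(width_bytes, alignment) * height;
}
```

For example, a row of 100 cl_float2 elements is 800 bytes wide; with an assumed 64-byte alignment the pitch becomes 832 bytes, and that pitch would also have to be passed to the kernel so it can compute row offsets.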

realhet · ‎03-06-2013

For this purpose: "represents a 2D array in the linear manner"

I'd use simple linear memory allocated with clCreateBuffer().

And inside the kernel it can be accessed as simply as: buffer[mad24(y,width,x)]

You should use a pitch of your own choice if there would be too many bank conflicts based on your program's memory access pattern.

If you use an image, it will utilize the hardware texture samplers (with the possibility of mip mapping, anisotropic filtering, and conversion to the 0.0..1.0 range), but for this simple 2D linear case I think it is too much.
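If you do choose a pitch of your own, the host data has to be laid out with the same row stride before it is uploaded. The following is a minimal host-side sketch in plain C, assuming the pitch is expressed in elements rather than bytes; `pitched_index` and `copy_to_pitched` are hypothetical names, and the index computation mirrors the y * pitch + x pattern used in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: linear index of element (x, y) when every row
   occupies pitch_elems elements (pitch_elems >= width, rest is padding). */
size_t pitched_index(size_t x, size_t y, size_t pitch_elems)
{
    return y * pitch_elems + x;
}

/* Hypothetical helper: copy a tightly packed width x height array into
   a padded staging array, one row at a time. Padding is left untouched. */
void copy_to_pitched(float *dst, size_t pitch_elems,
                     const float *src, size_t width, size_t height)
{
    assert(pitch_elems >= width);
    for (size_t y = 0; y < height; ++y)
        memcpy(dst + y * pitch_elems, src + y * width, width * sizeof *src);
}
```

The padded staging array can then be handed to clCreateBuffer() as one linear block, and the kernel would index it with the pitch instead of the width.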

sajis997 · ‎03-10-2013

Hi forum,

Thanks for all the hints over this issue.

I am creating a linear array on the device and another texture buffer on the device for faster texture fetches. The linear array is created as follows:

    typedef cl_float2 cData;

    cData* hvfield = NULL;

    ......................................

    //allocate the opencl object for the source data
    hvfield = (cData*)malloc(sizeof(cData) * DS);
    memset(hvfield, 0, sizeof(cData) * DS);

    dvfield = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(cData) * DS, hvfield, &ciErrNum);
    oclCheckErrorEX(ciErrNum, CL_SUCCESS, pCleanup);

Now, while creating the texture object, I am confused about the cl_image_format:

    cl_image_format image_format;
    image_format.image_channel_order = CL_RGBA;
    image_format.image_channel_data_type = CL_UNSIGNED_INT8;

I am not sure about the second assignment in the snippet above. The image texture basically points to the linear buffer I created first, and the data type there is cData.

What should the data type be here? Since I typedef it as typedef cl_float2 cData, what should the assignment be for

    image_format.image_channel_data_type = ????

Regards

Sajjad

himanshu_gautam · ‎03-11-2013


As I understand it, you want to use a buffer of cl_float2 type as a cl_image object. Here is what I found in the spec for that.

The image channel data type can be CL_FLOAT and the image channel order can be CL_RG (from 5.6 & 5.7 of the OpenCL 1.2 spec). You could also use CL_R as the channel order, but then the cl_image object needs twice as many indexes.

Hope this helps.
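To make the "double the indexes" remark concrete, here is a small plain-C sketch of how coordinates would map if each cl_float2 element were spread over two single-channel texels. It assumes the two components sit next to each other in a row; the names `texel_coord`, `r_layout_texel` and `r_layout_image_width` are hypothetical and not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: integer texel coordinate inside an image. */
typedef struct { size_t x, y; } texel_coord;

/* With CL_RG / CL_FLOAT, element (x, y) is one texel holding both
   components. With CL_R / CL_FLOAT, component c (0 or 1) of element
   (x, y) is assumed to live at texel (2*x + c, y). */
texel_coord r_layout_texel(size_t x, size_t y, size_t c)
{
    assert(c < 2);
    texel_coord t = { 2 * x + c, y };
    return t;
}

/* Image width in texels needed for the single-channel layout. */
size_t r_layout_image_width(size_t width_in_float2)
{
    return 2 * width_in_float2;
}
```

With CL_RG the image is as wide as the field itself and one read returns both components, which is why it is the more natural fit for cl_float2 data.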

opencl porting