himanshu_gautam
Grandmaster

Re: How to implement cl_khr_icd?

A few comments:

1. I do not see the need to launch 1024 work-items to reduce 1024 elements and then use conditions inside the kernel that immediately disable half of them. Why not launch only 512 work-items?

2. Use get_global_id(0). The group_id method may be right, but it is very confusing (with that 2 inside it).

Here is a rewrite of just that small section of the kernel.

Global size: 512, local size: 64

int gid = get_global_id(0);
int lid = get_local_id(0);
int grp_id = get_group_id(0);
int grp_size = get_local_size(0); // work-items per group (OpenCL C has no get_group_size)

if (gid < 512)
{
    // Three versions with different access patterns. Uncomment one. Not tested, so check before using.

    // Version 1: second operand is one global size away
    //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[gid]))) + log(exp(sqrt(a_g_idata[gid + get_global_size(0)])));

    // Version 2: adjacent pairs
    //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[2 * gid]))) + log(exp(sqrt(a_g_idata[2 * gid + 1])));

    // Version 3: each group reads two consecutive blocks of grp_size elements
    //ocl_test_sdata[lid] = log(exp(sqrt(a_g_idata[(2 * grp_id) * grp_size + lid]))) +
    //                      log(exp(sqrt(a_g_idata[(2 * grp_id + 1) * grp_size + lid])));
}
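
To sanity-check the three access patterns above without a GPU, a host-side sketch like the following can enumerate the two input indices each work-item would read. It assumes 1024 input elements, a global size of 512 and a local size of 64, as in this post. The names Pattern, pair_indices and covers_each_element_once are hypothetical, introduced only for illustration, and are not part of the original kernel.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the three access patterns; not kernel code.
enum class Pattern { Strided, Adjacent, PerGroup };

// Returns the two input indices that work-item `gid` would add together.
// Assumes global_size is a multiple of local_size and no global offset.
std::array<std::size_t, 2> pair_indices(Pattern p, std::size_t gid,
                                        std::size_t global_size,
                                        std::size_t local_size) {
    const std::size_t lid = gid % local_size;     // models get_local_id(0)
    const std::size_t grp_id = gid / local_size;  // models get_group_id(0)
    switch (p) {
    case Pattern::Strided:   // a[gid] + a[gid + global_size]
        return {gid, gid + global_size};
    case Pattern::Adjacent:  // a[2*gid] + a[2*gid + 1]
        return {2 * gid, 2 * gid + 1};
    case Pattern::PerGroup:  // a[(2*grp)*size + lid] + a[(2*grp + 1)*size + lid]
        return {(2 * grp_id) * local_size + lid,
                (2 * grp_id + 1) * local_size + lid};
    }
    return {0, 0};  // not reached for valid Pattern values
}

// True if the pattern reads each of the 2 * global_size inputs exactly once.
bool covers_each_element_once(Pattern p, std::size_t global_size,
                              std::size_t local_size) {
    std::vector<int> hits(2 * global_size, 0);
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        for (std::size_t idx : pair_indices(p, gid, global_size, local_size)) {
            if (idx >= hits.size()) return false;  // out-of-range read
            ++hits[idx];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

Calling covers_each_element_once(Pattern::PerGroup, 512, 64) checks the third version. With 512 work-items in groups of 64 there are 8 work-groups, which would be consistent with the eight values printed before the parallel sum later in this thread.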

ash
Journeyman III

Hi,

Thanks for the comments, I'll try that. I think I confused the local size that we pass to enqueueNDRangeKernel with the total number of elements to be computed. I thought they were the same, but from what you said they are not.

Another question: were you able to test my code on an AMD GPU to see if the test passes even when printf is disabled?

I'd be reassured if my code ran correctly on both NVIDIA and AMD GPUs.

Also, could you please tell me how to post code as a zipped attachment?

Have a nice day.

Best regards,

ash

himanshu_gautam
Grandmaster


Your code returns SUCCESS with or without printf. Here is the output with debug disabled.

C:\Users\cas\Desktop\reduce>host.exe
Platform name found AMD Accelerated Parallel Processing
--> Choosen Device name: Capeverde
959.575
1762.89
2284.09
2705.42
3069.08
3393.84
3690.06
3964.17
parallel sum 21829.1
SUCCESS!

ash
Journeyman III

Good to know, thanks a lot!

Then maybe the problem was with the AMD GPU I have. I'll try to test on another one later if possible.

I'm now porting a CUDA application to OpenCL and have run into some problems. I don't know if you're familiar with CUDA, but I'm having difficulty translating tex3D and textures to OpenCL. I read about cl::Image, so I think I should use that to pass data to the kernel, but it's not very clear.

himanshu_gautam
Grandmaster

You are right. Look into cl::Image. You can check out some of the APP SDK samples (although most of them are written without the OpenCL C++ wrapper); SimpleImage and MatrixMulImage are two examples.

ash
Journeyman III

Hi,

I have a small question about cl::Image3D. When you call enqueueWriteImage, it asks for an origin and a region.

If I want to cover the whole image, then the region should be (width, height, depth), shouldn't it?

himanshu_gautam
Grandmaster

region defines the (width, height, depth) in pixels of the 2D or 3D rectangle being read or written. If image is a 2D image object, the depth value given by region[2] must be 1.

(From the Khronos C++ wrapper documentation.)
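
As a rough sketch of the quoted rule: to transfer a whole image, the origin would be (0, 0, 0) and the region (width, height, depth), with depth set to 1 for a 2D image. The helper names below are hypothetical. In the C++ wrapper these triples are passed as the wrapper's own 3-element size type (cl::size_t<3> in cl.hpp, as far as I recall); this standalone sketch substitutes std::array so that it compiles without the OpenCL headers.

```cpp
#include <array>
#include <cstddef>

// Stand-in for the wrapper's 3-element size type (an assumption of this sketch).
using Triple = std::array<std::size_t, 3>;

// Origin of a transfer that starts at the first pixel of the image.
Triple whole_image_origin() { return {0, 0, 0}; }

// Region covering the entire image. For a 2D image leave depth at 1,
// since region[2] must be 1 in that case.
Triple whole_image_region(std::size_t width, std::size_t height,
                          std::size_t depth = 1) {
    return {width, height, depth};
}

// Number of pixels covered by a region, e.g. for sizing the host buffer.
std::size_t region_pixel_count(const Triple& region) {
    return region[0] * region[1] * region[2];
}
```

For a 4 x 3 x 2 volume, whole_image_region(4, 3, 2) gives the triple to pass as region, and region_pixel_count of that triple gives the number of pixels the host buffer has to hold.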

ash
Journeyman III

OK, then it should be fine. Sorry for the bother.

I have yet another question: the CUDA code that I'm porting uses a cudaPitchedPtr. I read the spec, and when you create a 3D image you can pass a row_pitch, which should be the equivalent of the pitch field (h_ptr.pitch below).

But what about xsize and ysize? They seem related to slice_pitch, but I'm not sure. Also, I really don't know what to pass as host_ptr when I construct the 3D image. I think I should allocate an array the size of the image, which means three dimensions, but it seems that in CUDA they allocate for a 3D array dimension. I hope you can help, I'm kind of lost.

CUDA:

    cudaPitchedPtr h_ptr;
    h_ptr.pitch = volume_size.width * sizeof(float);
    h_ptr.xsize = volume_size.width;
    h_ptr.ysize = volume_size.height;

OpenCL:

    cl::Image3D(context, CL_MEM_READ_ONLY, fmt, width, height, depth,
                row_pitch,   // = width * sizeof(float), to match h_ptr.pitch?
                slice_pitch, // ?
                host_ptr);   // ?

I hope I'm not too far off, but some help would be very welcome.

Best regards,

ash

nou
Exemplar

You can pass the pitch parameters as 0; OpenCL will then compute the proper values automatically as row_pitch = width * sizeof(pixel type) and slice_pitch = height * row_pitch.
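
The default pitches described above can be sketched on the host as follows. This is a minimal illustration that assumes tightly packed data; Pitches, default_pitches, voxel_offset and make_volume are hypothetical names, not OpenCL API. It also suggests what host_ptr would point to under this layout: one contiguous buffer of width * height * depth elements rather than a nested 3D array.

```cpp
#include <cstddef>
#include <vector>

struct Pitches {
    std::size_t row_pitch;    // bytes per row
    std::size_t slice_pitch;  // bytes per 2D slice
};

// Pitches for tightly packed data, following the formulas given above.
Pitches default_pitches(std::size_t width, std::size_t height,
                        std::size_t pixel_size) {
    const std::size_t row = width * pixel_size;
    return {row, height * row};
}

// Byte offset of voxel (x, y, z) in a buffer laid out with the given pitches.
std::size_t voxel_offset(const Pitches& p, std::size_t x, std::size_t y,
                         std::size_t z, std::size_t pixel_size) {
    return z * p.slice_pitch + y * p.row_pitch + x * pixel_size;
}

// Flat, contiguous host storage for a width x height x depth volume of
// single-channel float pixels (an assumption of this sketch).
std::vector<float> make_volume(std::size_t width, std::size_t height,
                               std::size_t depth) {
    return std::vector<float>(width * height * depth, 0.0f);
}
```

Under these assumptions, the data() pointer of the vector returned by make_volume is the kind of pointer one would pass as host_ptr, with both pitch arguments left at 0 and a host-pointer flag such as CL_MEM_COPY_HOST_PTR added to the memory flags.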

ash
Journeyman III

Then I "only" have to allocate memory for the host pointer?

So if I have a 3D image, I have to allocate memory for a 3D array? Sorry if my question is dumb, but I haven't really understood yet.
