
makhan
Journeyman III

OpenCL tutorial - incorrect results on GPU

Hi,

 

I just started my adventure with OpenCL. After some time of setting everything up and fixing compilation errors, I managed to get my first program running. It's written according to this tutorial. Code included under the post.

 

When I pass CL_DEVICE_TYPE_CPU as the first argument to the context, it works just fine. But when I change it to CL_DEVICE_TYPE_GPU, only a few values are filled (usually 2-3), and they are often at the wrong indexes.

 

My GPU is ATI Radeon HD5850, my OS is Windows 7 Pro x64, I have 10.03 Catalyst drivers and ATI Stream SDK v2.

 

Can you help me solve this problem?

 

 

//lesson1_kernels.cl
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out)
{
    size_t tid = get_global_id(0);
    out[tid] = hw[tid];
}

//lesson1.cpp
#include <utility>
#define __NO_STD_VECTOR // Use cl::vector and cl::string and
#define __NO_STD_STRING // not STL versions, more on this later
#include <malloc.h>
#define alloca _alloca
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <iterator>
#include <CL/cl.hpp>

const std::string hw("Hello World\n");

inline void checkErr(cl_int err, const char * name)
{
    if (err != CL_SUCCESS) {
        std::cerr << "ERROR: " << name << " (" << err << ")" << std::endl;
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    cl_int err;
    cl::vector< cl::Platform > platformList;
    cl::Platform::get(&platformList);
    checkErr(platformList.size()!=0 ? CL_SUCCESS : -1, "cl::Platform::get");
    std::cout << "Platform number is: " << platformList.size() << std::endl;

    cl::STRING_CLASS platformVendor;
    platformList[0].getInfo(CL_PLATFORM_VENDOR, &platformVendor);
    std::cout << "Platform is by: " << platformVendor.c_str() << "\n";

    cl_context_properties cprops[3] = {CL_CONTEXT_PLATFORM, (cl_context_properties)(platformList[0])(), 0};
    cl::Context context(CL_DEVICE_TYPE_CPU, cprops,NULL,NULL,&err);
    checkErr(err, "Conext::Context()");

    char * outH = new char[hw.length()+1];
    cl::Buffer outCL(context,CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,hw.length()+1,outH,&err);
    checkErr(err, "Buffer::Buffer()");

    cl::vector<cl::Device> devices;
    devices = context.getInfo<CL_CONTEXT_DEVICES>();
    checkErr(devices.size() > 0 ? CL_SUCCESS : -1, "devices.size() > 0");

    std::ifstream file("lesson1_kernels.cl");
    checkErr(file.is_open() ? CL_SUCCESS:-1, "lesson1_kernel.cl");
    std::string prog(std::istreambuf_iterator<char>(file),(std::istreambuf_iterator<char>()));
    cl::Program::Sources source(1,std::make_pair(prog.c_str(), prog.length()+1));
    cl::Program program(context, source);
    err = program.build(devices,"");
    checkErr(file.is_open() ? CL_SUCCESS : -1, "Program::build()");

    cl::Kernel kernel(program, "hello", &err);
    checkErr(err, "Kernel::Kernel()");
    err = kernel.setArg(0, outCL);
    checkErr(err, "Kernel::setArg()");

    cl::CommandQueue queue(context, devices[0], 0, &err);
    checkErr(err, "CommandQueue::CommandQueue()");

    cl::Event event;
    err = queue.enqueueNDRangeKernel(kernel, cl::NullRange,cl::NDRange(hw.length()+1),cl::NDRange(1, 1), NULL, &event);
    checkErr(err, "ComamndQueue::enqueueNDRangeKernel()");
    event.wait();

    err = queue.enqueueReadBuffer(outCL,CL_TRUE,0,hw.length()+1,outH);
    checkErr(err, "ComamndQueue::enqueueReadBuffer()");

    for (int i=0; i<hw.length()+1; i++)
        std::cout<< "outH[" << i <<"]= " <<outH[i]<<"\n";

    system("pause");
    return EXIT_SUCCESS;
}

0 Likes
7 Replies
genaganna
Journeyman III


Makhan,

The cl_khr_byte_addressable_store extension is not supported on GPUs in either SDK 2.0 or SDK 2.01. It will be enabled in an upcoming release.

 

That is why it fails on GPU.
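Whether a given device supports the extension can be checked at run time before relying on byte-sized writes in a kernel. OpenCL reports a device's extensions as a single space-separated string (the CL_DEVICE_EXTENSIONS query). The sketch below only shows the string check; the helper name hasExtension is made up for illustration and is not part of the OpenCL API or the C++ bindings.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCL API): returns true if `name` appears as
// a whole token in the space-separated extension list `extensions`.
bool hasExtension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        // Compare whole tokens so that a longer extension name which merely
        // starts with `name` does not count as a match.
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

With the setup from the first post, the input string would presumably come from something like devices[0].getInfo<CL_DEVICE_EXTENSIONS>().c_str(), and the name to look for is "cl_khr_byte_addressable_store".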

 

 

0 Likes

Ok, thank you

0 Likes

I had a little break from OpenCL and now I'm back, trying to run some simple code on my GPU using OpenCL with the C++ bindings.

Tried with two examples so far.
The first one is the one I used before, just changed data type:
http://codeviewer.org/view/code:e04

It works great on CPU, but on GPU it fills exactly one fourth of the buffer (as it is now, with buffer size 16 it fills values 0-3, with 17,18,19 the same, with 20 it will fill 0-4).
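The numbers reported here (16 gives 0-3, 17 to 19 give the same, 20 gives 0-4) match integer division by four. One possible reading, not confirmed anywhere in this thread, is that the size is being counted in bytes while each element is a 4-byte int. The sketch below reproduces only that arithmetic; the function name wholeIntsIn is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of whole 32-bit ints that fit in `bytes` bytes.
constexpr std::size_t wholeIntsIn(std::size_t bytes) {
    return bytes / sizeof(std::int32_t);
}

// The pattern described in the post, under the bytes-versus-elements assumption:
static_assert(wholeIntsIn(16) == 4, "16 bytes hold 4 ints (indexes 0-3)");
static_assert(wholeIntsIn(17) == 4, "17 bytes still hold only 4 whole ints");
static_assert(wholeIntsIn(19) == 4, "19 bytes still hold only 4 whole ints");
static_assert(wholeIntsIn(20) == 5, "20 bytes hold 5 ints (indexes 0-4)");
```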

The second one is a slightly modified example from the C++ bindings documentation. Again, it works great on CPU, but on GPU it doesn't return any values at all.
http://codeviewer.org/view/code:e05

Do you know what's the reason for that?

0 Likes

The first one is the one I used before, just changed data type: http://codeviewer.org/view/code:e04 It works great on CPU, but on GPU it fills exactly one fourth of the buffer (as it is now, with buffer size 16 it fills values 0-3, with 17,18,19 the same, with 20 it will fill 0-4).

Please post the kernel code too.

The second one is a little bit changed example from C++ bindings documentation. Again, it works great on CPU, but on GPU it doesn't return any values at all. http://codeviewer.org/view/code:e05 Do you know what's the reason for that?

 

Works fine for me on both GPU and CPU. Which GPU are you using?

Edit : Just noticed that you have mentioned it above. Are you using SDK 2.0 or 2.01?

0 Likes

Sorry, the kernel is here

http://codeviewer.org/view/code:e08

I'm using ati-stream-sdk-v2.01-vista-win7-64 created on 2010-03-27

 

What I failed to mention is that in the first case, the number of values filled depends entirely on the queue.enqueueReadBuffer call. If I make a bigger buffer and pass a bigger size value to enqueueReadBuffer without changing the value in enqueueNDRangeKernel, it fills one fourth of the value passed to enqueueReadBuffer.

 

 

this fills buffer of length size:

int * outH = new int[size*4];
cl::Buffer outCL(context,CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,size*4,outH,&err);
queue.enqueueNDRangeKernel(kernel, cl::NullRange,cl::NDRange(size),cl::NDRange(1), NULL, &event);
queue.enqueueReadBuffer(outCL,CL_TRUE,0,size*4,outH);

this fills buffer of length [size/4]:

int * outH = new int[size];
cl::Buffer outCL(context,CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,size,outH,&err);
queue.enqueueNDRangeKernel(kernel, cl::NullRange,cl::NDRange(size),cl::NDRange(1), NULL, &event);
queue.enqueueReadBuffer(outCL,CL_TRUE,0,size,outH);
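The size arguments of cl::Buffer and enqueueReadBuffer are byte counts, not element counts, so a buffer meant to hold size ints needs size * sizeof(int) bytes in both calls. Whether this is the whole explanation for the GPU behaviour is not confirmed in the thread. As a minimal sketch, the conversion can be kept in one place; the helper name bytesFor is made up for illustration and is not part of the OpenCL C++ bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: byte count needed to store `count` elements of type T.
template <typename T>
std::size_t bytesFor(std::size_t count) {
    // Guard against overflow of the multiplication below.
    assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(T));
    return count * sizeof(T);
}
```

Under that assumption, the host array would be allocated as new int[size], and bytesFor<int>(size) would be passed as the size to both the cl::Buffer constructor and queue.enqueueReadBuffer, while cl::NDRange(size) stays an element count.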

0 Likes

I just checked on Linux and the behaviour is exactly the same.

Am I supposed to change anything besides CL_DEVICE_TYPE_CPU to run it on GPU? Maybe I got the basics wrong?

 

0 Likes

Please upgrade your SDK to SDK 2.1 and check whether you get correct values. The new SDK 2.1 is available at the following link.

http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx

0 Likes