Hi,
I am a C++ programmer, but still very much learning. My experience with OpenCL is nil, and I haven't had much luck with the hello-world OpenCL tutorials I've tried: I can never get them working. I can get the sample code from the SDK installs working, but I'd like to do something simpler before I delve into anything more complicated.
I've just ordered the book "Heterogeneous Computing with OpenCL", and hopefully it will help somewhat. However, what I'd really like to get started is an idiot's guide to OpenCL.
For example: let's say that I want to add up all the numbers from 1 to 100. In C++ this is obviously very simple. A straightforward for loop would suffice.
int total = 0;
for (int i = 1; i <= 100; i++)
total += i;
But how could I do this properly in OpenCL?
Once I understand the very basics, I'll be able to get going.
What header files are needed? What OpenCL files do we need?
I appreciate your help.
(I'm working with the v2.6 SDK, OpenCL 1.1. I have the FirePro V4800 graphics card.)
Hello,
What you asked for is not the simplest thing to start with.
Each iteration of your loop needs the result of the previous one, so it is not naturally parallel.
This kind of problem is called a reduction. It can be sped up with parallelism/OpenCL, but how to do it is not obvious.
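To make the reduction idea concrete, here is a minimal CPU-only sketch in C++ of a pairwise (tree) sum. It is not OpenCL code, and the names tree_sum and one_to_n are made up for illustration. The point is that every addition inside one pass is independent of the others, so on a GPU each pass could be shared out across many work-items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the sequence 1..n used in the question.
std::vector<int> one_to_n(int n)
{
    assert(n >= 0);
    std::vector<int> values;
    for (int i = 1; i <= n; ++i) {
        values.push_back(i);
    }
    return values;
}

// Hypothetical helper: sums the values by repeatedly adding pairs.
// 'active' is the number of partial sums that are still live.
int tree_sum(std::vector<int> values)
{
    if (values.empty()) {
        return 0;
    }
    for (std::size_t active = values.size(); active > 1; ) {
        // Round up so that an unpaired element survives to the next pass.
        std::size_t half = (active + 1) / 2;
        // Each iteration touches a different pair, so these additions
        // do not depend on each other and could run in parallel.
        for (std::size_t i = 0; i + half < active; ++i) {
            values[i] += values[i + half];
        }
        active = half;
    }
    return values[0];
}
```

Calling tree_sum(one_to_n(100)) gives the same total as the plain for loop, but in about seven passes of independent additions instead of one hundred dependent steps.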
Try to parallelize this first:
int A[1000], B[1000], C[1000];
...
for(int i=0;i<1000;i++) C[i]=A[i]+B[i];
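One way to picture why this loop parallelizes easily, sketched in plain C++ rather than OpenCL: the loop body becomes a small function that handles exactly one index, and something else decides which indices to run. The names add_one_element and add_vectors are hypothetical. In OpenCL the first would roughly correspond to the kernel and the second to the runtime launching one work-item per index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel: computes one output element.
// It never reads the result of any other index.
void add_one_element(std::size_t i,
                     const std::vector<int>& a,
                     const std::vector<int>& b,
                     std::vector<int>& c)
{
    c[i] = a[i] + b[i];
}

// Hypothetical stand-in for the runtime: calls the function once per index.
// Here it is a serial loop; the calls could run in any order or all at once.
std::vector<int> add_vectors(const std::vector<int>& a,
                             const std::vector<int>& b)
{
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < c.size(); ++i) {
        add_one_element(i, a, b, c);
    }
    return c;
}
```

Because no call depends on another, the order of the calls does not matter, which is exactly the property the summing loop lacks.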
Ah yes, of course. I see what you mean in terms of my original post.
So how would I go about writing the OpenCL code for the one you suggested? What I want to do is, within a C++ program/environment, alter the code so that it runs on the GPU rather than the CPU. All the tutorials I've looked at seem to jump a few steps ahead, and I need those first few steps!
Here is some code that addresses what you ask for
__kernel void vec_add (__global float *a, __global float *b, __global float *c)
{
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
}
However, where does this code go? In the .cpp file? In a .cl file? I assume this part of the code would be in the .cpp file?
(I read this particular example somewhere else.)
Daft question (among many others I'll be asking), but are __kernel and __global keywords specific to OpenCL?
I know there's a lot more code to write to actually execute the calculations. The question is: what, and where does it go?
The parameters I pass in: do they have to be pointers? Could they be passed in as references (say I was passing in a class that contains an array of floats)?
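On the pointer question, a hedged sketch: OpenCL C 1.1 is based on C99, so a kernel cannot take a C++ reference or a class, and array-style kernel arguments are pointers. One common pattern is to keep the class on the host side and have it expose the raw pointer and byte size, which are the two things a buffer-creation call such as clCreateBuffer needs. The class Samples and the function scale_in_place below are made up for illustration, and the function only imitates the shape of a kernel signature in plain C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side class that owns an array of floats.
class Samples {
public:
    explicit Samples(std::size_t count) : values_(count, 0.0f) {}

    // Raw pointer to the contiguous float data (what a host pointer argument expects).
    float* data() { return values_.data(); }

    // Number of elements and total size in bytes (what a size argument expects).
    std::size_t count() const { return values_.size(); }
    std::size_t size_in_bytes() const { return values_.size() * sizeof(float); }

private:
    std::vector<float> values_;
};

// Hypothetical stand-in with a kernel-like signature: a plain pointer plus a
// count, with no class and no reference in the parameter list.
void scale_in_place(float* values, std::size_t count, float factor)
{
    assert(values != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        values[i] *= factor;
    }
}
```

With this layout the class itself never crosses to the device; only the flat float data does.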
Hi, jazpearson
You are a C++ programmer, right? So in my opinion, the kernel is just a function. But it is a special one: you must prepare many things before you can invoke it. For example,
this is your kernel and you want to invoke it, and as you know, every program must run under our control. So how do you run and manage your kernel? A platform, context, device, program, command queue and so on are all needed to run a kernel (you can find information on these anywhere). You must know the steps for running a kernel. As for the kernel itself: it is a special function, so it has its own keywords and functions, such as "__kernel" and "get_global_id()". I suggest you look at more samples and tutorials. You can get all the answers from those OpenCL tutorials.
Thank you.
There are a few good books available that cover the OpenCL basics. The one you mentioned is pretty good; I would also recommend "OpenCL in Action". It covers all the basics in a very easy-to-understand manner. Good luck. Also, most of the time my OpenCL kernel code goes in .cl files that get compiled at runtime.
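As a rough illustration of the ".cl file compiled at runtime" approach, the host program first has to read the kernel file into a string, which it then hands to the OpenCL program-creation call. Below is a minimal C++ sketch of just that reading step. The function name load_kernel_source is hypothetical, and returning an empty string on failure is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole kernel source file into a string.
// Returns an empty string if the file cannot be opened.
std::string load_kernel_source(const std::string& path)
{
    assert(!path.empty());
    // Binary mode keeps the bytes exactly as they are stored on disk.
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file) {
        return std::string();
    }
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

The resulting string's c_str() pointer is what would be passed on as the source text, so the kernel can be edited without recompiling the host program.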
I am new to OpenCL too, but I have tried some code with reference to a book and it worked.
I hope it helps you. In this code I have written the kernel in a text file. We need to store the kernel as a string; the
OpenCL functions take this string and compile it to generate the executable code, which is loaded into the device context. In this code:
// System includes
#include <stdio.h>
#include <stdlib.h>
#include<windows.h>
// OpenCL includes
#include <CL/cl.h>
// Name of the text file that holds the kernel source; readSource() loads it at runtime
const char *SourceFile = "vectormul.txt";
// If you are not saving the kernel in a text file, write it as a string like the
// one below and pass that string to clCreateProgramWithSource instead.
// (cout is not available inside a kernel, and printf is not part of core OpenCL 1.1.)
/*
const char *KernelSource =
"__kernel \n"
"void vecadd(__global int *A, \n"
" __global int *B, \n"
" __global int *C) \n"
"{ \n"
" int idx = get_global_id(0); \n"
" C[idx] = A[idx] + B[idx]; \n"
"} \n"
;
*/
// Project includes
// Constants, globals
const int ELEMENTS = 10; // elements in each vector
// Signatures
char* readSource(const char *sourceFilename);
int main(int argc, char ** argv)
{
printf("Running Vector Addition program\n\n");
size_t datasize = sizeof(int)*ELEMENTS;
int *A, *B; // Input arrays
int *C; // Output array (the kernel writes int results)
// Allocate space for input/output data
A = (int*)malloc(datasize);
B = (int*)malloc(datasize);
C = (int*)malloc(datasize);
if(A == NULL || B == NULL || C == NULL) {
perror("malloc");
exit(-1);
}
// Initialize the input data
for(int i = 0; i < ELEMENTS; i++) {
A[i] = i;
B[i] = i;
}
for(int i = 0; i < ELEMENTS; i++) {
printf("%d ",A[i]);
}
printf("\n");
for(int i = 0; i < ELEMENTS; i++) {
printf("%d ",B[i]);
}
printf("\n");
cl_int status; // use as return value for most OpenCL functions
cl_uint numPlatforms = 0;
cl_platform_id *platforms;
/////////////////////////////////////////////
// STEP 1: Discover and initialize platforms
/////////////////////////////////////////////
// Query for the number of recognized platforms
status = clGetPlatformIDs(0, NULL, &numPlatforms);
if(status != CL_SUCCESS) {
printf("clGetPlatformIDs failed\n");
exit(-1);
}
// Make sure some platforms were found
if(numPlatforms == 0) {
printf("No platforms detected.\n");
exit(-1); // platforms[0] is used below, so we cannot continue
}
// Allocate enough space for each platform
platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
if(platforms == NULL) {
perror("malloc");
exit(-1);
}
// Fill in platforms
status = clGetPlatformIDs(numPlatforms, platforms, NULL);
if(status != CL_SUCCESS) {
printf("clGetPlatformIDs failed\n");
exit(-1);
}
// Print out some basic information about each platform
printf("%u platforms detected\n", numPlatforms);
for(unsigned int i = 0; i < numPlatforms; i++) {
char buf[100];
printf("Platform %u: \n", i);
status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
sizeof(buf), buf, NULL);
printf("\tVendor: %s\n", buf);
status |= clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
sizeof(buf), buf, NULL);
printf("\tName: %s\n", buf);
if(status != CL_SUCCESS) {
printf("clGetPlatformInfo failed\n");
//exit(-1);
}
}
printf("\n");
/////////////////////////////////////////////
// STEP 2: Discover and initialize devices
/////////////////////////////////////////////
cl_uint numDevices = 0;
cl_device_id *devices;
// Retrieve the number of devices present
status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL,
&numDevices);
if(status != CL_SUCCESS) {
printf("clGetDeviceIDs failed\n");
//exit(-1);
}
// Make sure some devices were found
if(numDevices == 0) {
printf("No devices detected.\n");
exit(-1); // devices[0] is used below, so we cannot continue
}
// Allocate enough space for each device
devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
if(devices == NULL) {
perror("malloc");
exit(-1);
}
// Fill in devices
status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, numDevices,
devices, NULL);
if(status != CL_SUCCESS) {
printf("clGetDeviceIDs failed\n");
//exit(-1);
}
// Print out some basic information about each device
printf("%u devices detected\n", numDevices);
for(unsigned int i = 0; i < numDevices; i++) {
char buf[100];
printf("Device %u: \n", i);
status = clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR,
sizeof(buf), buf, NULL);
printf("\tVendor: %s\n", buf);
status |= clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
sizeof(buf), buf, NULL);
printf("\tName: %s\n", buf);
if(status != CL_SUCCESS) {
printf("clGetDeviceInfo failed\n");
//exit(-1);
}
}
printf("\n");
// START Execution Model
/////////////////////////////////////////////
// STEP 3: Create a Context
/////////////////////////////////////////////
cl_context context;
// Create a context and associate it with the devices
context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
if(status != CL_SUCCESS || context == NULL) {
printf("clCreateContext failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 4: Create a Command Queue
/////////////////////////////////////////////
cl_command_queue cmdQueue;
// Create a command queue and associate it with the device you
// want to execute on
cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);
if(status != CL_SUCCESS || cmdQueue == NULL) {
printf("clCreateCommandQueue failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 5: Create Device buffers
/////////////////////////////////////////////
cl_mem d_A, d_B; // Input buffers on device
cl_mem d_C; // Output buffer on device
// Create a buffer object (d_A) that contains the data from the host ptr A
d_A = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
datasize, A, &status);
if(status != CL_SUCCESS || d_A == NULL) {
printf("clCreateBuffer failed\n");
//exit(-1);
}
// Create a buffer object (d_B) that contains the data from the host ptr B
d_B = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
datasize, B, &status);
if(status != CL_SUCCESS || d_B == NULL) {
printf("clCreateBuffer failed\n");
//exit(-1);
}
// Create a buffer object (d_C) with enough space to hold the output data
d_C = clCreateBuffer(context, CL_MEM_READ_WRITE,
datasize, NULL, &status);
if(status != CL_SUCCESS || d_C == NULL) {
printf("clCreateBuffer failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 6: Create and compile the program
/////////////////////////////////////////////
cl_program program;
char *source;
printf("start reading source file!\n");
//const char *sourceFile = "vectoradd.cl";
// This function reads in the source code of the program
source = readSource(SourceFile);
printf("done! reading source file. \n");
//printf("Program source is:\n%s\n", source);
// Create a program. The 'source' string is the code read from the
// kernel source file.
program = clCreateProgramWithSource(context, 1, (const char**)&source,
NULL, &status);
if(status != CL_SUCCESS) {
printf("clCreateProgramWithSource failed\n");
//exit(-1);
}
printf("done! creating program from source. \n");
cl_int buildErr;
// Build (compile & link) the program for the devices.
// Save the return value in 'buildErr' (the following
// code will print any compilation errors to the screen)
buildErr = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);
printf("done! building source file. \n");
// If there are build errors, print them to the screen
if(buildErr != CL_SUCCESS) {
printf("Program failed to build.\n");
cl_build_status buildStatus;
for(unsigned int i = 0; i < numDevices; i++) {
clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS,
sizeof(cl_build_status), &buildStatus, NULL);
if(buildStatus == CL_BUILD_SUCCESS) {
continue;
}
char *buildLog;
size_t buildLogSize;
// First call gets the size of the log, second call fetches the log itself
clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
0, NULL, &buildLogSize);
buildLog = (char*)malloc(buildLogSize);
if(buildLog == NULL) {
perror("malloc");
exit(-1);
}
clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
buildLogSize, buildLog, NULL);
buildLog[buildLogSize-1] = '\0';
printf("Device %u Build Log:\n%s\n", i, buildLog);
free(buildLog);
}
exit(0);
}
else {
printf("No build errors\n");
}
/////////////////////////////////////////////
// STEP 7: Create the kernel
/////////////////////////////////////////////
cl_kernel kernel;
// Create a kernel from the vector addition function (named "vecadd")
kernel = clCreateKernel(program, "vecadd", &status);
if(status != CL_SUCCESS) {
printf("clCreateKernel failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 8: Set the kernel arguments
/////////////////////////////////////////////
// Associate the input and output buffers with the kernel
status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_B);
status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_C);
if(status != CL_SUCCESS) {
printf("clSetKernelArg failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 9: Configure the work-item structure
/////////////////////////////////////////////
// Define an index space (global work size) of threads for execution.
// A workgroup size (local work size) is not required, but can be used.
size_t globalWorkSize[1]; // There are ELEMENTS threads
globalWorkSize[0] = ELEMENTS;
/////////////////////////////////////////////
// STEP 10: Enqueue the kernel for execution
/////////////////////////////////////////////
// Execute the kernel.
// 'globalWorkSize' is the 1D dimension of the work-items
status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize,
NULL, 0, NULL, NULL);
if(status != CL_SUCCESS) {
printf("clEnqueueNDRangeKernel failed\n");
//exit(-1);
}
/////////////////////////////////////////////
// STEP 11: Read the output buffer back to the host
/////////////////////////////////////////////
// Read the OpenCL output buffer (d_C) to the host output array (C)
clEnqueueReadBuffer(cmdQueue, d_C, CL_TRUE, 0, datasize, C,
0, NULL, NULL);
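// Verify correctness, assuming the kernel is the vecadd shown above,
// which should produce A[i] + B[i] for every element
const int *out = (const int*)C; // the kernel writes int results
bool result = true;
for(int i = 0; i < ELEMENTS; i++)
{
printf("%d ",out[i]);
if(out[i] != A[i] + B[i]) {
result = false;
break;
}
}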
printf("\n");
if(result) {
printf("Output is correct\n");
}
else {
printf("Output is incorrect\n");
}
/////////////////////////////////////////////
// STEP 12: Release OpenCL resources
/////////////////////////////////////////////
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseCommandQueue(cmdQueue);
clReleaseMemObject(d_A);
clReleaseMemObject(d_B);
clReleaseMemObject(d_C);
clReleaseContext(context);
free(A);
free(B);
free(C);
free(source);
free(platforms);
free(devices);
getchar();
return 0;
}
char* readSource(const char *sourceFilename) {
FILE *fp;
int err;
int size;
char *source;
fp = fopen(sourceFilename, "rb");
if(fp == NULL) {
printf("Could not open kernel file: %s\n", sourceFilename);
exit(-1); // fseek on a NULL FILE* would crash
}
err = fseek(fp, 0, SEEK_END);
if(err != 0) {
printf("Error seeking to end of file\n");
//exit(-1);
}
size = ftell(fp);
if(size < 0) {
printf("Error getting file position\n");
// exit(-1);
}
err = fseek(fp, 0, SEEK_SET);
if(err != 0) {
printf("Error seeking to start of file\n");
//exit(-1);
}
source = (char*)malloc(size+1);
if(source == NULL) {
printf("Error allocating %d bytes for the program source\n", size+1);
exit(-1);
}
err = (int)fread(source, 1, size, fp);
if(err != size) {
printf("only read %d bytes\n", err);
// exit(0);
}
fclose(fp); // the file is no longer needed once its contents are in memory
source[size] = '\0';
return source;
}
Thanks very much for all the help. I've actually got a working implementation now. Now I just need to try to understand it
and see how I can integrate this into existing code.
Thanks again.
You should check out PyOpenCL. It's a great way to get introduced to OpenCL without getting bogged down with gory details at first.
--Keith Brafford
The OpenCL Spec and Reference Card are worth having by your side: