parubok
Journeyman III

Unable to build AMD APP SDK samples on Win7 x64 - make error

My system: Windows 7 Pro x64

GPU: AMD HD 6870

MinGW-x64: mingw-w64-bin_i686-mingw_20110827.zip

C:\Users\eli\Documents\AMD APP\samples>make.exe -version
GNU Make 3.82
Built for x86_64-w64-mingw32

I followed the instructions in "Building With MinGW-x64 + GCC Using make Files" and got this error:

C:\Users\eli\Documents\AMD APP\samples>make.exe bitness=64
process_begin: CreateProcess(NULL, uname -a, ...) failed.
../make/openclsdkdefs.mk:28: *** Unknown OS:.  Stop.

What is the problem? Thanks.

0 Likes
9 Replies
szabi_h
Journeyman III

(I have Windows 7 Professional 64-bit, VS2010 Professional, FirePro V4800.)
I followed this guide: http://blog.cuvilib.com/2011/07/01/how-to-run-opencl-in-microsoft-vs-2008-using-amd-app-sdk/ (but with the amdapp* names instead of atistream*).
It worked with this code:
#include "stdio.h"
#include "stdlib.h"
#include "CL/cl.h"
int main(int argc, char **argv){

    printf("jj");
    return 0;
}

But when I add this to the project:
__kernel void vectorAdd(__global const float * a, __global const float * b, __global float * c) {
    int nIndex = get_global_id(0);
    c[nIndex] = a[nIndex] + b[nIndex];
}

VS 2010 reports this:
IntelliSense: this declaration has no storage class or type specifier.
Then, when I try to build, VS reports this:
error C2144: syntax error: 'void' should be preceded by ';'

0 Likes

It looks like you have added the kernel code to a .cpp file. A C++ compiler won't understand OpenCL C.

Please install SDK 2.5 and look at how the samples do it.

0 Likes

Thank you for the answer.

Unfortunately, I had already installed SDK 2.5, and my file's extension is not .cpp but .cl (the properties window shows "File Type: document"), unless there is something here that I don't understand. Does AMD have something like CUDA_VS_Wizard (http://sourceforge.net/projects/cudavswizard/)?
I would like to have the kernel and the main function in the same file, without the sample toolkit. Is that possible?

0 Likes

What can I do to make VS2010 compile my .cl file?

0 Likes

The .cl file is compiled on the fly when you run the otherwise compiled program.

In other words, you do not compile the .cl file yourself; your program does it when it calls clCreateProgramWithSource followed by clBuildProgram (there are other methods, but this is the most common one).
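As a rough sketch of the host-side step that comes before the build call, the program first has to get the .cl text into memory. The helper below is plain C and its name, load_kernel_source, is made up for illustration (it is not part of OpenCL or the AMD APP SDK). It assumes the kernel file is small enough to read in one go.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an OpenCL API): read a whole text file,
 * such as a .cl kernel file, into a NUL-terminated heap buffer.
 * Returns NULL on failure. The caller frees the returned buffer. */
char *load_kernel_source(const char *path, size_t *length_out)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;                      /* file missing or unreadable */
    }

    /* Find the file size by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    /* One extra byte for the terminating NUL. */
    char *source = (char *)malloc((size_t)size + 1);
    if (source == NULL) {
        fclose(fp);
        return NULL;
    }

    size_t read_count = fread(source, 1, (size_t)size, fp);
    fclose(fp);
    if (read_count != (size_t)size) {
        free(source);
        return NULL;
    }

    source[size] = '\0';
    if (length_out != NULL) {
        *length_out = (size_t)size;
    }
    return source;
}
```

The returned string is what you would hand to clCreateProgramWithSource (as a const char ** with a count of 1) before calling clBuildProgram. Because the file is read at run time, the kernel can be edited without rebuilding the host program.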

0 Likes

Originally posted by: szabi_h What can I do to make VS2010 compile my .cl file?

You have to use the clBuildProgram API to compile OpenCL code. Any of the SDK samples shows how to do it.

0 Likes

Thank you for all the answers. I used to develop in CUDA and am now learning OpenCL, and I found this in "Nvidia_OpenCl_JumpStart_Guide.pdf", under the differences between CUDA and OpenCL:
"Using C for CUDA, kernel programs are precompiled into a binary format" (using the NVCC compiler) "and there are function calls for dealing with module and function loading. In OpenCL, the compiler is built into the runtime and can be invoked on the raw text or a binary can be built and saved for later load."
I am writing this for everyone else who learned CUDA and would like to learn OpenCL (and who ends up on the AMD website because their new video card is a Radeon).
But this OpenCL method looks slower than CUDA's method.

0 Likes

szabi_h

You can put the kernel and the main function in the same file this way (the code is taken from the NVIDIA forum).

The downside of this method is that you cannot edit the kernel without recompiling the program.

// System includes
#include <stdio.h>
#include <stdlib.h>
#include <iostream>

using namespace std;

// OpenCL includes
#include <CL/cl.h>

// Constants, globals
const int ELEMENTS = 2048;   // elements in each vector

//////////////////////
// Simple compute kernel which computes the square of an input array
// (it multiplies A and B element-wise, and both hold the same values)
const char *Source =
"\n"
"__kernel void vecadd(__global int *A, __global int *B, __global int *C) \n"
"{ \n"
"    int idx = get_global_id(0); \n"
"    C[idx] = A[idx] * B[idx]; \n"
"} \n"
"\n";

////////////////////////////////////////
int main(int argc, char ** argv)
{
    printf("Running Vector Addition program\n\n");

    size_t datasize = sizeof(int)*ELEMENTS;

    int *A, *B;   // Input arrays
    int *C;       // Output array

    // Allocate space for input/output data
    A = (int*)malloc(datasize);
    B = (int*)malloc(datasize);
    C = (int*)malloc(datasize);
    if(A == NULL || B == NULL || C == NULL) {
        perror("malloc");
        exit(-1);
    }

    // Initialize the input data
    // (the forum software had stripped the [i] subscripts)
    for(int i = 0; i < ELEMENTS; i++) {
        A[i] = i;
        B[i] = i;
    }

    cl_int status;   // use as return value for most OpenCL functions

    cl_uint numPlatforms = 0;
    cl_platform_id *platforms;

    // Query for the number of recognized platforms
    status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if(status != CL_SUCCESS) {
        printf("clGetPlatformIDs failed\n");
        exit(-1);
    }

    // Make sure some platforms were found
    if(numPlatforms == 0) {
        printf("No platforms detected.\n");
        exit(-1);
    }

    // Allocate enough space for each platform
    platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
    if(platforms == NULL) {
        perror("malloc");
        exit(-1);
    }

    // Fill in platforms (the original did not store the return value here)
    status = clGetPlatformIDs(numPlatforms, platforms, NULL);
    if(status != CL_SUCCESS) {
        printf("clGetPlatformIDs failed\n");
        exit(-1);
    }

    // Print out some basic information about each platform
    printf("%u platforms detected\n", numPlatforms);
    for(unsigned int i = 0; i < numPlatforms; i++) {
        char buf[100];
        printf("Platform %u: \n", i);
        status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                                   sizeof(buf), buf, NULL);
        printf("\tVendor: %s\n", buf);
        status |= clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                                    sizeof(buf), buf, NULL);
        printf("\tName: %s\n", buf);

        if(status != CL_SUCCESS) {
            printf("clGetPlatformInfo failed\n");
            exit(-1);
        }
    }
    printf("\n");

    cl_uint numDevices = 0;
    cl_device_id *devices;

    // Retrieve the number of devices present
    status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL,
                            &numDevices);
    if(status != CL_SUCCESS) {
        printf("clGetDeviceIDs failed\n");
        exit(-1);
    }

    // Make sure some devices were found
    if(numDevices == 0) {
        printf("No devices detected.\n");
        exit(-1);
    }

    // Allocate enough space for each device
    devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
    if(devices == NULL) {
        perror("malloc");
        exit(-1);
    }

    // Fill in devices
    status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, numDevices,
                            devices, NULL);
    if(status != CL_SUCCESS) {
        printf("clGetDeviceIDs failed\n");
        exit(-1);
    }

    // Print out some basic information about each device
    printf("%u devices detected\n", numDevices);
    for(unsigned int i = 0; i < numDevices; i++) {
        char buf[100];
        printf("Device %u: \n", i);
        status = clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR,
                                 sizeof(buf), buf, NULL);
        printf("\tDevice: %s\n", buf);
        status |= clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
                                  sizeof(buf), buf, NULL);
        printf("\tName: %s\n", buf);

        if(status != CL_SUCCESS) {
            printf("clGetDeviceInfo failed\n");
            exit(-1);
        }
    }
    printf("\n");

    cl_context context;

    // Create a context and associate it with the devices
    context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
    if(status != CL_SUCCESS || context == NULL) {
        printf("clCreateContext failed\n");
        exit(-1);
    }

    cl_command_queue cmdQueue;

    // Create a command queue and associate it with the device you
    // want to execute on
    cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);
    if(status != CL_SUCCESS || cmdQueue == NULL) {
        printf("clCreateCommandQueue failed\n");
        exit(-1);
    }

    cl_mem d_A, d_B;   // Input buffers on device
    cl_mem d_C;        // Output buffer on device

    // Create a buffer object (d_A) that contains the data from the host ptr A
    d_A = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                         datasize, A, &status);
    if(status != CL_SUCCESS || d_A == NULL) {
        printf("clCreateBuffer failed\n");
        exit(-1);
    }

    // Create a buffer object (d_B) that contains the data from the host ptr B
    d_B = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                         datasize, B, &status);
    if(status != CL_SUCCESS || d_B == NULL) {
        printf("clCreateBuffer failed\n");
        exit(-1);
    }

    // Create a buffer object (d_C) with enough space to hold the output data
    d_C = clCreateBuffer(context, CL_MEM_READ_WRITE, datasize, NULL, &status);
    if(status != CL_SUCCESS || d_C == NULL) {
        printf("clCreateBuffer failed\n");
        exit(-1);
    }

    cl_program program;

    printf("Program source is:\n%s\n", Source);

    // Create a program. The 'Source' string is the code from the
    // vectoradd.cl file.
    program = clCreateProgramWithSource(context, 1, (const char**)&Source,
                                        NULL, &status);
    if(status != CL_SUCCESS) {
        printf("clCreateProgramWithSource failed\n");
        exit(-1);
    }

    cl_int buildErr;

    // Build (compile & link) the program for the devices.
    // Save the return value in 'buildErr' (the following
    // code will print any compilation errors to the screen)
    buildErr = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);

    // If there are build errors, print them to the screen
    if(buildErr != CL_SUCCESS) {
        printf("Program failed to build.\n");
        cl_build_status buildStatus;
        for(unsigned int i = 0; i < numDevices; i++) {
            clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS,
                                  sizeof(cl_build_status), &buildStatus, NULL);
            if(buildStatus == CL_BUILD_SUCCESS) {
                continue;
            }

            char *buildLog;
            size_t buildLogSize;
            clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                                  0, NULL, &buildLogSize);
            buildLog = (char*)malloc(buildLogSize);
            if(buildLog == NULL) {
                perror("malloc");
                exit(-1);
            }
            clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                                  buildLogSize, buildLog, NULL);
            buildLog[buildLogSize-1] = '\0';
            printf("Device %u Build Log:\n%s\n", i, buildLog);
            free(buildLog);
        }
        exit(-1);
    }
    else {
        printf("No build errors\n");
    }

    cl_kernel kernel;

    // Create a kernel from the function named "vecadd"
    kernel = clCreateKernel(program, "vecadd", &status);
    if(status != CL_SUCCESS) {
        printf("clCreateKernel failed\n");
        exit(-1);
    }

    // Associate the input and output buffers with the kernel
    status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
    status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_B);
    status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_C);
    if(status != CL_SUCCESS) {
        printf("clSetKernelArg failed\n");
        exit(-1);
    }

    // Define an index space (global work size) of threads for execution.
    // A workgroup size (local work size) is not required, but can be used.
    size_t globalWorkSize[1];   // There are ELEMENTS threads
    globalWorkSize[0] = ELEMENTS;

    // Execute the kernel.
    // 'globalWorkSize' is the 1D dimension of the work-items
    status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize,
                                    NULL, 0, NULL, NULL);
    if(status != CL_SUCCESS) {
        printf("clEnqueueNDRangeKernel failed\n");
        exit(-1);
    }

    // Read the OpenCL output buffer (d_C) to the host output array (C)
    clEnqueueReadBuffer(cmdQueue, d_C, CL_TRUE, 0, datasize, C,
                        0, NULL, NULL);

    // Verify correctness
    bool result = true;
    for(int i = 0; i < ELEMENTS; i++) {
        if(C[i] != i*i) {
            result = false;
            break;
        }
    }
    if(result) {
        printf("Output is correct\n");
    }
    else {
        printf("Output is incorrect\n");
    }

    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseCommandQueue(cmdQueue);
    clReleaseMemObject(d_A);
    clReleaseMemObject(d_B);
    clReleaseMemObject(d_C);
    clReleaseContext(context);

    free(A);
    free(B);
    free(C);
    free(platforms);
    free(devices);

    return 0;
}

0 Likes

Thank You! I exactly now searched this code (or similar).

0 Likes