I have integrated some OpenCL code into Unreal Engine for our game production via AMD APP 2.8.1, and I am trying to avoid recompiling the OpenCL code on every run because compilation takes a very long time.
I have a working path that uses clCreateProgramWithSource, after which I save the compiled program to disk with:
result = clGetProgramInfo(*program, CL_PROGRAM_NUM_DEVICES, sizeof(number_of_binaries), &number_of_binaries, NULL);
binary_sizes = new size_t[number_of_binaries];
binary = new uint8*[number_of_binaries];
result = clGetProgramInfo(*program, CL_PROGRAM_BINARY_SIZES, number_of_binaries*sizeof(size_t), binary_sizes, NULL);
check(number_of_binaries==1); // We ask for only one GPU device, so we should get one binary
for (int i = 0; i < number_of_binaries; ++i) binary[i] = new uint8[binary_sizes[i]];
result = clGetProgramInfo(*program, CL_PROGRAM_BINARIES, number_of_binaries*sizeof(uint8*), binary, NULL);
const FString BC7CompressorFilename = FPaths::Combine(*FPaths::EngineDir(), TEXT("Source\\Developer\\TextureFormatBC67"),
FArchive* Ar = GFileManager->CreateFileWriter( *BC7CompressorFilename, 0, binary_sizes );
if( !Ar )
GWarn->Logf(ELogVerbosity::Error, TEXT("Cannot create file for saving OpenCL binary."));
Ar->Serialize( const_cast<uint8*>(binary), binary_sizes );
Then, in a separate, later session, I load the program:
if (FFileHelper::LoadFileToArray( program_binary, *BC7CompressorFilename, FILEREAD_Silent ))
const uint8* binary = (uint8*)program_binary.GetData();
size_t binary_size = (size_t) program_binary.Num();
GWarn->Logf(ELogVerbosity::Log, *(FString(TEXT("Loading ")) + BC7CompressorFilename + FString(TEXT(" (")) + FString::FromInt(binary_size) + FString(TEXT(" bytes)"))));
cl_program returned_program = clCreateProgramWithBinary(_context, 1, &_device,
&binary_size, &binary, &binary_status,
if (result != CL_SUCCESS)
GWarn->Logf(ELogVerbosity::Error, TEXT("Failed to create BC7 program!"));
None of these calls reports an error.
Then I call:
kernel = clCreateKernel(_compress_32bits, "bc7_kernel", &result);
It crashes there, and the call stack ends with:
[Frames below may be incorrect and/or missing, no symbols loaded for amdocl64.dll]
I suppose something is wrong with the cl_program, but since the type is opaque, I can't inspect it.
This is running on the same computer, but in separate sessions (so the initializations of platform, device, and context are done twice). But as long as the device and drivers are the same, that should not make a difference, right?
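One thing that can be checked independently of OpenCL is whether the file round trip preserves the binary byte for byte. Below is a minimal sketch of that check, assuming plain std::fstream instead of Unreal's FArchive. The names save_blob, load_blob and round_trips are hypothetical helpers for illustration and are not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write exactly `size` bytes starting at `data` to `path`.
// Returns true only if the stream is still healthy after the write.
bool save_blob(const std::string& path, const std::uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    if (size > 0)
        out.write(reinterpret_cast<const char*>(data),
                  static_cast<std::streamsize>(size));
    out.flush();
    return out.good();
}

// Hypothetical helper: read the whole file at `path` into `blob`.
// Returns false if the file cannot be opened or fully read.
bool load_blob(const std::string& path, std::vector<std::uint8_t>& blob)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff size = in.tellg();   // opened at end, so this is the file size
    if (size < 0) return false;
    blob.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0)
        in.read(reinterpret_cast<char*>(blob.data()),
                static_cast<std::streamsize>(size));
    return static_cast<bool>(in);
}

// Save `original`, load it back, and compare size and contents.
bool round_trips(const std::string& path, const std::vector<std::uint8_t>& original)
{
    std::vector<std::uint8_t> loaded;
    return save_blob(path, original.data(), original.size())
        && load_blob(path, loaded)
        && loaded == original;
}
```

Applied to the snippets above, the bytes to save would be the first device's binary (binary[0], with length binary_sizes[0]), and the loaded size should match that value in the later session. This only rules out the file I/O and says nothing about what the driver does with the binary.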
Thanks for reading!