Hello,
I have an ATI RV770, and following these instructions:
http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=115
leads to
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted
The unstripped kernel works; the machine code is 0x3ee (GPU) and 0x7d2 (CPU).
The stripped kernel for the CPU works too.
*edit*
If I remove the '-R .amdil' option, it works, so I guess this section is actually important?
Yet the article says: "The other sections (.source, .llvmir, .amdil) are not used during normal application execution and it is recommended that you remove them before distributing your application."
In reply to Tasp:
When are you getting that error? Has it been solved? If not, please give the exact steps to reproduce the problem.
More details:
ATI Stream SDK 2.2 with current Catalyst driver
HD4850 (RV770)
C++ bindings
I generate binary kernels as explained in the article. The only difference is that I don't use CL_CONTEXT_OFFLINE_DEVICES_AMD, because I only want the kernel for my device.
Then I strip the binary kernel:
objcopy -I elf32-i386 -O elf32-i386 -R .source -R .llvmir -R .amdil --alt-machine-code= kernel.bin stripped_kernel.bin
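To confirm which sections actually survive the objcopy step, `readelf -S stripped_kernel.bin` lists the section headers. The same check can be sketched in C as below. This is a minimal sketch that assumes a little-endian ELF32 image (as produced with `-O elf32-i386`); `elf32_has_section` is a hypothetical helper name, not part of the Stream SDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian 16- and 32-bit fields (ELFDATA2LSB layout). */
static uint32_t rd16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 if the ELF32 image in buf[0..len) has a
 * section called `name`, 0 if it does not, and -1 if buf is not a
 * little-endian ELF32 file or its section table is out of bounds.
 */
int elf32_has_section(const unsigned char *buf, size_t len, const char *name)
{
    /* e_ident: magic, EI_CLASS == ELFCLASS32 (1), EI_DATA == ELFDATA2LSB (1) */
    if (len < 52 || memcmp(buf, "\x7f" "ELF", 4) != 0 || buf[4] != 1 || buf[5] != 1)
        return -1;

    uint32_t shoff     = rd32(buf + 32);  /* e_shoff */
    uint32_t shentsize = rd16(buf + 46);  /* e_shentsize */
    uint32_t shnum     = rd16(buf + 48);  /* e_shnum */
    uint32_t shstrndx  = rd16(buf + 50);  /* e_shstrndx */
    if (shentsize < 40 || shstrndx >= shnum || shoff > len ||
        (uint64_t)shnum * shentsize > len - shoff)
        return -1;

    /* Section-name string table: sh_offset at +16, sh_size at +20 */
    const unsigned char *strhdr = buf + shoff + (size_t)shstrndx * shentsize;
    uint32_t stroff  = rd32(strhdr + 16);
    uint32_t strsize = rd32(strhdr + 20);
    if (stroff > len || strsize > len - stroff)
        return -1;

    size_t n = strlen(name);
    for (uint32_t i = 0; i < shnum; i++) {
        uint32_t nameoff = rd32(buf + shoff + (size_t)i * shentsize); /* sh_name */
        if (nameoff >= strsize)
            continue;
        const char *s = (const char *)buf + stroff + nameoff;
        /* Match `name` plus its terminating NUL inside the string table. */
        if (n < strsize - nameoff && memcmp(s, name, n) == 0 && s[n] == '\0')
            return 1;
    }
    return 0;
}
```

After reading kernel.bin and stripped_kernel.bin into memory, calling `elf32_has_section(buf, len, ".amdil")` on each should report 1 for the first and 0 for the second if `-R .amdil` took effect.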
The kernel I used for testing is an empty kernel (I simplified it while tracking down the error):
__kernel void foo() { }
GPU:
When I call enqueueNDRangeKernel, I get the exception 'std::out_of_range'.
If I remove the '-R .amdil' option when stripping the binary, enqueueNDRangeKernel works without problems.
CPU:
It works with or without the '-R .amdil' option.
I have the same problem. Mobility Radeon 4850, 0x3ee. The only difference is that the exception occurs when clEnqueueReadBuffer is called, not enqueueNDRangeKernel.
When I started to use binary kernels, I had exactly the same problem. I also found that it still works when I don't use -R .amdil.
But then I made some changes to my kernel code and the problem disappeared. I think it was after adding the __attribute__((reqd_work_group_size(64, 1, 1))) line to all my __kernel functions.
This is an adequate workaround for me.
In reply to quadboon:
Could you please paste your kernel code and the local work-group size you are passing to clEnqueueNDRangeKernel here?
The kernel code is way too big and it's closed source. However, this is how I call clEnqueueNDRangeKernel():
const size_t global_work_size[3] = { 64 * max_compute_units, 1, 1 };
const size_t local_work_size[3] = { 64, 1, 1 };
...
err = clEnqueueNDRangeKernel (command_queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, NULL);
...
__kernel void __attribute__((reqd_work_group_size (64, 1, 1))) func1 (...)
{
...
}
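When a kernel is declared with reqd_work_group_size, the OpenCL 1.x specification makes clEnqueueNDRangeKernel return CL_INVALID_WORK_GROUP_SIZE if local_work_size is NULL, does not match the declared size, or does not evenly divide global_work_size. The hypothetical helper below sketches those checks on the host side in plain C (no OpenCL headers, so `cl_uint` is replaced by `unsigned`); treating dimensions beyond work_dim as size 1 is an assumption of this sketch.

```c
#include <stddef.h>

/*
 * Hypothetical host-side check: returns 1 if an NDRange launch satisfies a
 * kernel's __attribute__((reqd_work_group_size(X, Y, Z))), 0 otherwise.
 * Mirrors the OpenCL 1.x CL_INVALID_WORK_GROUP_SIZE conditions:
 *   - local_work_size must not be NULL,
 *   - it must equal the required size in every dimension,
 *   - each global size must be a multiple of the local size.
 * Dimensions beyond work_dim are treated as size 1 (assumption).
 */
int launch_fits_reqd(unsigned work_dim, const size_t *global,
                     const size_t *local, const size_t reqd[3])
{
    if (work_dim < 1 || work_dim > 3 || global == NULL || local == NULL)
        return 0;
    for (unsigned d = 0; d < 3; d++) {
        size_t g = d < work_dim ? global[d] : 1;
        size_t l = d < work_dim ? local[d] : 1;
        if (l != reqd[d])
            return 0;   /* local size differs from the declared one */
        if (l == 0 || g % l != 0)
            return 0;   /* global size is not a multiple of local size */
    }
    return 1;
}
```

With the values from the post above (global = 64 * max_compute_units, local = 64, required size (64, 1, 1)), the check passes; passing NULL for the local size, as some code does to let the runtime choose, would fail it.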