I'm using Catalyst 12.2 preview, Windows 7 64bit, AMD SDK 2.6, ATI Radeon 5870.
The following kernel crashes my program upon compilation every time without fail. It also crashes Kernel App.
The problem appears to lie with control flow: if I remove or add some break/return points in the code, it compiles (although I'm pretty sure it still doesn't run correctly when it does compile).
Any help? The problem is that I haven't even begun to develop the code and it already won't compile at this stage, which doesn't bode well...
Here is the code (apologies for the poor formatting):
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#define FLOAT_MAX_POS 3.40282346638528860e+38f
__constant sampler_t samplerA = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
inline float3 safe_recip_float3(const float3 Val)
//gets rid of infinites
__kernel void AceTraceRender(__constant float* TraceParamsD, __global const int* TraceParamsI,const __global const float *ATree,const unsigned int VertDataOffset,__read_only image2d_t DepthBuffer,__write_only image2d_t ScreenBuffer)
float3 RayPosition,RayDirection,RayDirectionInv,VWeight,TriX,TriY,ClosestVWeight;
const __global float4 *TriangleVert = (const __global float4*)&ATree[VertDataOffset];
xzlen = length((float2)(RayDirection.s0,RayDirection.s2));
VertZ = read_imagef(DepthBuffer,samplerA,PixelPos);
while(1)
NodesToCheck = 2 - isequal(BSV0 , FLOAT_MAX_POS);
if(NodesToCheck == 0)
VertX = mad(OVertZ,sinY,OVertX*cosY);
val = (TriX.s0 * TriY.s1) - (TriX.s0 * TriY.s2) - (TriX.s1 * TriY.s0) + (TriX.s1 * TriY.s2) + (TriX.s2 * TriY.s0) - (TriX.s2 * TriY.s1);
TriX.s0 = VertX.s3;
val = (TriX.s0 * TriY.s1) - (TriX.s0 * TriY.s2) - (TriX.s1 * TriY.s0) + (TriX.s1 * TriY.s2) + (TriX.s2 * TriY.s0) - (TriX.s2 * TriY.s1);
if(isequal(val,0.0f)+any(as_int3(VWeight * (1.0f - VWeight)))==0)
if(ClosestID!=-1)break;
I've managed to restructure the code to take away some of the break/return points, so it now compiles and runs correctly. My concern (apart from the fact that it shouldn't crash in the first place) is that when I expand the code and add more conditional control flow, it's going to break again...
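For anyone hitting the same thing, here is a rough sketch of the kind of restructuring I mean, written in plain C (the same control flow is also valid OpenCL C). It is not my actual kernel: the function names, the depth array and the limit parameter are made up purely for illustration. The first version leaves a while(1) loop through break statements; the second gets the same result with a single exit controlled by the loop condition.

```c
/* Hypothetical stand-in for a traversal loop; not the original kernel. */

/* Shape that uses break statements to leave a while(1) loop. */
int find_first_hit_break(const float *depth, int count, float limit)
{
    int closest_id = -1;
    int i = 0;
    while (1) {
        if (i >= count)
            break;              /* exit 1: ran out of candidates */
        if (depth[i] < limit) {
            closest_id = i;
            break;              /* exit 2: found a hit */
        }
        ++i;
    }
    return closest_id;
}

/* Restructured shape: no break, one exit through the loop condition. */
int find_first_hit_flag(const float *depth, int count, float limit)
{
    int closest_id = -1;
    int i = 0;
    while (i < count && closest_id == -1) {
        if (depth[i] < limit)
            closest_id = i;     /* makes the loop condition false */
        ++i;
    }
    return closest_id;
}
```

Both functions return the index of the first element below the limit, or -1 if there is none. Whether a rewrite like this sidesteps the compiler crash on any particular driver is something I can only determine by trying it.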
Thanks for reporting this. I've been able to reproduce the crash and will pass it on to the appropriate engineer to fix.
I've encountered another, *different* problem when compiling for both the GPU and the CPU (one hangs in an infinite loop, the other cannot compile).
I managed to modify the previous code (removing some break points) and everything was working correctly. Then...
I added a single line, "if(somelocalvariable ==0)return", which should end the kernel early (for my debugging purposes). Strangely, while everything was OK before, adding this line causes the kernel to get stuck in some infinite loop, which makes the display driver reset (the complete opposite of what should happen!).
All I can say is that the compiler is taking a bad step. In this case I can confidently say that early termination of the kernel cannot influence any other thread (i.e. there are no dependencies or communication between threads that could cause one to loop infinitely when another ends).
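To make the two shapes concrete, here is a small sketch in plain C (again also valid as OpenCL C) of an early return next to the equivalent guarded body. The function names and the pixel computation are hypothetical and only stand in for the real per-thread work; I have not verified that the guarded form avoids the hang, it is simply the obvious alternative to try when an early return is suspected.

```c
/* Hypothetical per-thread body; not the original kernel. */

/* Early-return shape: leave the function as soon as there is no work. */
void shade_early_return(int num_polygons, int *pixel)
{
    if (num_polygons == 0)
        return;                 /* early exit */
    *pixel = num_polygons * 2;  /* placeholder for the real work */
}

/* Guarded shape: same behaviour, but the function has a single exit. */
void shade_guarded(int num_polygons, int *pixel)
{
    if (num_polygons != 0) {
        *pixel = num_polygons * 2;  /* placeholder for the real work */
    }
}
```

Both versions leave the output untouched when there are no polygons and write the same value otherwise, so swapping one for the other should not change results on a correct compiler.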
Furthermore, when I compile for the CPU device, the compiler fails with the error: "8Internal Error: ld failed"
Sorry, you won't be able to replicate the GPU hang as there is a lot of dependent data and host code, but you can test the CPU device failing to compile...
code:
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable
#define FLOAT_MAX_POS 3.40282346638528860e+38f
#define FLOAT_MAX_NEG -3.40282346638528860e+38f
__constant sampler_t samplerA = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel void test(__constant float* TraceParamsD, __global const int* TraceParamsI,__global const float *ATree,uint const VertDataOffset,__read_only image2d_t DepthBuffer,__write_only image2d_t ScreenBuffer)
xzlen = length((float2)(RayDirection.s0,RayDirection.s2));
//voxel distance
NodesToCheck = 2 - isequal(BSV0 , FLOAT_MAX_POS);
NodeCompletionStack |= as_uint(max(NodesToCheck-1, 0)) << NodeDepth;
VertX = mad(OVertZ,sinY,OVertX*cosY);
val = (TriX.s0 * TriY.s1) - (TriX.s0 * TriY.s2) - (TriX.s1 * TriY.s0) + (TriX.s1 * TriY.s2) + (TriX.s2 * TriY.s0) - (TriX.s2 * TriY.s1);
TriX.s0 = VertX.s3;
val = (TriX.s0 * TriY.s1) - (TriX.s0 * TriY.s2) - (TriX.s1 * TriY.s0) + (TriX.s1 * TriY.s2) + (TriX.s2 * TriY.s0) - (TriX.s2 * TriY.s1);
if(isequal(val,0.0f)+any(as_int3(VWeight * (1.0f - VWeight)))==0)
Just to confirm: the CPU compilation fails even without the "if(NumPolygons==0){return;}" statement, whereas the GPU compiles and runs fine when this statement is deleted.