Is there a restriction on the total length of a kernel? We are trying to perform a rather involved calculation (a partial contraction of two rank-4 tensors) for each stream element. Once we add enough statements to the kernel (all simple adds and multiplies), brcc crashes. As the kernel is really long, I will not post it here - if you want to have a look, let me know where to send it.
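To give an idea of the kind of calculation involved without posting the full kernel, here is a minimal sketch in plain C (not Brook code). It is an illustration only: the choice of contracted indices, the extent DIM = 3, and the names idx4 and contract_rank4 are all assumptions made for the example, not taken from the actual kernel.

```c
#include <stddef.h>

/* Assumed extent of every tensor index; the real kernel may differ. */
enum { DIM = 3, RANK4_SIZE = DIM * DIM * DIM * DIM };

/* Hypothetical helper: row-major offset of element (i, j, k, l). */
static size_t idx4(size_t i, size_t j, size_t k, size_t l)
{
    return ((i * DIM + j) * DIM + k) * DIM + l;
}

/*
 * Hypothetical partial contraction over two index pairs:
 *   c[i][j][k][l] = sum over m, n of a[i][j][m][n] * b[m][n][k][l]
 * Which indices are contracted is an assumption for this sketch.
 */
void contract_rank4(const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < DIM; ++i)
        for (size_t j = 0; j < DIM; ++j)
            for (size_t k = 0; k < DIM; ++k)
                for (size_t l = 0; l < DIM; ++l) {
                    float sum = 0.0f;
                    for (size_t m = 0; m < DIM; ++m)
                        for (size_t n = 0; n < DIM; ++n)
                            sum += a[idx4(i, j, m, n)] * b[idx4(m, n, k, l)];
                    c[idx4(i, j, k, l)] = sum;
                }
}
```

Under these assumptions, writing the loops out as straight-line statements gives 81 outputs with 9 products each, i.e. 729 multiplies per stream element, which is roughly why the kernel gets so long.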
Cheers
Nik