App runs OK with the CPU backend but fails to run on the CAL backend.
errorLog() returned "Kernel Execution : Error with input streams".
What typical reasons could lead to such a situation?
ADDON:
The kernel in question:
kernel void GPU_fetch_array_kernel(float src[],int src_offset,out float dest<>)
{
dest+=src[src_offset+instance().x];
}
What are your stream dimensions? I have seen with recent drivers that large 1D streams (> 8192 elements) fail and show the same error. Try checking for errors on the streams right after declaring them; it should give a better errorLog.
If you are using 1D streams > 8192, change your Catalyst to 9.2 and see if it works.
Originally posted by: gaurav.garg What are your stream dimensions? I have seen with recent drivers that large 1D streams (> 8192 elements) fail and show the same error. Try checking for errors on the streams right after declaring them; it should give a better errorLog.
If you are using 1D streams > 8192, change your Catalyst to 9.2 and see if it works.
Thank you for the hint.
The stream is 1D and its size is indeed > 8192.
But there are no errors on stream creation, nor on filling the stream from a host memory buffer.
The first error occurs only in the kernel that uses that stream as an input parameter.
//Loading fold buffer data into GPU memory (into stream)
unsigned int stream_size = n_bins;
#if 1
fprintf(stderr, "Requested data stream size %u\n", stream_size);
#endif
brook::Stream<float> gpu_data(1, &stream_size);
if (gpu_data.error())
    fprintf(stderr, "ERROR in gpu_data (declaration): %s\n", gpu_data.errorLog());
gpu_data.read(data);
while (!gpu_data.isSync()) Sleep(0);
if (gpu_data.error())
    fprintf(stderr, "ERROR in gpu_data: %s\n", gpu_data.errorLog());
Output is:
Requested data stream size 65536
And no errors are reported from this fragment.
I will try an older Catalyst driver.
Changed Catalyst 9.5 to 9.2 and this error disappeared!
Thanks again for the good advice.
I hope ATI/AMD will improve its software in the next release, not degrade it...
Originally posted by: Raistmer
kernel void GPU_fetch_array_kernel(float src[],int src_offset,out float dest<>)
{
dest+=src[src_offset+instance().x];
}
AFAIK such read-write access to the output stream is not allowed in Brook. I just tested it, and what actually happens is that
dest = 0.0f + src[src_offset+instance().x];
gets executed. At least that is what the StreamKernelAnalyzer tells me.
Thanks, it seems you are right. It should accumulate the signal but apparently doesn't. I know that the test dataset contains a few signals above threshold, but running on the CAL backend the app finds no signals.
Again, the same app running on the CPU backend found all the signals that the CPU version detected. It seems the CPU backend is far less useful for checking an app than ATI advertises in its manuals... 😕
(BTW, this forum engine is quite buggy. I tried to edit a message and it got reparsed into something I didn't intend to express.)
From the "Stream computing user guide" (they prohibit the copy operation on the PDF document, for what reason???):
"
2.6.1.1 Dynamic Stream Management
Brook, BrookGPU, and the legacy version of Brook+ use a statically allocated stream graph and prohibit streams that are bound for simultaneous read and write. At the C++ API level, there are no such restrictions ...
"
Now the error from the kernel:
Kernel Execution : Input stream is same as output stream.
Binding kernels read-write is prohibited.
What the hell??
Well, just in case you didn't know, you can rewrite it as follows:
kernel void GPU_fetch_array_kernel(float src[], int src_offset, float destI<>, out float dest<>)
{
    dest = destI + src[src_offset + instance().x];
}
And call it with the same parameter for dest and destI:
GPU_fetch_array_kernel(src, offset, dest, dest);
Note that while doing this you can't perform gather/scatter operations on "dest", only streaming, as it would result in race conditions and undefined behaviour. If you get a runtime error about using the same parameter as input and output in the kernel, set the environment variable BRT_PERMIT_READ_WRITE_ALIASING = 1.
Originally posted by: Ceq
If you get a runtime error about using the same parameter as input and output in the kernel, set the environment variable BRT_PERMIT_READ_WRITE_ALIASING = 1.
The problem is that you normally have no control over environment variables on the system the app is running on, at least if you intend to distribute it to a lot of people, as Raistmer wants to do (think of applications for Distributed Computing projects like SETI). Okay, you could deliver a setup script that sets the variable, but I would prefer another solution.
If you don't like using a startup script, you can change it inside the program: the putenv function sets environment variables in a running process. Example:
int main(int argc, char *argv[]) {
    putenv("BRT_PERMIT_READ_WRITE_ALIASING=1");
    ...
Originally posted by: Ceq If you don't like using a startup script, you can change it inside the program: the putenv function sets environment variables in a running process. Example:
int main(int argc, char *argv[]) { putenv("BRT_PERMIT_READ_WRITE_ALIASING=1"); ...
Thanks for the hint; I'll keep it in mind, maybe it will be useful too.
LoL
Yes, it's exactly that case.
I already came to creating an additional accumulator stream too, thanks.
Originally posted by: Raistmer LoL
Yes, it's exactly that case.
I know. Btw., I've chosen Milkyway@home, as this much smaller project fits my limited time resources better.
You should be glad SETI works only with float values. Using doubles for MW forced me to basically write the kernels in IL assembly; I used Brook only for prototyping. I experienced some quite severe bugs in the SDK which made the "repair" at the IL level necessary. And I was amazed to see that some of them (like a mixed-up ordering of arguments in the constant cache of the GPU when using gather arrays) only apply when you are working with doubles.
Yes, doubles are used only in a few places; most of the processing is in float.
BTW,
putenv("BRT_PERMIT_READ_WRITE_ALIASING=1");
didn't work, unfortunately (that is, the CAL error remains). Setting the env variable at the system level works, though.
Originally posted by: Raistmer
putenv("BRT_PERMIT_READ_WRITE_ALIASING=1");
didn't work, unfortunately (that is, the CAL error remains). Setting the env variable at the system level works, though.
I guess the brook runtime is initialized (and reads the environment variable) at startup of the program, so it is too late to change it within the program.
Seems so.
The runtime exists as brook.dll, so it is loaded before main() is called.
Its initialization functions are perhaps called before that too.
That is strange; as far as I know, the Brook+ runtime reads that variable the first time you define a stream. Maybe it is system dependent. I'm using WinXP x64, MSVC 2005, Brook+ 1.4 and Catalyst 9.5.
Try the following code:
File "ker.br"
kernel void inc(float in1<>, out float out1<>) { out1 = in1 + 1.0f; }
File "main.cpp"
#include <cstdio>
#include <cstdlib>
#include "brook/Stream.h"
#include "built/ker.h"

using namespace std;
using namespace brook;

int main(int argc, char** argv) {
    unsigned int i, SIZE = 1 << 4;
    // Memory arrays
    float* v = (float*)malloc(SIZE * sizeof(float));
    // Set environment variable
    putenv("BRT_PERMIT_READ_WRITE_ALIASING=1"); // *********
    // Init
    for (i = 0; i < SIZE; ++i)
        v[i] = (float)i;
    {
        // Stream arrays
        Stream<float> s(1, &SIZE);
        // Load
        s.read(v);
        // Kernel
        inc(s, s);
        // Save
        s.write(v);
    }
    // Print
    for (i = 0; i < 8; i++)
        printf("v[%u] = (%7.3f);\n", i, v[i]);
}
I had this error too after installing the latest Catalyst driver 9.6 (I skipped 9.5).
It took me some time and tests to realize that the size of 1D streams is now limited to 8192. Go figure... The error message does not help either to understand what's going on exactly.
Just so you know, you can use Catalyst 9.4; the size limitation appeared in version 9.5.
Originally posted by: youplaboom
Just so you know, you can use Catalyst 9.4; the size limitation appeared in version 9.5.
Originally posted by: Raistmer Thanks, it seems you are right. It should accumulate signal but seems doesn't. I know that test dataset contains few signals above threshold but running on CAL backend app founds no signals.
My standard solution to this problem is to use two streams (one input and one for the accumulated output) and switch them between consecutive kernel calls.
Originally posted by: Raistmer
(BTW, this forum engine is quite buggy. I tried to edit a message and it got reparsed into something I didn't intend to express.)
You aren't telling me anything new