

Raistmer
Adept II

"Failed to create temporary linear stream" error

What causes it?

What exactly does this error mean, and in what typical cases does it appear?

I receive it after many iterations of the same code fragment.
Sometimes it disappears, only to reappear after a few "clean" iterations...
genaganna
Journeyman III

It would help if you posted the code fragment.

What error are you getting?

Also post your system information (Brook+ version, driver version, OS, 32- or 64-bit).


I'm getting the error from the topic title, namely:
Failed to create temporary linear stream

Stream SDK 1.4, Vista x86 SP1, Catalyst 9.1.

code:
http://pastebin.com/f5248c695


Corresponding kernel:
http://pastebin.com/f6c99b53c


Its IL representation:
http://pastebin.com/f25bf557f


I need information about the typical causes of this error. The error is listed in the Stream guide, but no further details are given.

[Sure, I can guess that Brook+ tries to create some temporary stream and fails, but why does it fail, what sizes are allowed, and so on... An error message like this can't be the basis for any debugging decision...]

If a kernel uses a 2D scatter stream, Brook+ needs to create a temporary linear CAL resource.
If the CAL resource allocation for this linear buffer fails, you get this error.


One way to avoid creating a temporary linear stream is to use 1D scatter streams with size < 8192.
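To make the rule above concrete, here is a small C++ sketch that encodes it as a predicate. The function name and the constant are hypothetical and not part of Brook+; they only restate this reply, on the assumption (implied by it) that a 1D scatter stream of 8192 or more elements also goes through a temporary linear buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper restating the rule of thumb from this thread;
// it is NOT a Brook+ API. Assumption: a scatter stream avoids the
// temporary linear CAL buffer only if it is 1D with fewer than 8192 elements.
constexpr std::size_t kMaxLinearScatterElements = 8192;  // exclusive bound

bool needsTemporaryLinearStream(unsigned rank, std::size_t width,
                                std::size_t height = 1) {
    assert(rank == 1 || rank == 2);
    if (rank == 2) {
        return true;  // 2D scatter streams go through a temporary linear copy
    }
    assert(height == 1);  // a 1D stream has no second dimension
    return width >= kMaxLinearScatterElements;
}
```

For example, a 1092 x 384 2D scatter stream would need the temporary buffer, while a 1D stream of 8191 elements would not.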


Originally posted by: genaganna


One way to avoid creating a temporary linear stream is to use 1D scatter streams with size < 8192.


OMG, I can't use such small stream sizes. That would mean even more kernel calls and even bigger overhead, and the overhead is unacceptably high even now...

What if I use a scatter stream of the same size, but 1D instead of 2D?
Will this prevent the creation of a temporary linear stream?

Another option: what if I use 2, 4, or 8 scatter streams of smaller size?

EDIT: Please list other ways; the proposed one is unacceptable for me.

ADDON:
What if I use a non-scatter 2D array? What limitations exist for non-scatter 2D (or 1D) streams?

Do the same limitations as for scatter streams apply to gather streams?

ADDON2:
Could you also explain why the number of such errors increases when I add error checking for input streams (i.e. call error() on each input stream before the kernel launch)? How does checking the error state of the inputs affect memory allocation for the output stream?

ADDON3:
And what guides on Brook+ and its limitations are available that go deeper than the Stream guide?

By default, a Brook+ stream declaration creates a tiled CAL resource. However, using a scatter stream requires a temporary linear CAL resource for kernel execution. If not enough memory is available for this temporary buffer, this error occurs.

It is always better to avoid creating a temporary linear stream, both because of its large memory requirements and because of the performance overhead.
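The memory cost can be estimated roughly. This sketch assumes, without confirmation from the Brook+ documentation, that the temporary linear buffer is the same size as the scatter stream and that both are resident at launch; tiling padding is ignored, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Rough estimate only. Assumptions: the temporary linear buffer has the
// same size as the tiled scatter stream, both are resident at kernel
// launch, and alignment/padding of tiled resources is ignored.
struct ScatterFootprint {
    std::size_t tiledBytes;       // the stream as declared (tiled CAL resource)
    std::size_t linearTempBytes;  // temporary linear copy used for scatter
    std::size_t totalBytes() const { return tiledBytes + linearTempBytes; }
};

ScatterFootprint estimateScatterFootprint(std::size_t width, std::size_t height,
                                          std::size_t bytesPerElement) {
    assert(bytesPerElement > 0);
    const std::size_t bytes = width * height * bytesPerElement;
    return {bytes, bytes};  // same size for both copies under the assumption
}
```

For the 1092 x 384 float stream from the debug output later in this thread, this gives 1,677,312 bytes per copy, about 3.2 MiB in total.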

What if I use a scatter stream of the same size, but 1D instead of 2D?
Will this prevent the creation of a temporary linear stream?


If you need a scatter stream, use a 1D stream of a 128-bit data type (e.g. float4) with size < 8192.
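The cost of staying under this limit can be estimated. Assuming float4 as the 128-bit type (4 floats, 16 bytes per element) and at most 8191 elements per scatter stream, this sketch counts the kernel passes needed to scatter a given number of floats; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Sketch under the limit quoted above: a 1D scatter stream of a 128-bit
// type (float4 = 4 floats = 16 bytes) with fewer than 8192 elements.
constexpr std::size_t kFloatsPerElement = 4;       // float4 is 4 x 32 bits
constexpr std::size_t kMaxElementsPerPass = 8191;  // strictly below 8192

// Number of float4 elements needed to hold floatCount floats (rounded up).
std::size_t float4ElementsNeeded(std::size_t floatCount) {
    return (floatCount + kFloatsPerElement - 1) / kFloatsPerElement;
}

// Number of kernel passes if each pass may scatter at most 8191 float4s.
std::size_t scatterPassesNeeded(std::size_t floatCount) {
    const std::size_t elements = float4ElementsNeeded(floatCount);
    return (elements + kMaxElementsPerPass - 1) / kMaxElementsPerPass;
}
```

For example, 1092 x 384 = 419,328 floats pack into 104,832 float4 elements and would need 13 passes; this is the kind of call-count overhead objected to above.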

Another option: what if I use 2, 4, or 8 scatter streams of smaller size?


Unfortunately, ATI Stream doesn't allow using more than one scatter stream in a kernel.

What if I use a non-scatter 2D array? What limitations exist for non-scatter 2D (or 1D) streams?


There are no such constraints on regular output streams, and no temporary stream is created in that case.

Do the same limitations as for scatter streams apply to gather streams?


Gather streams don't have any such limitation. Scatter streams require a special linear buffer on the hardware; gather streams have no such requirement.
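The difference is easiest to see in a plain CPU reference of the two access patterns (ordinary C++, not Brook+ kernel syntax): a gather computes the address it reads from, while a scatter computes the address it writes to. Per this reply, it is the randomly written target that needs the special linear buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference semantics only; not Brook+ code.
// Gather: each output element reads from a computed input address.
std::vector<float> gather(const std::vector<float>& in,
                          const std::vector<std::size_t>& readIdx) {
    std::vector<float> out(readIdx.size());
    for (std::size_t i = 0; i < readIdx.size(); ++i) {
        assert(readIdx[i] < in.size());
        out[i] = in[readIdx[i]];  // random read, sequential write
    }
    return out;
}

// Scatter: each input element writes to a computed output address.
void scatter(const std::vector<float>& in,
             const std::vector<std::size_t>& writeIdx,
             std::vector<float>& out) {
    assert(in.size() == writeIdx.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(writeIdx[i] < out.size());
        out[writeIdx[i]] = in[i];  // sequential read, random write
    }
}
```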

Could you also explain why the number of such errors increases when I add error checking for input streams (i.e. call error() on each input stream before the kernel launch)? How does checking the error state of the inputs affect memory allocation for the output stream?


Brook+ doesn't create CAL buffers at the time of stream declaration; it tries to delay buffer allocation. Error checking forces buffer allocation if it hasn't happened already. That's why you may see these errors more often with error checking: less memory may be left on the hardware when the temporary buffer is needed.
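A toy model of the effect described above, with hypothetical names and made-up byte counts: streams reserve device memory only when first touched, so forcing an input's allocation early (as an error() check does, per the reply) leaves less free memory at the moment the temporary buffer is requested. It only illustrates this ordering effect, not how Brook+ or CAL actually schedules allocations.

```cpp
#include <cassert>
#include <cstddef>

// Toy model; all names are hypothetical and real Brook+/CAL allocation
// is more complex. Memory is a single counter of free bytes.
struct ToyDevice {
    std::size_t freeBytes;
    bool allocate(std::size_t bytes) {
        if (bytes > freeBytes) return false;  // allocation failure
        freeBytes -= bytes;
        return true;
    }
};

// A stream that reserves device memory only when first touched
// (kernel launch, or an explicit error() check in the reply's terms).
struct LazyStream {
    std::size_t bytes;
    bool allocated = false;
    bool touch(ToyDevice& dev) {
        if (!allocated) allocated = dev.allocate(bytes);
        return allocated;
    }
};
```

With a 1000-byte toy device, a 500-byte temporary request succeeds on a fresh device but fails after a 600-byte input has been forced to allocate first.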


Thank you for such a detailed answer.
Unfortunately, kernel calls [and/or] data transfers have too much overhead, so I'm forced to keep the number of kernel calls as small as possible. This means increasing the size of the data array each kernel has to deal with.

Is it possible to get random write access [what a scatter stream does] to a fairly big memory buffer [much longer than 8192 x 4 bytes] at the IL level? Or will I encounter the same limitation there too?

And what is the currently recommended technique for random writes into big memory buffers?
Or is there no support for such buffers in current ATI hardware at all?
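Whatever the API, a random write into a large 2D array reduces to computing one address in a linear buffer. A minimal host-side sketch of row-major addressing (plain C++, not a Brook+ or CAL interface; the class name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A logical 2D array stored in one linear (row-major) buffer, so a random
// write at (x, y) becomes a single computed address. Host-side sketch only.
class Linear2D {
public:
    Linear2D(std::size_t width, std::size_t height)
        : width_(width), height_(height), data_(width * height, 0.0f) {}

    // Row-major linearisation: rows are stored one after another.
    std::size_t address(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return y * width_ + x;
    }

    void write(std::size_t x, std::size_t y, float v) { data_[address(x, y)] = v; }
    float read(std::size_t x, std::size_t y) const { return data_[address(x, y)]; }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<float> data_;
};
```

For a 1092 x 384 array, element (x, y) lands at y * 1092 + x, so the last element is at offset 419,327.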

Brook+ also supports scatter buffers > 8192 * 16 bytes, but with some overhead from copying data from the linear CAL buffer to the regular CAL buffer.

You can avoid this overhead if you use CAL.


Originally posted by: gaurav.garg

Brook+ also supports scatter buffers > 8192 * 16 bytes.

This support is very questionable (see the title of this topic).
My GPU has 512 MB of RAM, and the memory used by my app never comes anywhere near that amount, yet from time to time that internal buffer allocation fails.
I wouldn't call that "has support". Maybe "pretends to have support" 😉
To refine my question:
Do the (still not very clear) conditions that make Brook+ unable to create the temporary buffer also apply to CAL/IL code?
I.e., what restrictions on the size of a randomly writable 2D GPU memory buffer will one meet in CAL/IL?



I still don't have a clear picture of how to predict this error (and avoid it).
For example, here is the debug output of my function:

gpu_temp size:1092 x 384 = 419328 elements of type float
gpu_temp size:1091 x 384 = 418944 elements of type float
gpu_temp size:1089 x 384 = 418176 elements of type float
gpu_temp size:1088 x 384 = 417792 elements of type float
gpu_temp size:1086 x 384 = 417024 elements of type float

ERROR: Retries left: 9: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 8: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 7: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 6: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 5: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 4: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 3: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 2: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 1: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float

ERROR: Retries left: 0: GPU_fetch_array_kernel3: Kernel Execution : Failed to create temporary linear stream
gpu_temp size:1084 x 384 = 416256 elements of type float
gpu_temp size:1084 x 384 = 416256 elements of type float
gpu_temp size:1083 x 384 = 415872 elements of type float
gpu_temp size:1081 x 384 = 415104 elements of type float

It executed kernels with bigger streams without problems (in fact, the error doesn't appear on the first pass through the function), and it executed kernels of almost the same size after the error...
So the cause of the error seems unpredictable.
At least, it's not simply a matter of the stream size being too big or too small.
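The log above comes from a bounded retry loop. A sketch of such a loop, where `launch` is a hypothetical callback that wraps the kernel call and returns an empty string on success or the error text on failure, might look like this. With maxRetries = 9 it makes up to ten attempts, matching the "Retries left: 9" to "0" sequence; note that in the log every retry at 1084 x 384 failed, so retrying alone did not help there.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Outcome of a bounded retry loop around a (hypothetical) kernel launch.
struct RetryResult {
    bool ok;
    int attempts;
    std::string lastError;
};

// `launch` is a hypothetical callback: empty string = success,
// otherwise the error text (e.g. taken from the stream's error state).
RetryResult launchWithRetries(const std::function<std::string()>& launch,
                              int maxRetries) {
    assert(maxRetries >= 0);
    RetryResult r{false, 0, ""};
    for (int left = maxRetries; left >= 0; --left) {
        ++r.attempts;
        std::string err = launch();
        if (err.empty()) {
            r.ok = true;
            return r;
        }
        r.lastError = err;  // a real program would log "Retries left: <left>" here
    }
    return r;
}
```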

Originally posted by: gaurav.garg

Do the same limitations as for scatter streams apply to gather streams?

Gather streams don't have any such limitation. Scatter streams require a special linear buffer on the hardware; gather streams have no such requirement.

Could you be more specific about the size of this temporary buffer?
I have 512 MB of onboard memory on my HD4870, but sometimes it can't process 384x1715 float elements.
And sometimes it can process bigger ones. Why?
Some heavily delayed buffer deallocation? A memory leak?
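Some back-of-the-envelope arithmetic for the sizes in question. The doubling for a temporary linear copy is an assumption (it presumes both copies are resident); padding and driver reservations are ignored, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Back-of-the-envelope sizes only; padding and driver reservations ignored.
constexpr std::size_t kMiB = 1024 * 1024;

// Bytes for a w x h stream with elemBytes bytes per element.
constexpr std::size_t streamBytes(std::size_t w, std::size_t h, std::size_t elemBytes) {
    return w * h * elemBytes;
}

// Share of device memory, in percent.
double percentOfDevice(std::size_t bytes, std::size_t deviceBytes) {
    assert(deviceBytes > 0);
    return 100.0 * static_cast<double>(bytes) / static_cast<double>(deviceBytes);
}
```

384 x 1715 floats is 2,634,240 bytes (about 2.5 MiB); even doubled for an assumed linear copy, that is under 1% of the card's 512 MB, so the raw size of the request alone does not seem to explain the intermittent failures.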
