
nberger
Adept I

Scatter stream base type has to be 128 bit

When trying to scatter write to a 2D stream I get the error message "Scatter stream base type has to be 128 bit" from brook. Does this imply that I can only do scatter to float4 and double2 streams? Is this a bug or a feature, and is it documented somewhere? Is it going to stay with us for the 1.0 release?

Thanks

Nik
0 Likes
17 Replies

Nik,
It is correct that a scatter target currently needs to be a float4 or double2; however, this is not because of Brook. It is a limitation imposed at the IL/CAL level, and we are currently looking at ways around it. This is mainly done for performance reasons: float4 exports run at 22 GB/s, float2 exports at 5 GB/s, and float exports at 1 GB/s. You can see this from the global_exp_IL example in the CAL SDK. Although you need a 128-bit export space, it is currently possible to write out float2s or floats by using write masking. So, if you want to write float2s, you could create a float4 stream and scatter using g[index].xy = someValue; this would write only the first two components, thereby doing a float2 write.
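
To make the write-masking idea concrete, here is a minimal CPU-side model in C++. It is only a sketch of the concept, not Brook+ or IL code: the `Float4` struct, the `Mask` bits, and the `masked_scatter` and `scatter_float2` functions are hypothetical names invented for illustration. The target is always addressed in whole 128-bit elements, and the mask decides which components are overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for one 128-bit stream element (hypothetical type, not a Brook+ or CAL API).
struct Float4 {
    float x, y, z, w;
};
static_assert(sizeof(Float4) == 16, "a Float4 is expected to occupy 128 bits");

// Component mask bits, playing the role of a ".xy"-style write mask.
enum Mask : unsigned { X = 1u, Y = 2u, Z = 4u, W = 8u };

// Scatter `value` to target[index], overwriting only the components selected by `mask`.
// Components that are not selected keep their previous contents.
void masked_scatter(std::vector<Float4>& target, std::size_t index,
                    const Float4& value, unsigned mask) {
    assert(index < target.size());
    Float4& dst = target[index];
    if (mask & X) dst.x = value.x;
    if (mask & Y) dst.y = value.y;
    if (mask & Z) dst.z = value.z;
    if (mask & W) dst.w = value.w;
}

// Emulate a float2 write into a float4 stream, i.e. the ".xy" case.
void scatter_float2(std::vector<Float4>& target, std::size_t index, float a, float b) {
    masked_scatter(target, index, Float4{a, b, 0.0f, 0.0f}, X | Y);
}
```

Calling `scatter_float2(g, 2, 1.0f, 2.0f)` on a vector of `Float4` changes only `g[2].x` and `g[2].y`; the `z` and `w` components keep whatever they held before.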
0 Likes

Hi again!
I tried float4 and double2 streams for scatter and end up with the same error message. For my application, I found a way around the scatter, but I suppose you should look into this in some more detail.

Cheers

Nik
0 Likes

Originally posted by: MicahVillmow

Nik,

It is correct that a scatter target currently needs to be a float4 or double2; however, this is not because of Brook. It is a limitation imposed at the IL/CAL level, and we are currently looking at ways around it. This is mainly done for performance reasons: float4 exports run at 22 GB/s, float2 exports at 5 GB/s, and float exports at 1 GB/s. You can see this from the global_exp_IL example in the CAL SDK. Although you need a 128-bit export space, it is currently possible to write out float2s or floats by using write masking. So, if you want to write float2s, you could create a float4 stream and scatter using g[index].xy = someValue; this would write only the first two components, thereby doing a float2 write.


Is there an example of using:

g[index].xy = someValue;

somewhere? My br file does not produce a cpp file when I try this.

For example:


kernel void kern(float index<>, float d[], float a[], float b[], float size, out float4 c[])
{
    c[index].x = size;
}
0 Likes

Hi ryta1203,

You have to scatter to entire 128-bit chunks. Hence, c[index].x isn't going to work.

However, is it possible for you to calculate 4 float values per iteration of your kernel instead of a single float value?
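
As a rough illustration of that suggestion, the following C++ sketch models on the CPU what "four values per iteration" could look like. It is not Brook+ code: `Float4`, `compute`, and `scatter_packed` are hypothetical names, and `compute` merely stands in for whatever the real kernel calculates. Each iteration produces four results and writes them as one whole 128-bit element, so no partial write is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU stand-in for one 128-bit stream element (hypothetical type).
struct Float4 {
    float x, y, z, w;
};

// Placeholder for the per-value computation of the real kernel (an assumption).
float compute(std::size_t i) {
    return static_cast<float>(i) * 0.5f;
}

// Iteration k computes values 4k .. 4k+3 and scatters them as a single
// Float4 to target[indices[k]], so every write covers a full 128-bit element.
void scatter_packed(std::vector<Float4>& target,
                    const std::vector<std::size_t>& indices) {
    for (std::size_t k = 0; k < indices.size(); ++k) {
        assert(indices[k] < target.size());
        const std::size_t base = 4 * k;
        target[indices[k]] = Float4{compute(base), compute(base + 1),
                                    compute(base + 2), compute(base + 3)};
    }
}
```

The trade-off is that the number of iterations drops to a quarter, and the scatter index has to be expressed per group of four values rather than per value.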

Michael.
0 Likes

Originally posted by: michael.chu@amd.com

Hi ryta1203,

You have to scatter to entire 128-bit chunks. Hence, c[index].x isn't going to work.

However, is it possible for you to calculate 4 float values per iteration of your kernel instead of a single float value?

Michael.

Oh, I was just wondering because Micah says that's possible in his post:

So, if you want to write float2s, you could create a float4 stream and scatter using g[index].xy = someValue; this would write only the first two components, thereby doing a float2 write.

If you can't use float4 indexing (.xyzw) then all four values would have to be the same, correct?

I only have the problem with the float4 indexing (.xyzw) when using [] not when using <> for the out.

0 Likes

Nik,
That's good. In most cases, if you can write the app using the stream model without scatter, you will get better performance. For this problem, do you have a small example that shows the issue, so that we can get it fixed and add it to our testing?

Thanks
0 Likes

I just found (trying to come up with a simple example) that my problem might also be one of notation; whereas

kernel void test(float2 index, out float4 output[])
{
output[index] = 1.0f;
}

compiles ok

kernel void test(float2 index, out float4 output[][])
{
output[index] = 1.0f;
}

does not, and produces the 128 bit error message. Does this imply that I do not have to indicate the dimensionality of the stream with the brackets?
Sorry for the confusion
Nik
0 Likes

Originally posted by: nberger

I just found (trying to come up with a simple example) that my problem might also be one of notation; whereas

kernel void test(float2 index, out float4 output[])
{
output[index] = 1.0f;
}

compiles ok

kernel void test(float2 index, out float4 output[][])
{
output[index] = 1.0f;
}

does not, and produces the 128 bit error message. Does this imply that I do not have to indicate the dimensionality of the stream with the brackets?

Sorry for the confusion

Nik

Hey nberger, did the one that compiled for you run fine? I have the same thing: it compiled fine but then crashed on execution.
0 Likes

Well, the scatter operation is to a 1D surface, and thus using double brackets ([][]) should be an error. The global buffer that Brook utilizes to implement scatter can be thought of as a huge 1D array. I'll pass this on to the Brook compiler people so that they can hopefully generate a more meaningful error message.
0 Likes

Just to clarify: Do you mean to say that scatter operations only work on 1D arrays (and are thus limited to 8192 elements), or that internally, 2D streams are somehow represented in 1D for scatter operations?
0 Likes

Nik, Just talked to one of the brook developers. There are no limits on the 1D arrays as they are all internally address translated to 2D Streams. The syntax for scatter is 1D and the memory needs to be allocated as a 1D stream, but internal representation can be different. I've been told we have tested up to 8 million elements in a 1D scatter stream.
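
For readers wondering what such an address translation might look like, here is a small C++ sketch. The actual mapping Brook uses is not described in this thread, so the row-major layout, the row width of 8192 (taken from the limit mentioned in the question above), and the names `Coord2D`, `to_2d`, `to_1d`, and `rows_needed` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed maximum row width of the underlying 2D surface.
const std::size_t kMaxWidth = 8192;

// Position inside the assumed 2D surface (hypothetical type).
struct Coord2D {
    std::size_t x, y;
};

// Row-major translation of a 1D element index into a 2D coordinate.
Coord2D to_2d(std::size_t index, std::size_t width = kMaxWidth) {
    assert(width > 0);
    return Coord2D{index % width, index / width};
}

// Inverse translation: 2D coordinate back to the 1D element index.
std::size_t to_1d(Coord2D c, std::size_t width = kMaxWidth) {
    assert(width > 0 && c.x < width);
    return c.y * width + c.x;
}

// Number of rows needed to hold `count` elements (rounded up).
std::size_t rows_needed(std::size_t count, std::size_t width = kMaxWidth) {
    assert(width > 0);
    return (count + width - 1) / width;
}
```

Under these assumptions, a 1D scatter stream of 8 million elements would occupy 977 rows of 8192 elements, which is how a logically 1D stream can exceed the width of a single row.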
0 Likes

Ryta,
What Michael is saying is correct. I was confusing what is possible with what is implemented. The only example we have of masking global buffer writes is at the CAL/IL level, not at the Brook level. So, doing what you want would require coding at the IL level or patching the Brook-generated IL code.
0 Likes

Micah and Michael thanks.

I think I understand. It's not possible and you have to use something like:

c[index] = float4(.., ..., ..., ...);

Unfortunately, this is still crashing. If this syntax, or something like it, is correct, how would you go about writing this stream back out to a 1D array in main(), for example?
0 Likes

Hi ryta1203,

I'll just write the same thing in this post as I did in the other post in case someone misses the other post... 🙂

Can you try using an int instead of a float for index?

Michael.
0 Likes

Michael,

Since ints are supported, I'm not sure how to go about doing that. I have tried several different things:

using a constant int for index (like [0]), this doesn't work
using a function parameter int (like func(..,..,..,.., int j).....;, this doesn't work
declaring an int inside the kernel and using that, this doesn't work

Is there an example I can look at because I didn't find one in the ..\samples\test or ..\samples\apps folders that shows how scatter (like this one) works.

I'm sorry to be a pain about this, any help is appreciated, I'm just stuck since scatter (like this) is supported, I would like to be able to at least implement a simple example (which is all I am trying to do).

EDIT:: If I were working with similar-size arrays, then I could use a bunch of "if" statements such as: if (indexof(c) == somevalue), if (indexof(c) == someOtherValue), etc., and then write "c = something + something" and that would work. However, since I am working with dissimilar array sizes, I can't, because I want to use an offset for "c", something like: c[indexof(d)+H*L*W], where d is a 2D array and c is a 3D array.
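
For reference, the kind of offset arithmetic described here can be sketched on the CPU in C++. This is an assumption-laden illustration, not Brook+ code: the post does not define H, L, or W, so the sketch simply takes W as the row width and H as the row count of one 2D slice, and the function name `flat_offset` and the `slice` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Flatten a 2D position (x, y) inside slice number `slice` of a 3D array
// into a single 1D offset, as needed for a 1D scatter target.
// W is the assumed row width and H the assumed number of rows per slice.
std::size_t flat_offset(std::size_t x, std::size_t y, std::size_t slice,
                        std::size_t W, std::size_t H) {
    assert(x < W && y < H);
    return slice * H * W + y * W + x;
}
```

With W = 4 and H = 5, position (1, 2) in slice 3 maps to offset 3*20 + 2*4 + 1 = 69.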

EDIT2:: When trying to use an int anywhere with the kernel, the brcc compiles with no errors but produces only the code UP TO the int for the cpp file. That is, the cpp file is created UP TO where the "int" word is, it stops right before that, so of course I get compilation errors when I compile my cpp file.
0 Likes
nberger
Adept I

Sorry, I actually did not try - I found a way to solve my problem without any scatter. I suppose if it crashes on execution with you, it will do the same with me...
0 Likes

Hi,

There is now a scatter example available at:

ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/samples

This code reflects the current scatter limitations (1D scatter target stream, 128 bit element size).

Simply drop the scatter directory on your desktop and build.

-- marcr

0 Likes