ryta1203
Journeyman III

Scatter Question

Is it possible to scatter into a stream without using the scatter function? I am looking at the scatter-gather example in the SDK and it's a very simple example.

For instance, if I want to do something like this:

a[0] = x * y / z + c[4]

The problem is that the array sizes for a and c are not the same; a is larger by a factor of 8.

I need something like:

a[indexof(c)+height*length*8] = x*y/z+c;

where c is a stream with dimensions height*length

I know I can do this:

c = x*y/z+a[indexof(c)+height*length*8], so is there any way to do the opposite of that?
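
For readers comparing the two access patterns, here is a minimal host-side sketch in plain C (not Brook+) of the gather form and its scatter "opposite". The function names, the `offset` parameter, and the bounds checks are assumptions made for illustration; they are not part of the SDK.

```c
#include <stddef.h>

/* Gather: output element i READS from a computed location in a. */
void gather_ref(float *c, size_t n, const float *a, size_t a_len,
                size_t offset, float x, float y, float z)
{
    for (size_t i = 0; i < n; ++i) {
        size_t src = i + offset;
        if (src < a_len)              /* skip reads that would fall outside a */
            c[i] = x * y / z + a[src];
    }
}

/* Scatter: input element i WRITES to a computed location in a. */
void scatter_ref(float *a, size_t a_len, const float *c, size_t n,
                 size_t offset, float x, float y, float z)
{
    for (size_t i = 0; i < n; ++i) {
        size_t dst = i + offset;
        if (dst < a_len)              /* skip writes that would fall outside a */
            a[dst] = x * y / z + c[i];
    }
}
```

In the gather loop the computed index appears on the right-hand side; in the scatter loop it appears on the left, which is the operation being asked about.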
0 Likes
24 Replies
michael_chu
Staff

Hi ryta1203,

I believe you can also do scatter with the [] notation on the output stream.

It is important to note that you will need to scatter in 128-bit chunks though.

Michael.
0 Likes

Take a look at the topic: Scatter stream base type has to be 128 bit
0 Likes

Michael,

I'm not sure why this code crashes:

kernel void kern(float index<>, float d[], float a[], float b[], float size, out float4 c[])
{
    if (d[index] == 2 || d[index] == 3)
    {
        c[index] = d[index];
    }
}

index is of the same size as d and goes from 0 to size-1

Also, I want to write c.x back to a 1D array. Is this possible?

Is there an example of scatter using this method in the samples I can look at? All I've seen is the scatter using the scatterOp function.
0 Likes

Hi ryta1203,

I assume it is crashing for you when using the CAL backend? Does it work properly when you use the CPU backend? (just to make sure no value of index is out of bounds).

When you say you want to write c.x back to a 1D array, are you saying in a different kernel? Or that you wanted to also treat c as an input stream as well?

I believe the way you are doing it is the correct syntax. You need to scatter in 128-bit chunks.

The only thing that MIGHT be an issue is the assignment of a float to a float4.

Let me check with the Brook+ team to see if this could be the problem.
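
One way to rule out bad index values before blaming the backend is to check the index data on the host before it is read into the stream. The following is a sketch in plain C under that assumption; `first_bad_index` is a hypothetical helper, not an SDK function.

```c
#include <stddef.h>

/* Returns the position of the first entry in index[] that is not a whole
 * number in the range [0, target_len), or count if every entry is valid.
 * NaN values are reported as invalid. */
size_t first_bad_index(const float *index, size_t count, size_t target_len)
{
    for (size_t i = 0; i < count; ++i) {
        float v = index[i];
        if (!(v >= 0.0f) || v >= (float)target_len)
            return i;                 /* negative, NaN, or past the end */
        if ((float)(size_t)v != v)
            return i;                 /* has a fractional part */
    }
    return count;
}
```

If the function returns `count`, every index would address a valid element of the scatter target.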

Michael.
0 Likes

I assume I am using the CAL backend since I haven't changed any settings and am using the defaults. It crashes even if I give it a static index value.

Basically, I want to pass the stream C to the kernel as an array and assign each element of the array a float value. Since you have to use float4 or double2, I would like to either assign this value to the .x, or assign it to all of .xyzw and then just streamWrite back the .x.

I want to write all of the .x of the stream C (which in the kernel is passed as an array) back to the 1D array I have created in MAIN()

0 Likes

Hi ryta1203,

I noticed you are using the float type for the index.

Can you try using int instead?

Michael.
0 Likes

Michael,

ints aren't supported yet, so how would I go about using them? As I described in the other thread, I tried multiple things, none of which worked.
0 Likes

Hi ryta1203,

As noted in the other scatter thread you posted, we'll post something as soon as one of our AEs has had a chance to test out the sample from the engineers upstairs.

Michael.
0 Likes

Michael,

I saw your other post too, that's great news, thanks a bunch!! Meanwhile, I will switch over to CAL since my app definitely needs scatter ability.
0 Likes

Hi,

There is now a scatter example available at:

ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/samples

This code reflects the current scatter limitations (1D scatter target stream, 128 bit
element size).

Simply drop the scatter directory into your desktop and build.

-- marcr
0 Likes

Will we see the ability to scatter without the 128 bit element size limitation in an upcoming release?

It seems to me that the scatter ability is widely used/needed and without it Brook+ is very limited. If we can make a "feature request" this would be mine, the ability to scatter (even limited to a 1D array) without the 128 bit element size limitation.

As it stands now, if you want to have the ability to scatter a 1D array of "float", you have to create a wrapper function that transfers all the 1D floats involved to 1D float4.x and then read them back from float4.x into float. Is this correct? The scatter example doesn't deal with this issue so I am assuming that this is true. This will incur some overhead, particularly for larger data sets.
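
The wrapper described above could look roughly like this on the host side. This is a sketch in plain C; `float4_host`, `pack_x`, and `unpack_x` are hypothetical names standing in for whatever four-float layout the application actually uses.

```c
#include <stddef.h>

/* Four 32-bit floats: one 128-bit scatter element. */
typedef struct { float x, y, z, w; } float4_host;

/* Copy n floats into the .x component of n float4 elements.
 * The unused components are zeroed so the buffer is fully defined. */
void pack_x(float4_host *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i].x = src[i];
        dst[i].y = 0.0f;
        dst[i].z = 0.0f;
        dst[i].w = 0.0f;
    }
}

/* Copy the .x component of n float4 elements back into n floats. */
void unpack_x(float *dst, const float4_host *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i].x;
}
```

The packed buffer is four times the size of the original float array, and both passes touch every element, which is the overhead in question.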
0 Likes

Hi ryta1203,

This is a function of what the hardware itself provides at the moment unfortunately. The hardware is optimized to do 128-bit writes. It is definitely on my feature request list so that it is revisited when it is practical to do it.

At this moment, if you absolutely need to deal in floats instead of float4s then, yes, that is the sequence of operations you need to make.

Michael.
0 Likes

Hi ryta1203,

I stand corrected... 🙂 I was told by an engineer on the team that actually the hardware is capable of scattering on a 32-bit granularity level. The request has already been made to the appropriate team to take a look at adding that capability to the tools.

I apologize for the confusion!

Michael.
0 Likes

Michael,

I've looked at the example and attempted to mimic a simple example of my own:

I could not get the "sample" to execute because of a missing MSVCP80.dll, so I don't know if it will run or not. I know it will compile, but then again, so does my code below which does not run.

kernel void kern(float4 a[], float4 b[], out float4 c[])
{
    float idx = indexof(c);
    c[idx] = a[idx]+b[idx];
}

This kernel, however, crashes the program. All of the array sizes (stream sizes) are the same, which in this example happens to be size 8.

The b and a arrays are initialized to 0-7, respective to the array index (ie. [0] = 0, [1]=1.....[7]=7).

The program crashes at kernel call.


Any suggestions would be much appreciated. I apologize, I'm not sure why I am having so many problems getting scatter to work.
0 Likes


Hi ryta1203,

Can you go to Project->Properties->Configuration Properties->C/C++->Code Generation, and set "Runtime Library" to "Multi-threaded DLL (/MD)"?
That did the trick for me. It appears that this setting gets replaced with /MT when moving an existing project directory around, which then leads to the MSVCP80.dll error.

marcr

0 Likes

marcr,

This got rid of that error in "Release" but not in "Debug".

In "Release", the sample still crashes, giving no output and a message box saying:

"Debugging information for "scatter.exe" cannot be found or does not match. Binary was not built with debug information.

Do you want to continue debugging?"

0 Likes


Hi,

It appears I made a mistake when uploading the original example, sorry about that.

Can you please go to the ftp site again, and grab either "scatter" or "hello_brook"?
We've massaged those so that you can drop them onto your desktop and build
any Release/Debug combo (but only Win32 on 32 bit systems, and x64 on 64 bit
systems).

Let me know how it goes.

marcr
0 Likes

marcr,

Thanks. I will take a look at it and see if I can get my code working.

0 Likes

Here is my code. I have changed all the Project Properties to be the same as in your scatter example. At this point I can't really think of a simpler example. This just crashes when it calls the kernel. It runs fine if BRT_RUNTIME (which I had to create, it's not created automatically) is set to "cpu" but not when it is set to "cal".

#include <stdio.h>
#include <stdlib.h>

#define size1 (2*2*2)
#define size2 (2*2)

kernel void foo(float4 a[], float4 b[], out float4 c[])
{
    float idx = indexof(c);
    c[idx] = a[idx]+b[idx];
}

int main()
{
    int j=0;
    float num;
    float4 b<size1>;
    float4 c<size1>;
    float4 a<size1>;
    float4 g[size1];
    float4 h[size1];

    /* Fill both host arrays so every component of element j equals j. */
    for (j=0; j<size1; j=j+1)
    {
        g[j].x=(float)j;
        g[j].y=(float)j;
        g[j].z=(float)j;
        g[j].w=(float)j;
        h[j].x=(float)j;
        h[j].y=(float)j;
        h[j].z=(float)j;
        h[j].w=(float)j;
    }

    streamRead(b, g);
    streamRead(a, h);
    foo(a, b, c);
    scanf_s("%f", &num);
    streamWrite(c, g);
}

What would be the reasons it would run in cpu but not in cal? Also, I'm currently using a 2900 XT; I don't think that should matter, though, since it's R600.
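
To compare backends, it helps to know what the kernel should produce. Here is a sketch in plain C of the same element-wise sum, assuming `indexof(c)` yields each element's own position; `float4_ref` and `foo_ref` are hypothetical names used only for this check.

```c
#include <stddef.h>

/* Host-side stand-in for a four-component float element. */
typedef struct { float x, y, z, w; } float4_ref;

/* Reference for c[idx] = a[idx] + b[idx], applied to every element. */
void foo_ref(const float4_ref *a, const float4_ref *b, float4_ref *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        c[i].x = a[i].x + b[i].x;
        c[i].y = a[i].y + b[i].y;
        c[i].z = a[i].z + b[i].z;
        c[i].w = a[i].w + b[i].w;
    }
}
```

With both inputs initialized so that every component of element j equals j, every component of output element j should be 2*j, so the values read back after streamWrite can be checked against that.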
0 Likes

This works fine on my system (and produces the correct result).
Can you run any Brook programs on your system at all?

Feel free to send your project file to streamdeveloper@amd.com.
I will build and run it on my system, then at least we know
if it's something in the project or your system.

Thanks,

-- marcr
0 Likes

marcr,

Yes, I only have problems when using scatter.

I have emailed my project file and code. I'm sure there is something I am missing, I'm just not sure what.

Thank you,

Ryan
0 Likes

Hi Ryan,

I just saw in a separate topic that you are using an R600 card.

You need an RV670 card (Radeon HD 3870 or FireStream 9170) to use scatter or DPFP. Those features were introduced in that GPU.

This might be what is causing your problem.

Michael.
0 Likes

Michael,

As I said in my email, I am using R600. I will go back and check the documentation/release notes again. I must have missed this.

EDIT: This is not in the release notes. The release notes (Mar-8) specify:

Scatter
-------

Scatter to 1-dimensional targets is supported. The syntax is similar to gather
operations, in that the stream is bound using square brackets instead of angle
brackets and elements are accessed in an array-like fashion.

Double Precision
----------------

Double precision is supported on cards that have the necessary hardware
support. Brcc does not currently automatically promote or downcast between
float and double - the user must add explicit casts.

All floating-point literals are still single precision.


There is no mention of "necessary hardware support" under the "Scatter" section as there is in the "Double Precision" section. Can this be changed?

Thank you,

Ryan
0 Likes

Hi Ryan,

This should now be added to the release notes in the next release.

Sorry for the confusion!

Michael.
0 Likes