oscarbarenys1
Adept II

Features of Stream SDK 1.2?

Hi,

It seems that SDK 1.2 is coming this week.

Can someone at AMD elaborate on what new features or examples to expect?

It is said to support the HD4xxx series and Vista.

Specifically, what about these features:

1. Interoperability with graphics APIs: in a SIGGRAPH course presentation, Houston showed that DirectX 9 and DX10 interop is coming. Is it in 1.2? And what about OpenGL?

2. Shared memory: in the HD4xxx launch presentations, AMD showed that this new generation of cards has improved stream features, among them shared memory, which seems similar to Nvidia CUDA's shared memory. Will it be exposed in 1.2? In CAL only, or in Brook as well?

3. Synchronization primitives, as in CUDA. These seem appropriate once shared memory is exposed.

4. Atomic instructions for accessing memory.

In my humble opinion (sorry for being so blunt), these features seem to be major areas in which CUDA is superior to your SDK for computing (although they all now seem to be software deficiencies, except perhaps atomic instructions).

I mean this as constructive criticism, and I am also aware that your hardware is better than competing solutions in some respects, for example in double-precision computing.

 

Thanks,

Oscar.

0 Likes
23 Replies
ryta1203
Journeyman III

Yes, CUDA is superior to Firestream at this point.

The one thing that really stands out to me is how Nvidia can have a ~40 page document that one can read and begin programming in CUDA immediately with few or no questions, while AMD has all this documentation and it's virtually useless. Too many companies these days underemphasize documentation and its importance, IMO.
0 Likes

Originally posted by: ryta1203 Yes, CUDA is superior to Firestream at this point. The one thing that really stands out to me is how Nvidia can have a ~40 page document that one can read and begin programming in CUDA immediately with few or no questions, while AMD has all this documentation and it's virtually useless. Too many companies these days underemphasize documentation and its importance, IMO.


Can you please point me to that documentation?

It's just that I'm in high school and have virtually no resources for learning. Can you help me a bit?

For example, if I can read and understand the documentation, would I be able to make small apps? I'm not expecting to write a real-time ray tracing engine, just simple programs.

0 Likes
Ceq
Journeyman III

Well, I think Brook has a great programming model, cleaner and easier to learn than CUDA. However, it's true that it is still in development and the documentation is quite poor. Shared memory access would be a really powerful feature, but I think there are a few things to be fixed before that, like in-kernel array support.
0 Likes

Ceq,

Although I disagree that Brook+ is cleaner and easier to use than CUDA (I have no idea how you came to this conclusion), even if Brook+ was the best ever it still is no good if the company can't get across to developers how to develop using it. Nvidia is light years ahead of AMD in this regard.
0 Likes

IMO, the doc is a big problem

0 Likes

Originally posted by: traits

IMO, the doc is a big problem


Unfortunately, looking at the v1.2 documentation, the Brook+ documentation has NOT changed at all and is still VERY VERY poor. In fact, the "-A" is still in the sample VS2005 window screenshot from v1.0 Beta. The "new" scatter sample isn't even included in this version. It looks like AMD didn't even touch anything having to do with Brook+, which doesn't make sense for a company that asserts "you should use Brook+ and only use CAL when you have to really fine tune".

From the release notes it looks like little has changed in the way of Brook+, which again, is very disappointing.

So when can we expect another update and will there be any additions to Brook+?

Judging from page C-1, are the 3850s not supported?

EDIT: To be severely blunt, the documentation looks like a 5th grader wrote it. The same things are said over and over without going into much detail or depth.

More so, it seems now that chaining streams through kernels is impossible, is that true? If so, why? All this does is create more reading from main memory and more streams to be created, really decreasing efficiency of the GPU computational power.

So in order to get a virtual bidirectional stream you have to:

1. create two streams, same size
2. pass the one as input the other as output to one kernel
3. create a kernel to transfer the data between them
4. go to step 2

Does that make sense? Is there an easier way to do this?

Brook+ code:

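stream one<10>;
stream two<10>;

streamRead(one, c_one);   // copy the host array c_one into stream one
kernel1(one, two);        // reads one, writes two
trans_kernel(two, one);   // extra kernel that only copies two back into one
kernel2(one, two);        // reads one, writes two
etc.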

You now have to create an extra kernel to transfer the data from 1 kernel to the next?
0 Likes

Ryta,
The currently agreed upon best way to do what you want to do is to ping-pong buffers back and forth. This is what is done in bitonic_sort and a couple of other samples.

For example using your pseudo code it would look like this:
streamRead(one, c_in);
kernel1(one, two);
kernel2(two, one);
streamWrite(one, c_out);

This works if the streams are of the same size and type.
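
As a rough illustration of the ping-pong pattern, here is a CPU-only sketch in plain Python rather than Brook+. The arithmetic inside `kernel1` and `kernel2` and the helper `run_ping_pong` are made up for the example. Each pass reads one buffer and writes the other, and the two buffers swap roles between passes, so no transfer kernel is needed:

```python
def kernel1(src, dst):
    """Hypothetical pass: reads src only, writes dst only."""
    for i in range(len(src)):
        dst[i] = src[i] * 2.0


def kernel2(src, dst):
    """Hypothetical second pass with the same read-only/write-only contract."""
    for i in range(len(src)):
        dst[i] = src[i] + 1.0


def run_ping_pong(c_in, passes):
    """Run each pass in order, swapping the input and output buffers between passes."""
    one = list(c_in)          # plays the role of streamRead(one, c_in)
    two = [0.0] * len(one)    # second buffer of the same size and type
    src, dst = one, two
    for kernel in passes:
        kernel(src, dst)      # a pass never reads and writes the same buffer
        src, dst = dst, src   # swap roles; no data is copied
    return list(src)          # plays the role of streamWrite(..., c_out)


c_out = run_ping_pong([1.0, 2.0, 3.0], [kernel1, kernel2])
print(c_out)
```

The cost is one extra buffer per chain of passes, not one per kernel, because the same pair is reused for every pass.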
0 Likes

Micah,

Thanks. I guess it's a little counter-intuitive to have two of the exact same array. It seems that this could cause memory size issues: having to keep a duplicate of every array you want to reuse further reduces what applications can do with this SDK, right?
0 Likes
Ceq
Journeyman III

To xtremeleo:

To get the documentation just download the SDK and install it, it is inside a folder named 'doc' in your Brook+ directory.
You will also find a 'samples' folder:
Inside 'tests' there are some recommended examples to get used to Brook+ syntax
Inside 'samples' there are some more advanced programs to show Brook+ capabilities.

There is something like a quick language description in the original brookgpu project homepage, Brook and Brook+ are nearly identical:
http://www-graphics.stanford.e...cts/brookgpu/lang.html
0 Likes

Thanks a lot Ceq, but I still have only a vague idea. I've always worked with Visual Studio, and the SDKs I've used were a bunch of DLLs, but this one comes with the source code directly. Is it C++?

Can I use Mono with Stream SDK?

Or can you recommend me a good IDE for Ubuntu?

Thanks for your help

 

0 Likes

Maybe you can figure out the new features if you download it?

 

ftp://ftp-developer.amd.com/AMD_Stream_SDK/v1.2-beta/

 

I am not sure why no one announced this download so far.

0 Likes

Originally posted by: shormanm

Maybe you can figure out the new features if you download it?

ftp://ftp-developer.amd.com/AMD_Stream_SDK/v1.2-beta/

I am not sure why no one announced this download so far.

This url is not working.

0 Likes

SDK v1.2-beta is soon coming to a spanking new web page near you; but till then, here is the download link.

 

0 Likes

Why is this thing dated 7/23? Is this the newest v1.2?
0 Likes

Originally posted by: xtremeleo

Thanks a lot Ceq, but I still have only a vague idea. I've always worked with Visual Studio, and the SDKs I've used were a bunch of DLLs, but this one comes with the source code directly. Is it C++?

Can I use Mono with Stream SDK?

Or can you recommend me a good IDE for Ubuntu?

Thanks for your help

If you have always used VS, why not just stick with that, particularly since AMD insists on giving better support for VS (Windows) than for GCC (Linux)?
0 Likes

Thanks ryta1203. I just wanted to play around a little on Ubuntu, because Vista is still unsupported. In the meantime version 1.2 is out, which is a mess: some users have found it, but there isn't any official release.

Nevertheless thanks, I'll stick with VS 2005.

0 Likes
rahulgarg
Adept II

ftp://streamcomputing:streamcomputing@ftp-developer.amd.com/AMD_Stream_SDK/
However, I don't think this is an official release. It is probably a pre-beta and probably not intended for public consumption.
From the documentation I see support for local data share, DX9/10 interop and even some sync primitives in CAL. I don't use Brook+, so I don't know what's new there.
0 Likes

Ryta,
This is one of the constraints of the stream programming model. Input streams and output streams are mutually exclusive in order to guarantee the properties of streams. This programming model also maps very well to the underlying hardware, since graphics cards are in essence streaming processors with a memory model that has distinct input and output memory address spaces. The downside is that algorithms that assume a uniform memory space are a little more difficult to map and require out-of-place techniques. The example given previously, bitonic sorting, is one such problem. Instead of sorting within a single data array, each pass writes its output to a second stream, which is then used as the input stream of the next pass.

If there were a uniform memory space, then using the same stream for input and output would cause no problems, but that is not the case with graphics cards.
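
To make the out-of-place idea concrete, here is a minimal CPU sketch of a bitonic sort written as a sequence of passes, in plain Python. This is not the SDK's bitonic_sort sample; the function names are invented for illustration. Every output element is computed only from the input buffer, and the two buffers swap roles after each pass:

```python
def bitonic_pass(src, dst, k, j):
    """One compare-exchange pass: dst[i] depends only on src, never on dst."""
    for i in range(len(src)):
        partner = i ^ j                        # element this position is compared with
        ascending = (i & k) == 0               # sort direction of the block containing i
        keep_min = (i < partner) == ascending  # does position i keep the smaller value?
        a, b = src[i], src[partner]
        dst[i] = min(a, b) if keep_min else max(a, b)


def bitonic_sort(values):
    """Sort a list whose length is a power of two using out-of-place passes."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    src, dst = list(values), [None] * n
    k = 2
    while k <= n:                  # size of the bitonic blocks being merged
        j = k // 2
        while j > 0:               # compare distance within the current merge
            bitonic_pass(src, dst, k, j)
            src, dst = dst, src    # the output of this pass feeds the next one
            j //= 2
        k *= 2
    return src


print(bitonic_sort([7, 3, 6, 0, 5, 1, 4, 2]))
```

If a pass wrote into the buffer it was reading, a position could pick up its partner's new value instead of the old one, which is the hazard the separate input and output streams rule out.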
0 Likes

Micah,

Just for the sake of accuracy, I think we should specify that "that is not the case with AMD's graphics cards".

It seems that either Nvidia has a unified memory space or at least their CUDA API treats it as such. Either way, as a developer it doesn't matter: I can program it as a unified memory space.

Since AMD went more traditional/static with their API I guess the ping pong effect is like having a double buffer that you have to swap.
0 Likes

From the Brook+ release notes:

o The runtime now enforces the rule given in the language spec
  that makes it illegal to bind the same stream for both read
  and write. For compatibility with existing code, this type
  of aliasing is permitted if the environment variable
  BRT_PERMIT_READ_WRITE_ALIASING is defined.
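
One way to define the variable for a single run, without changing the whole session, is to pass a modified environment to the child process. The sketch below uses Python's standard library. The release notes only say the variable must be "defined", so the value "1" is an assumption, and the helper names are invented for illustration:

```python
import os
import subprocess

ALIASING_FLAG = "BRT_PERMIT_READ_WRITE_ALIASING"  # name taken from the release notes


def env_with_aliasing(base_env=None):
    """Return a copy of the environment with the aliasing flag defined."""
    env = dict(os.environ if base_env is None else base_env)
    env[ALIASING_FLAG] = "1"  # assumption: the notes only require it to be defined
    return env


def run_with_aliasing(command):
    """Launch command (a list of program and arguments) with the flag defined."""
    return subprocess.run(command, env=env_with_aliasing(), check=False)


child_env = env_with_aliasing({"PATH": "/usr/bin"})
print(ALIASING_FLAG in child_env)
```

Setting the flag per process keeps the stricter default in place for every other program, so accidental aliasing elsewhere is still reported.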

0 Likes

Originally posted by: udeepta@amd

From the Brook+ release notes:

o The runtime now enforces the rule given in the language spec
  that makes it illegal to bind the same stream for both read
  and write. For compatibility with existing code, this type
  of aliasing is permitted if the environment variable
  BRT_PERMIT_READ_WRITE_ALIASING is defined.

So is there some reason not to have this environment variable defined? I saw this in the release notes but not in the documentation (i.e., the programming guide).
0 Likes

MicahVillmow,
I understand the problems of read/write streams when gathering and scattering data (even on CUDA this feature is bogus), but how about non-gather/scatter read/write streams? Each thread would read and write only its own element, and it could even be easily abstracted by the compiler if the hardware does not support it...
For example:
kernel normalize (inout float<> a)
{
    float n = a;
    // ...do anything on n
    a = n;
}

What would be the problem with this?
0 Likes

Eduardo,
There actually isn't any problem with that, and it is also possible with scattering/gathering. In the past it was allowed, but we have made the compiler stricter in following the spec. However, as Udeepta posted earlier, setting BRT_PERMIT_READ_WRITE_ALIASING should allow aliasing between the read and write streams.

The one caveat: if you set the R/W aliasing flag, you must guarantee that the writes from one invocation of the kernel do not step on the reads of another invocation of the kernel.
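
The difference between the safe and the unsafe case can be sketched on the CPU in plain Python. The two kernels below are invented for illustration, and the sequential loop stands in for invocations that a GPU may run in any order:

```python
def scale_in_place(a):
    """Invocation i reads and writes only a[i], so aliasing is harmless."""
    for i in range(len(a)):
        n = a[i]
        a[i] = n * 2.0


def shift_right(src, dst):
    """Invocation i reads its left neighbour (a gather) and writes dst[i]."""
    for i in range(len(src)):
        dst[i] = src[i - 1] if i > 0 else 0.0


data = [1.0, 2.0, 3.0, 4.0]

scaled = list(data)
scale_in_place(scaled)         # safe: no invocation touches another's element

expected = [0.0] * len(data)
shift_right(data, expected)    # separate input and output buffers

aliased = list(data)
shift_right(aliased, aliased)  # same buffer bound as both input and output

print(scaled, expected, aliased)
```

In this sequential simulation the aliased run overwrites each element before its right neighbour reads it, so the result differs from the out-of-place one. On real hardware the outcome would depend on scheduling, which is why the guarantee has to come from the kernel's access pattern.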
0 Likes