Ah, it's good to be back and see familiar names.
I have been developing programs for GCN at the assembly level, including frequent use of GDS memory.
I use a basic, but not so simple method that others here have used in the past.
1. Develop an assembler/compiler tool to generate the desired GCN binary.
2. Use OpenCL to compile a simple shell program that uses the same or similar buffers/resources.
3. Open and unwrap the OpenCL elf binary file and replace the executable code with mine.
(this requires some fixups to make the new and old binary environments compatible)
4. Allow OpenCL to run the program.
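
For readers following along, the core of step 3 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the tool described above: it assumes the binary is a little-endian ELF32 image, that the code lives in a section whose name you already know (`.text` is used only as a placeholder), and that the replacement is exactly the same size as the original. The helper names `list_sections` and `replace_section` are hypothetical, and none of the fixups mentioned in step 3 are handled.

```python
import struct

# ELF32 file header and section header layouts (little-endian).
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
ELF32_SECTION = struct.Struct("<10I")


def list_sections(blob):
    """Return {name: (file_offset, size)} for each section of an ELF32 image.

    If two sections share a name, the later one wins.
    """
    if blob[:4] != b"\x7fELF" or blob[4] != 1 or blob[5] != 1:
        raise ValueError("not a little-endian ELF32 image")
    fields = ELF32_HEADER.unpack_from(blob, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [
        ELF32_SECTION.unpack_from(blob, shoff + i * shentsize) for i in range(shnum)
    ]
    strtab_offset = headers[shstrndx][4]  # sh_offset of the section-name table
    sections = {}
    for header in headers:
        start = strtab_offset + header[0]  # sh_name is an index into that table
        end = blob.index(b"\0", start)
        name = blob[start:end].decode("ascii")
        sections[name] = (header[4], header[5])  # (sh_offset, sh_size)
    return sections


def replace_section(blob, name, payload):
    """Return a copy of blob with the named section's bytes overwritten.

    Only same-size replacement is supported, so no offsets or headers
    need to be rewritten. Anything else requires real fixups.
    """
    offset, size = list_sections(blob)[name]
    if len(payload) != size:
        raise ValueError(
            "payload is %d bytes, section %r is %d bytes" % (len(payload), name, size)
        )
    return blob[:offset] + payload + blob[offset + size:]
```

In practice the generated code is rarely the same size as the shell program's code, which is where the fixups come in; the sketch only shows how to locate and overwrite the bytes.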
This has been working fine for about a year using a 3X 7970 Tahiti system.
Then I blew one of the Tahiti cards and replaced it with a Hawaii R9 290X.
I have now updated all drivers and OpenCL successfully, and everything runs as before, except that the Hawaii 290X card cannot read the GDS memory.
This seems odd because the 7970s and 290X are running identical code side by side in parallel.
The 7970s can read/write gds as before.
I have searched extensively for differences between the two OpenCL programs and see nothing obvious.
OpenCL is generating 3 binaries, for Tahiti, Tahiti, and Hawaii (called Hawks in the binary file).
I am careful to make sure the OpenCL shell program for Hawaii goes to the R9 290X card.
The only thing I can think of is that GCN 1.1 requires a hardware setting to activate the GDS that was not needed in GCN 1.0.
Any clues about how Hawaii is different from Tahiti would be greatly appreciated.
Pointers to any relevant documentation would be wonderful.
If anyone is interested in our work, please see http://www.gene.neuralcortex.com/
We are developing ultra-fast DNA processing for genetics and achieve about a 100X speedup
over other tools, even ones using NVIDIA GPUs.
Our claim is that this speedup comes from AMD GCN! (If only I can get GDS working again.)
Some previous threads on gds memory for reference.
Wow! What a great project you are doing!
I'm just guessing, but have you tried the M0 register? Maybe the new chip expects an offset/size pair in it.
Yes, I was able to get it to function by setting a low value in m0; it must be less than or equal to 0x1000 (bytes).
And the amount of GDS memory used must be less than or equal to the same value.
So, you must be right that Hawaii needs a setting that was not used (or was preset in Tahiti) but so far I cannot find it.
Will post more if I do.
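
The constraint reported above can be written down as a small pre-flight check to run on the host before patching a kernel. This is only a sketch of the rule as observed in this thread on Hawaii, not something taken from AMD documentation: the 0x1000 limit is the poster's empirical value, "the same value" is read here as that same 0x1000 limit, and the function name `check_gds_plan` is hypothetical.

```python
# Empirical limit reported in this thread for Hawaii (bytes); an assumption,
# not a documented hardware constant.
GDS_LIMIT_BYTES = 0x1000


def check_gds_plan(m0_value, gds_bytes_used, limit=GDS_LIMIT_BYTES):
    """Return a list of problems with a planned m0 value and GDS footprint.

    An empty list means the plan satisfies the rule observed in the thread.
    """
    problems = []
    if m0_value > limit:
        problems.append("m0 = 0x%X exceeds 0x%X" % (m0_value, limit))
    if gds_bytes_used > limit:
        problems.append(
            "GDS usage of 0x%X bytes exceeds 0x%X" % (gds_bytes_used, limit)
        )
    return problems


# Example: 1024 four-byte counters use exactly 0x1000 bytes.
print(check_gds_plan(0x1000, 1024 * 4))
```

Under this reading, 0x1000 bytes leaves room for at most 1024 32-bit values in GDS.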
Cool. May I ask what you are using GDS for? Also, it would be great if AMD provided an OpenCL extension
to access GDS. And what are the speed and latency of the memory?
I am developing a Zcash miner:
I want to use GDS to store thousands of frequently accessed global counters.
GDS is supposed to be an order of magnitude faster than global memory, and it should give the miner a huge boost in performance.
You are supposed to use atomic operations to access GDS.
If engineers at AMD are reading my posts at all, I would really appreciate it if you could provide more information about how to access GDS on both Windows and Linux. (I don't mind using GCN assembly.) I am working on an open-source project, and that piece of information is crucial to achieving the same level of performance as closed-source counterparts.