diapolo
Adept I

ALUBusy - an easy way to raise it?

I tried to implement 3-component vectors, but this just crashes. It could well be my fault, but does it work for anyone with SDK 2.5?

This is with Cat 11.7 / Win7 x64; 2- and 4-component vectors work well, with no errors or crashes. Tested with a 5870 and a 6550D! APP KernelAnalyzer doesn't show any errors either. APP Profiler doesn't give anything useful because of the crash when uint3 is used!

The ALUPacking value is roughly 98.5%.

Global Work-size: 134217728

Work-Group-size: 256

Dia

diapolo
Adept I

Originally posted by: gat3way Damn, I confused ALUPacking with ALUBusy. ALUPacking should be the VLIW utilization, while ALUBusy is the ratio of ALU ops.

If you have round constants in a __constant array, try moving them to __private memory; this should help.

Do you mean copying them into a private variable during kernel execution instead of using the __constant array directly?

Dia

diapolo
Adept I

Regarding the Vec3 issue: KernelAnalyzer outputs an error message directly in the Assembly tab: "Error: Another scalar op (gpr 6) has already used GPR read port 0 for chan 1 (gpr 127)".

Any ideas?

Dia

gat3way
Journeyman III

Looks like you had exactly the same problem as me with uint3. Is that using 2.5?

As for __constant: do not copy anything. Just initialize a __private variable with the needed value.

 

Originally posted by: diapolo The ALUPacking value is roughly 98.5%.

Global Work-size: 134217728

Work-Group-size: 256

 

By the way, I just tried to compile my bitcoin kernel using the offline devices extension. When I compile from source with clBuildProgram(), it works. The precompiled kernel for Barts does not execute, though. Profiling it with sprofile displays the same "strange" global work size, and the ISA dump contains an error about a jump to a nonexistent address or something like that.

diapolo
Adept I

Yes, this is with SDK 2.5, and it seems to be some kind of bug. I opened a developer ticket but have received no response so far.

If I get new information, I will post it here!

Edit: Using __private instead of __constant only leads to 1 more GPR being used on 58XX cards. No improvement in KernelAnalyzer.

Dia

genaganna
Journeyman III

Originally posted by: diapolo I tried to implement 3-component vectors, but this just crashes. It could well be my fault, but does it work for anyone with SDK 2.5?

This is with Cat 11.7 / Win7 x64; 2- and 4-component vectors work well, with no errors or crashes. Tested with a 5870 and a 6550D! APP KernelAnalyzer doesn't show any errors either. APP Profiler doesn't give anything useful because of the crash when uint3 is used!

The ALUPacking value is roughly 98.5%.

Global Work-size: 134217728

Work-Group-size: 256

Is it crashing in runtime code or kernel code? Make sure the alignment is correct for vec3, since it requires exactly the same memory as vec4.

diapolo
Adept I

Originally posted by: genaganna
Originally posted by: diapolo I tried to implement 3-component vectors, but this just crashes. It could well be my fault, but does it work for anyone with SDK 2.5?

This is with Cat 11.7 / Win7 x64; 2- and 4-component vectors work well, with no errors or crashes. Tested with a 5870 and a 6550D! APP KernelAnalyzer doesn't show any errors either. APP Profiler doesn't give anything useful because of the crash when uint3 is used!

The ALUPacking value is roughly 98.5%.

Global Work-size: 134217728

Work-Group-size: 256

Is it crashing in runtime code or kernel code? Make sure the alignment is correct for vec3, since it requires exactly the same memory as vec4.

As I said, in KernelAnalyzer the right tab holds the object code, which is the assembly for the specific device for which the OpenCL kernel is compiled. In this window I get the above-mentioned error message (Error: Another scalar op (gpr 6) has already used GPR read port 0 for chan 1 (gpr 127)) if I try to use vec3 in my kernel.

The generated object code is way too small, so it's clear now why the application that uses the kernel crashes with Vec3 enabled.

What do you mean by the memory layout having to be the same as for vec4? There is 1 kernel parameter, which is uint3 for the vec3 version and uint4 for the vec4 version. They are filled with 0, 1, 2, 0, 1, 2 ... for vec3 and 0, 1, 2, 3, 0, 1, 2, 3 ... for vec4.

Dia

genaganna
Journeyman III

Originally posted by: diapolo As I said, in KernelAnalyzer the right tab holds the object code, which is the assembly for the specific device for which the OpenCL kernel is compiled. In this window I get the above-mentioned error message (Error: Another scalar op (gpr 6) has already used GPR read port 0 for chan 1 (gpr 127)) if I try to use vec3 in my kernel.

Is it possible for you to post the kernel code here?

What do you mean by the memory layout having to be the same as for vec4? There is 1 kernel parameter, which is uint3 for the vec3 version and uint4 for the vec4 version. They are filled with 0, 1, 2, 0, 1, 2 ... for vec3 and 0, 1, 2, 3, 0, 1, 2, 3 ... for vec4.

I am talking about alignment in the runtime code. In the kernel, the compiler handles this properly.
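
To illustrate the alignment point: in OpenCL, a 3-component vector such as uint3 has the size and alignment of the corresponding 4-component type (16 bytes for uint), and cl_platform.h defines cl_uint3 as a typedef of cl_uint4. A host buffer filled as 0, 1, 2, 0, 1, 2 ... is therefore read with the wrong stride if the kernel indexes it as an array of uint3. The following is only a rough sketch of host-side padding in plain C. The names padded_uint3 and pad_uint3 are made up for this example and stand in for cl_uint3 so that the snippet compiles without the OpenCL headers; whether this is related to the KernelAnalyzer error above is not known.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for cl_uint3: four 32-bit lanes, 16 bytes total.
   The fourth lane is padding that a uint3 kernel argument does not use. */
typedef struct {
    uint32_t s[4];
} padded_uint3;

/* The layout assumption this sketch relies on. */
static_assert(sizeof(padded_uint3) == 16, "uint3 elements occupy 16 bytes");

/* Expand a tightly packed x,y,z,x,y,z,... array into 16-byte elements.
   Returns a malloc'ed array of `count` elements, or NULL on failure.
   The caller owns the result and must free() it. */
static padded_uint3 *pad_uint3(const uint32_t *packed, size_t count)
{
    padded_uint3 *out = malloc(count * sizeof *out);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < count; ++i) {
        out[i].s[0] = packed[3 * i + 0];
        out[i].s[1] = packed[3 * i + 1];
        out[i].s[2] = packed[3 * i + 2];
        out[i].s[3] = 0; /* padding lane */
    }
    return out;
}
```

The padded array, with a size of count * 16 bytes, would then be what is passed to clCreateBuffer for a kernel argument declared as a pointer to uint3. An alternative is to keep the host data tightly packed, declare the argument as a pointer to uint, and read it in the kernel with vload3, which addresses the data in steps of three components.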

diapolo
Adept I

@genaganna:

I prefer not to post the kernel here, but I could send it to you for review. Just leave an e-mail address.

The Vec3 kernel parameter is an array that consists of uints on the runtime side, which works for vec2 and vec4 :-/.

Isn't there any developer from AMD here who can comment on the error I get in KernelAnalyzer?

Dia

genaganna
Journeyman III

Originally posted by: diapolo @genaganna:

I prefer not to post the kernel here, but I could send it to you for review. Just leave an e-mail address.

The Vec3 kernel parameter is an array that consists of uints on the runtime side, which works for vec2 and vec4 :-/.

Isn't there any developer from AMD here who can comment on the error I get in KernelAnalyzer?

I am from AMD. It looks like you filed a developer ticket but did not select the appropriate field, so I did not get an email for that ticket. Please paste the ticket number here.
