Archives Discussions

ryta1203 · ‎03-10-2009

I don't get any output when using ISA, but I do when using IL. For the IL I use calclCompile and for the ISA I use calclAssembleObject; otherwise, all the code is the same. The ISA was generated using the SKA.

IL:

const char ILkernel[] =
"il_ps_2_0\n"
"dcl_input_position_interp(linear) v0.x\n"
"dcl_output_generic o0\n"
"dcl_resource_id(0)_type(1d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
"sample_resource(0)_sampler(0) r0, v0.x\n"
"mul o0, r0, r0\n"
"ret_dyn\n"
"end\n";

ISA:

const char ILkernel[] =
"00 TEX: ADDR(48) CNT(1) VALID_PIX"
" 0 SAMPLE R0.x___, R0.xyxx, t0, s0 UNNORM(XYZW) "
"01 ALU: ADDR(32) CNT(2) "
" 1 x: MUL_e R0.x, R0.x, R0.x "
" y: MOV R0.y, 0.0f"
"02 EXP_DONE: PIX0, R0.xyyy"
"END_OF_PROGRAM"

There has to be something obvious I am missing here!? Also, the SKA-generated IL doesn't work either, which seems to be a problem, since AMD says that you can use the SKA to generate your kernels from Brook+ and use the generated IL in CAL.

MicahVillmow · ‎03-10-2009

The ISA is incomplete.

You are missing the header and footer information that is required.

ShaderType = 3
TargetChip = w
;SC Dep components
NumClauseTemps = 4
; ----------------- CS Data ------------------------
; Input Semantic Mappings
; No input mappings
GprPoolSize = 0
CodeLen = 0;Bytes
PGM_END_CF = 0; words(64 bit)
PGM_END_ALU = 0; words(64 bit)
PGM_END_FETCH = 0; words(64 bit)
MaxScratchRegsNeeded = 0
; texResourceUsage[0] = 0x00000000
; texResourceUsage[1] = 0x00000000
; texResourceUsage[2] = 0x00000000
; texResourceUsage[3] = 0x00000000
; fetch4ResourceUsage[0] = 0x00000000
; fetch4ResourceUsage[1] = 0x00000000
; fetch4ResourceUsage[2] = 0x00000000
; fetch4ResourceUsage[3] = 0x00000000
; texSamplerUsage = 0x00000000
; constBufUsage = 0x00000000
ResourcesAffectAlphaOutput[0] = 0x00000000
ResourcesAffectAlphaOutput[1] = 0x00000000
ResourcesAffectAlphaOutput[2] = 0x00000000
ResourcesAffectAlphaOutput[3] = 0x00000000
;SQ_PGM_RESOURCES = 0x30000001
SQ_PGM_RESOURCES:NUM_GPRS = 1
SQ_PGM_RESOURCES:STACK_SIZE = 0
SQ_PRM_RESOURCES:FETCH_CACHE_LINES = 0
SQ_PRM_RESOURCES:PRIME_CACHE_ENABLE = 1
CsSetupMode = Fast
NumThreadPerGroup = 64
NumWavefrontPerSIMD = 16
IsMaxNumWavePerSIMD = Yes
; SetBufferForNumGroup = false

ryta1203 · ‎03-10-2009

Micah,

The generated IL from the SKA doesn't work either. The IL I was using earlier was hand-written by me, but the SKA-generated version (which is much longer, unnecessarily) doesn't compile.

Also, is what you posted covered in the docs? I'd like to use CAL but I just don't see a good reason to use the IL.

Does what I am missing go in the kernel string? I'm assuming not, since the SKA didn't generate it. Also, some samples of CAL using the ISA would be great for a future release and a real time saver.

MicahVillmow · ‎03-10-2009

Sections 7 & 8 of R600_Assembly_Language_Format.pdf specify the header options that must precede/follow ISA code. SDK 1.3 did not have the 7XX ISA information added in it but that is being worked on for 1.4.

Also, I'd strongly discourage using the ISA as it is card specific, i.e. ISA written for HD4850/4870 is not guaranteed to run on any other board and can quite easily hang your machine. If you must write in ISA for performance reasons, I'd recommend optimizing in the higher level languages as much as possible before even attempting to use the ISA. The Shader Compiler has built-in knowledge about each specific graphics card and its differences, and optimizes based on that information. For example, compile the same non-trivial IL shader for an RV710 and an RV770 and you should get two drastically different ISAs.
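To make the header/footer requirement above concrete, here is a minimal sketch of a pre-flight check one could run on a kernel string before handing it to calclAssembleObject. It is not part of CAL, and the function name has_header_and_footer is made up for illustration. It only assumes what is visible in this thread: a "ShaderType" line comes before the clause listing, and further data lines follow END_OF_PROGRAM. It does not validate the contents of either section.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical pre-flight check (NOT a CAL function).
   Returns 1 when `src` contains a "ShaderType" line before
   END_OF_PROGRAM and some non-blank text after END_OF_PROGRAM,
   and 0 otherwise. The marker strings are taken from the
   listings posted in this thread. */
static int has_header_and_footer(const char *src)
{
    const char *end = strstr(src, "END_OF_PROGRAM");
    const char *hdr = strstr(src, "ShaderType");
    const char *p;

    /* Header must exist and must come before the end marker. */
    if (end == NULL || hdr == NULL || hdr > end)
        return 0;

    /* Footer: anything other than whitespace after the end marker. */
    for (p = end + strlen("END_OF_PROGRAM"); *p != '\0'; ++p)
        if (!isspace((unsigned char)*p))
            return 1;
    return 0;
}
```

A bare clause listing that starts at "00 TEX:" and stops at END_OF_PROGRAM fails this check, which matches the situation in the first post.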

ryta1203 · ‎03-10-2009

1. Thanks. I saw that in the ASM Lang Format doc.

2. There's a reason I want to code in ISA and it has nothing to do with specific program performance at the moment.

3. Unless there is a way to compile IL without optimizations or alterations (which I wouldn't even know about since I'd have to understand the CAL compiler totally to know this) then ISA is my only option.

4. I understand the ISA is card specific, that doesn't bother me.

5. AMD simply hasn't provided enough information for optimizations at the higher level languages to be of much use. It's almost more trial and error than anything and that's VERY time consuming.

6. Do you have any idea why the IL header generated by the SKA would not compile? I am also having that problem.

MicahVillmow · ‎03-10-2009

There currently is no way to disable optimizations and we are working on getting more information out to the public so that programmers can optimize their programs better.

As for SKA, it seems that it does not generate all of the header information required. I'd email them and ask them about it.

ryta1203 · ‎03-10-2009

Micah,

Thanks. You mean for the IL, correct? I just want to make sure that calclAssembleObject does no optimizations.

Thanks, I posted the request over on their forum, maybe they will read it.

ryta1203 · ‎03-10-2009

Also, the documented optimizations for Brook+ (higher level languages) work well, but they only take you so far, which seems to be below expected results. The optimizations also don't work in all scenarios, only in some; for example, there are scenarios where using float is better than float4 and vice versa.

MicahVillmow · ‎03-10-2009

We have made a lot of performance improvements recently, and writing a float4 and 4 floats should give equal performance on the hardware. Any remaining difference is more of a software issue, as the hardware should give equivalent results for the two cases.

ryta1203 · ‎03-11-2009

FYI, I found why the copy and paste method from the SKA didn't work:

The SKA doesn't add the "\n" apparently needed at the end of each instr line. I have requested that they add this.
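For anyone hitting the same thing, here is a small sketch of why the missing "\n" matters. In C, adjacent string literals are concatenated with nothing in between, so lines pasted without "\n" reach the compiler as one long line. The names pasted, fixed and count_newlines are only for illustration; the two IL lines are taken from the kernel in the first post.

```c
/* Two IL lines pasted as adjacent string literals WITHOUT "\n".
   The C compiler joins them into the single line
   "il_ps_2_0dcl_output_generic o0". */
static const char pasted[] =
    "il_ps_2_0"
    "dcl_output_generic o0";

/* The same two lines with an explicit "\n" ending each one. */
static const char fixed[] =
    "il_ps_2_0\n"
    "dcl_output_generic o0\n";

/* Count newline characters in a NUL-terminated string. */
static int count_newlines(const char *s)
{
    int n = 0;
    for (; *s != '\0'; ++s)
        if (*s == '\n')
            ++n;
    return n;
}
```

Calling count_newlines on the pasted string gives 0 and on the fixed string gives 2, so a quick check like this on the kernel source shows whether the line breaks survived the copy and paste.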

ryta1203 · ‎03-11-2009

So, using amuasm and amudisasm I got some ISA, but I am getting an error when calling calclAssembleObject(&obj, CAL_PROGRAM_TYPE_PS, ILkernel, info.target). The error is: "Parse errors in converting assembly program"

Here is my kernel string:

const char ILkernel[] =
"ShaderType = 1\n"
"TargetChip = w\n"
";SC Dep components\n"
"NumClauseTemps = 4\n"
"; -------- Disassembly --------------------\n"
"00 TEX: ADDR(48) CNT(1) VALID_PIX\n "
" 0 SAMPLE R0.x___, R0.xyxx, t0, s0 UNNORM(XYZW)\n"
"01 ALU: ADDR(32) CNT(2) \n"
" 1 x: MUL_e R0.x, R0.x, R0.x\n"
" y: MOV R0.y, 0.0f\n"
"02 EXP_DONE: PIX0, R0.xyyy\n"
"END_OF_PROGRAM\n"
"; ----------------- PS Data ------------------------\n"
"; Input Semantic Mappings\n"
"IN R0 = position0 V0.xxxx DefaultVal={0,0,0,0}\n"
"NumTexStages = 0\n"
"TexCubeMaskBits = 0x00000000\n"
"GprPoolSize = 0\n"
"CodeLen = 400;Bytes\n"
"PGM_END_CF = 0; words(64 bit)\n"
"PGM_END_ALU = 0; words(64 bit)\n"
"PGM_END_FETCH = 0; words(64 bit)\n"
"MaxScratchRegsNeeded = 0\n"
"; texResourceUsage[0] = 0x00000000\n"
"; texResourceUsage[1] = 0x00000000\n"
"; texResourceUsage[2] = 0x00000000\n"
"; texResourceUsage[3] = 0x00000000\n"
"; fetch4ResourceUsage[0] = 0x00000000\n"
"; fetch4ResourceUsage[1] = 0x00000000\n"
"; fetch4ResourceUsage[2] = 0x00000000\n"
"; fetch4ResourceUsage[3] = 0x00000000\n"
"; texSamplerUsage = 0x00000000\n"
"; constBufUsage = 0x00000000\n"
"ResourcesAffectAlphaOutput[0] = 0x00000000\n"
"ResourcesAffectAlphaOutput[1] = 0x00000000\n"
"ResourcesAffectAlphaOutput[2] = 0x00000000\n"
"ResourcesAffectAlphaOutput[3] = 0x00000000\n"
";SQ_PGM_RESOURCES = 0x72000000\n"
"SQ_PGM_RESOURCES:NUM_GPRS = 0\n"
"SQ_PGM_RESOURCES:STACK_SIZE = 0\n"
"SQ_PRM_RESOURCES:FETCH_CACHE_LINES = 2\n"
"SQ_PRM_RESOURCES:PRIME_CACHE_ENABLE = 1\n"
"; SPI_PS_IN_CONTROL_0 = 0x00000000\n"
"SPI0:NUM_INTERP = 0\n"
"SPI0:POSITION_ENA = 0\n"
"SPI0:POSITION_CENTROID = 0\n"
"SPI0:POSITION_ADDR = 0\n"
"SPI0:PARAM_GEN = 0\n"
"SPI0:PARAM_GEN_ADDR = 0\n"
"SPI0:BARYC_SAMPLE_CNTL = 0\n"
"SPI0:PERSP_GRADIENT_ENA = 0\n"
"SPI0:LINEAR_GRADIENT_ENA = 0\n"
"SPI0:POSITION_SAMPLE = 0\n"
"SPI0:BARYC_SAMPLE_ENA = 0\n"
"; SPI_PS_IN_CONTROL_1 = 0x00000000\n"
"SPI1:GEN_INDEX_PIX = 0\n"
"SPI1:FIXED_PT_POSITION_ENA = 0\n"
"SPI1:FIXED_PT_POSITION_ADDR = 0\n"
"SPI1:FRONT_FACE_ENA = 0\n"
"SPI1:FRONT_FACE_ADDR = 0\n"
"SPI1:FRONT_FACE_CHAN = 0\n"
"SPI1:FOG_ADDR = 0\n"
"SPI1:GEN_INDEX_PIX_ADDR = 0\n"
"; SPI_INPUT_Z\n"
"SPI:PROVIDE_Z_TO_SPI = 0\n"
"; CB_SHADER_MASK = 0x00000000\n"
"CB_SHADER_CONTROL:bitmap = 00000000\n"
"; DB_SHADER_CONTROL = 0x00000200\n"
"DB:Z_EXPORT_ENABLE = 0\n"
"DB:STENCIL_REF_EXPORT_ENABLE = 0\n"
"DB:MASK_EXPORT_ENABLE = 0\n"
"DB:ALPHA_TO_MASK_DISABLE = 0\n"
"DB:Z_ORDER = 0\n"
"DB:KILL_ENABLE = 0\n"
"; SQ_PGM_EXPORTS_PS\n"
"SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000000 ; (0 color)\n"
"; bHasFogMerge = 0x00000000\n"
;

MicahVillmow · ‎03-10-2009

As far as I understand, calclAssembleObject does no optimizations. It translates directly to bitcode, which then executes on the machine.


CAL and ISA question