I have two IL kernels. The second differs from the first only by two extra dmul instructions. The first one works; the second gives a wrong value in the .xy component of the output variable.
PS. My card is an ATI 5850, driver 10.12, Ubuntu 9.04.
KERNEL 1 ( WORKING )
il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_cb cb0[1]
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
dcl_literal l0, 0x0, 0x0, 0x0, 0x0
dcl_literal l4, 0x0, 0x1, 0x0, 0x0
dcl_literal l6, 0x0, 0x3ff80000, 0x0, 0x3ff80000
dcl_literal l2, 0x1, 0x0, 0x0, 0x0
dcl_literal l3, 0x8, 0x0, 0x0, 0x0
dcl_literal l5, 0xffffffff, 0xffffffff, 0x0, 0x0
dcl_literal l1, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
mov r0.xy,vWinCoord0.xy
ftou r6.xy,r0.xy
mov r9.xy,r6.xy
mov r10.x,l2.x
mov r15.x,l3.x
ishl r20.x,r9.y,r15.x
iadd r24.x,r20.x,r9.x
ishl r28.x,r24.x,r10.x
mov r1.x,r28.x
mov r3,cb[0]
mov r38.xy,l4.xy
mov r41.x,r1.x
mov r41._y__,r1.xxxx
mov r42.xy,r41.xy
mov r40.xy,r42.xy
iadd r47.xy,r40.xy,r38.xy
ult r52.xy,r47.xy,r3.xy
mov r2.xy,r52.xy
sample_resource(0)_sampler(0) r57,r0.xy
mov r56,r57
mov r60,r56
mov r63,r60
dldexp r65.xy__,r60.xyxy,l5.x
dldexp r65.__zw,r60.zwzw,l5.y
mov r68,r65_neg(yw)
mov r70,r68
mov r61,r70
mov r71,r61
drsq r72.xy__,r60.xyxy
drsq r72.__zw,r60.zwzw
mov r75,r72
mov r62,r75
mov r76,r62
mov r77,l6
dmul r79.xy__,r62.xyxy,r62.xyxy
dmul r79.__zw,r62.zwzw,r62.zwzw
dmad r84.xy__,r79.xyxy,r61.xyxy,r77.xyxy
dmad r84.__zw,r79.zwzw,r61.zwzw,r77.zwzw
dmul r91.xy__,r62.xyxy,r84.xyxy
dmul r91.__zw,r62.zwzw,r84.zwzw
mov r62,r91
mov r98,l6
dmad r100.xy__,r62.xyxy,r61.xyxy,r98.xyxy
dmad r100.__zw,r62.zwzw,r61.zwzw,r98.zwzw
mov r62,r100
mov r107,l0
cmov_logical r109,r2.xxyy,r62,r107
mov r116,r109
mov o0.xyzw,r116
end

KERNEL 2 ( invalid result on the .xy component of output )
il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_cb cb0[1]
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
dcl_literal l0, 0x0, 0x0, 0x0, 0x0
dcl_literal l4, 0x0, 0x1, 0x0, 0x0
dcl_literal l6, 0x0, 0x3ff80000, 0x0, 0x3ff80000
dcl_literal l2, 0x1, 0x0, 0x0, 0x0
dcl_literal l3, 0x8, 0x0, 0x0, 0x0
dcl_literal l5, 0xffffffff, 0xffffffff, 0x0, 0x0
dcl_literal l1, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
mov r0.xy,vWinCoord0.xy
ftou r6.xy,r0.xy
mov r9.xy,r6.xy
mov r10.x,l2.x
mov r15.x,l3.x
ishl r20.x,r9.y,r15.x
iadd r24.x,r20.x,r9.x
ishl r28.x,r24.x,r10.x
mov r1.x,r28.x
mov r3,cb[0]
mov r38.xy,l4.xy
mov r41.x,r1.x
mov r41._y__,r1.xxxx
mov r42.xy,r41.xy
mov r40.xy,r42.xy
iadd r47.xy,r40.xy,r38.xy
ult r52.xy,r47.xy,r3.xy
mov r2.xy,r52.xy
sample_resource(0)_sampler(0) r57,r0.xy
mov r56,r57
mov r60,r56
mov r63,r60
dldexp r65.xy__,r60.xyxy,l5.x
dldexp r65.__zw,r60.zwzw,l5.y
mov r68,r65_neg(yw)
mov r70,r68
mov r61,r70
mov r71,r61
drsq r72.xy__,r60.xyxy
drsq r72.__zw,r60.zwzw
mov r75,r72
mov r62,r75
mov r76,r62
mov r77,l6
dmul r79.xy__,r62.xyxy,r62.xyxy
dmul r79.__zw,r62.zwzw,r62.zwzw
dmad r84.xy__,r79.xyxy,r61.xyxy,r77.xyxy
dmad r84.__zw,r79.zwzw,r61.zwzw,r77.zwzw
dmul r91.xy__,r62.xyxy,r84.xyxy
dmul r91.__zw,r62.zwzw,r84.zwzw
mov r62,r91
mov r98,l6
dmul r100.xy__,r62.xyxy,r62.xyxy <---- EXTRA INSTRUCTION
dmul r100.__zw,r62.zwzw,r62.zwzw <---- EXTRA INSTRUCTION
dmad r105.xy__,r100.xyxy,r61.xyxy,r98.xyxy
dmad r105.__zw,r100.zwzw,r61.zwzw,r98.zwzw
mov r62,r105
mov r112,l0
cmov_logical r114,r2.xxyy,r62,r112
mov r121,r114
mov o0.xyzw,r121
end
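For checking the GPU output against expected values, the arithmetic of the two kernels can be mirrored on the CPU. The following is only a sketch based on my reading of the IL above, not code from CAL++: the names from_words, common_part, expected_kernel1 and expected_kernel2 are made up for illustration, and 1.0 / std::sqrt(x) is an assumed stand-in for the hardware drsq estimate, so results for arbitrary inputs may differ from the GPU in the last bits.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Build a double from the two 32-bit words of an IL register pair
// (low word in .x/.z, high word in .y/.w).
double from_words(std::uint32_t lo, std::uint32_t hi) {
    std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Values shared by both kernels up to the point where they diverge.
struct Common {
    double h;   // r61: -x/2
    double y1;  // r62 after the first refinement step (r91)
};

Common common_part(double x) {
    const double c = from_words(0x0u, 0x3ff80000u);  // literal l6, decodes to 1.5
    double h  = -std::ldexp(x, -1);      // dldexp by l5 (-1), then _neg(yw)
    double y0 = 1.0 / std::sqrt(x);      // ASSUMPTION: stand-in for drsq
    double t  = std::fma(y0 * y0, h, c); // dmul r79, dmad r84
    return { h, y0 * t };                // dmul r91
}

// Kernel 1: final dmad uses y1 directly (r100).
double expected_kernel1(double x) {
    Common s = common_part(x);
    return std::fma(s.y1, s.h, 1.5);
}

// Kernel 2: the two extra dmuls square y1 first (r100), then dmad (r105).
double expected_kernel2(double x) {
    Common s = common_part(x);
    return std::fma(s.y1 * s.y1, s.h, 1.5);
}
```

With an input of 4.0 every intermediate value is exactly representable, which makes it a convenient test value: under the assumptions above, kernel 1 should produce 0.5 and kernel 2 should produce 1.0 in each double of the output.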
I'm attaching the ISA generated for both kernels. At first glance both look OK. They differ mostly in the layout of the LDEXP computations.
ISA KERNEL 1 - WORKING
ShaderType = IL_SHADER_PIXEL
TargetChip = c
; ------------- SC_SRCSHADER Dump ------------------
SC_SHADERSTATE: u32NumIntVSConst = 0
SC_SHADERSTATE: u32NumIntPSConst = 0
SC_SHADERSTATE: u32NumIntGSConst = 0
SC_SHADERSTATE: u32NumBoolVSConst = 0
SC_SHADERSTATE: u32NumBoolPSConst = 0
SC_SHADERSTATE: u32NumBoolGSConst = 0
SC_SHADERSTATE: u32NumFloatVSConst = 0
SC_SHADERSTATE: u32NumFloatPSConst = 0
SC_SHADERSTATE: u32NumFloatGSConst = 0
fConstantsAvailable = 1025537139
iConstantsAvailable = 2573
bConstantsAvailable = 1634494817
u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
; -------- Disassembly --------------------
00 TEX: ADDR(112) CNT(1) VALID_PIX
      0  SAMPLE R1, R0.xy0x, t0, s0 UNNORM(XYZW)
01 ALU: ADDR(32) CNT(68) KCACHE0(CB0:0-15)
      1  x: MOV ____, R1.x
         y: MOV ____, R1.y
         z: MOV T0.z, R1.z
         w: MOV T0.w, R1.w
         t: F_TO_U ____, R0.y
      2  x: MOV T0.x, 0.0f
         z: MOV T1.z, 0.0f
         w: LSHL T1.w, PS1, (0x00000008, 1.121038771e-44f).x
         t: RSQ_sat_64 T0.y, PV1.y, PV1.x
      3  x: LDEXP_64 T3.x, R1.y, (0xFFFFFFFF, 0.nanf).x
         y: LDEXP_64 T3.y, R1.x, (0xFFFFFFFF, 0.nanf).x
         z: LDEXP_64 T3.z, R1.w, (0xFFFFFFFF, 0.nanf).x
         w: LDEXP_64 T3.w, R1.z, (0xFFFFFFFF, 0.nanf).x
         t: RSQ_sat_64 T1.y, T0.w, T0.z
      4  x: MOV T2.x, T1.z
         y: MOV T2.y, PS3
         z: MOV T2.z, T0.x
         w: MOV T2.w, T0.y
         t: F_TO_U ____, R0.x
      5  y: ADD_INT ____, PS4, T1.w
         w: MOV T3.w, -T3.w
         t: MOV T3.y, -T3.y
      6  x: MUL_64 T0.x, T2.w, T2.w
         y: MUL_64 T0.y, T2.w, T2.w
         z: MUL_64 ____, T2.w, T2.w
         w: MUL_64 ____, T2.z, T2.z
         t: LSHL T1.z, PV5.y, 1
      7  x: MUL_64 ____, T2.y, T2.y
         y: MUL_64 ____, T2.y, T2.y
         z: MUL_64 T0.z, T2.y, T2.y
         w: MUL_64 T0.w, T2.x, T2.x
         t: ADD_INT T1.x, PS6, 1
      8  x: FMA_64 T0.x, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         y: FMA_64 T0.y, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         z: FMA_64 R123.z, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         w: FMA_64 R123.w, T0.x, T3.x, 0.0f
         t: SETGT_UINT T1.w, KC0[0].x, T1.z
      9  x: FMA_64 R123.x, T0.w, T3.w, (0x3FF80000, 1.9375f).x
         y: FMA_64 R123.y, T0.w, T3.w, (0x3FF80000, 1.9375f).x
         z: FMA_64 T0.z, T0.w, T3.w, (0x3FF80000, 1.9375f).x
         w: FMA_64 T0.w, T0.z, T3.z, 0.0f
         t: SETGT_UINT T1.z, KC0[0].y, T1.x
     10  x: MUL_64 T0.x, T2.w, T0.y
         y: MUL_64 T0.y, T2.w, T0.y
         z: MUL_64 ____, T2.w, T0.y
         w: MUL_64 ____, T2.z, T0.x
     11  x: MUL_64 ____, T2.y, T0.w
         y: MUL_64 ____, T2.y, T0.w
         z: MUL_64 T0.z, T2.y, T0.w
         w: MUL_64 T0.w, T2.x, T0.z
     12  x: FMA_64 T0.x, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         y: FMA_64 T0.y, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         z: FMA_64 R123.z, T0.y, T3.y, (0x3FF80000, 1.9375f).x
         w: FMA_64 R123.w, T0.x, T3.x, 0.0f
     13  x: FMA_64 R123.x, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
         y: FMA_64 R123.y, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
         z: FMA_64 R123.z, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
         w: FMA_64 R123.w, T0.z, T3.z, 0.0f
         t: CNDE_INT R0.x, T1.w, 0.0f, PV12.x VEC_021
     14  y: CNDE_INT R0.y, T1.w, 0.0f, T0.y
         z: CNDE_INT R0.z, T1.z, 0.0f, PV13.z
         w: CNDE_INT R0.w, T1.z, 0.0f, PV13.w
02 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; ----------------- PS Data ------------------------
; Input Semantic Mappings
IN PARAM0 = position0 V0.xxxx DefaultVal={0,0,0,0}
NumTexStages = 0
TexCubeMaskBits = 0x00000000
GprPoolSize = 0
CodeLen = 912;Bytes
PGM_END_CF = 0; words(64 bit)
PGM_END_ALU = 0; words(64 bit)
PGM_END_FETCH = 0; words(64 bit)
MaxScratchRegsNeeded = 0
;AluPacking = 0.0
;AluClauses = 0
;PowerThrottleRate = 0.0
; texResourceUsage[0] = 0x00000000
; texResourceUsage[1] = 0x00000000
; texResourceUsage[2] = 0x00000000
; texResourceUsage[3] = 0x00000000
; fetch4ResourceUsage[0] = 0x00000000
; fetch4ResourceUsage[1] = 0x00000000
; fetch4ResourceUsage[2] = 0x00000000
; fetch4ResourceUsage[3] = 0x00000000
; texSamplerUsage = 0x00000000
; constBufUsage = 0x00000000
ResourcesAffectAlphaOutput[0] = 0x00000000
ResourcesAffectAlphaOutput[1] = 0x00000000
ResourcesAffectAlphaOutput[2] = 0x00000000
ResourcesAffectAlphaOutput[3] = 0x00000000
;SQ_PGM_RESOURCES = 0x70000002
SQ_PGM_RESOURCES:NUM_GPRS = 2
SQ_PGM_RESOURCES:STACK_SIZE = 0
SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE = 1
;SQ_PGM_RESOURCES_2 = 0x000000C0
SQ_LDS_ALLOC_PS:SIZE = 0x00000000
; SPI_PS_IN_CONTROL_0 = 0x00000100
SPI0:NUM_INTERP = 0
SPI0:POSITION_ENA = 1
SPI0:POSITION_CENTROID = 0
SPI0:POSITION_ADDR = 0
SPI0:PARAM_GEN = 0
SPI0:PERSP_GRADIENT_ENA = 0
SPI0:LINEAR_GRADIENT_ENA = 0
SPI0:POSITION_SAMPLE = 0
; SPI_PS_IN_CONTROL_1 = 0x00000000
SPI1:GEN_INDEX_PIX = 0
SPI1:FIXED_PT_POSITION_ENA = 0
SPI1:FIXED_PT_POSITION_ADDR = 0
SPI1:FRONT_FACE_ENA = 0
SPI1:FRONT_FACE_ADDR = 0
SPI1:FOG_ADDR = 0
; SPI_PS_IN_CONTROL_2 = 0x00000000
SPI2:LINE_STIPPLE_TEX_ENA = 0
SPI2:LINE_STIPPLE_TEX_ADDR = 0
; SPI_BARYC_CNTL = 0x00000001
SPI_BARYC_CNTL:PERSP_CENTER_ENA = 1
SPI_BARYC_CNTL:PERSP_CENTROID_ENA = 0
SPI_BARYC_CNTL:PERSP_SAMPLE_ENA = 0
SPI_BARYC_CNTL:PERSP_PULL_MODEL_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTER_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTROID_ENA = 0
SPI_BARYC_CNTL:LINEAR_SAMPLE_ENA = 0
; SPI_INPUT_Z
SPI:PROVIDE_Z_TO_SPI = 0
; CB_SHADER_MASK = 0x0000000F
CB:OUTPUT0_ENABLE = 15
; DB_SHADER_CONTROL = 0x00004210
DB:Z_EXPORT_ENABLE = 0
DB:STENCIL_REF_EXPORT_ENABLE = 0
DB:MASK_EXPORT_ENABLE = 0
DB:ALPHA_TO_MASK_DISABLE = 0
DB:Z_ORDER = 1
DB:KILL_ENABLE = 0
DB:DB_SOURCE_FORMAT = 2
DB:CONSERVATIVE_Z_EXPORT = 0
DB:DEPTH_BEFORE_SHADER = 0
DB:EXEC_ON_HIER_FAIL = 0
; SQ_PGM_EXPORTS_PS
SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000002 ; (1 color)

ISA KERNEL 2 - NOT WORKING
ShaderType = IL_SHADER_PIXEL
TargetChip = c
; ------------- SC_SRCSHADER Dump ------------------
SC_SHADERSTATE: u32NumIntVSConst = 0
SC_SHADERSTATE: u32NumIntPSConst = 0
SC_SHADERSTATE: u32NumIntGSConst = 0
SC_SHADERSTATE: u32NumBoolVSConst = 0
SC_SHADERSTATE: u32NumBoolPSConst = 0
SC_SHADERSTATE: u32NumBoolGSConst = 0
SC_SHADERSTATE: u32NumFloatVSConst = 0
SC_SHADERSTATE: u32NumFloatPSConst = 0
SC_SHADERSTATE: u32NumFloatGSConst = 0
fConstantsAvailable = 1025537139
iConstantsAvailable = 2573
bConstantsAvailable = 1634494817
u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
; -------- Disassembly --------------------
00 TEX: ADDR(112) CNT(1) VALID_PIX
      0  SAMPLE R1, R0.xy0x, t0, s0 UNNORM(XYZW)
01 ALU: ADDR(32) CNT(77) KCACHE0(CB0:0-15)
      1  x: MOV ____, R1.x
         y: MOV ____, R1.y
         z: MOV T0.z, R1.z
         w: MOV T0.w, R1.w
         t: MOV T0.x, 0.0f
      2  x: MOV T1.x, 0.0f
         z: LDEXP_64 T3.z, R1.y, (0xFFFFFFFF, 0.nanf).x
         w: LDEXP_64 T3.w, R1.x, (0xFFFFFFFF, 0.nanf).x
         t: RSQ_sat_64 T0.y, PV1.y, PV1.x
      3  x: MOV T0.x, T0.x
         y: MOV T0.y, PS2
         z: LDEXP_64 T2.z, R1.w, (0xFFFFFFFF, 0.nanf).x
         w: LDEXP_64 T2.w, R1.z, (0xFFFFFFFF, 0.nanf).x
         t: RSQ_sat_64 T1.y, T0.w, T0.z
      4  x: MOV T2.x, T1.x
         y: MOV T2.y, PS3
         w: MOV T2.w, -PV3.w
         t: MOV T3.y, -T3.w
      5  x: MUL_64 T1.x, T0.y, T0.y
         y: MUL_64 T1.y, T0.y, T0.y
         z: MUL_64 ____, T0.y, T0.y
         w: MUL_64 ____, T0.x, T0.x
         t: F_TO_U ____, R0.y
      6  w: LSHL ____, PS5, (0x00000008, 1.121038771e-44f).x
         t: F_TO_U ____, R0.x
      7  x: MUL_64 ____, T2.y, T2.y
         y: MUL_64 ____, T2.y, T2.y
         z: MUL_64 T0.z, T2.y, T2.y
         w: MUL_64 T0.w, T2.x, T2.x
         t: ADD_INT ____, PS6, PV6.w
      8  x: FMA_64 T1.x, T1.y, T3.w, (0x3FF80000, 1.9375f).x
         y: FMA_64 T1.y, T1.y, T3.w, (0x3FF80000, 1.9375f).x
         z: FMA_64 R123.z, T1.y, T3.w, (0x3FF80000, 1.9375f).x
         w: FMA_64 R123.w, T1.x, T3.z, 0.0f
         t: LSHL T1.z, PS7, 1
      9  x: FMA_64 R123.x, T0.w, T2.w, (0x3FF80000, 1.9375f).x
         y: FMA_64 R123.y, T0.w, T2.w, (0x3FF80000, 1.9375f).x
         z: FMA_64 T0.z, T0.w, T2.w, (0x3FF80000, 1.9375f).x
         w: FMA_64 T0.w, T0.z, T2.z, 0.0f
         t: ADD_INT T3.x, PS8, 1
     10  x: MUL_64 T0.x, T0.y, T1.y
         y: MUL_64 T0.y, T0.y, T1.y
         z: MUL_64 ____, T0.y, T1.y
         w: MUL_64 ____, T0.x, T1.x
         t: SETGT_UINT T1.w, KC0[0].x, T1.z
     11  x: MUL_64 ____, T2.y, T0.w
         y: MUL_64 ____, T2.y, T0.w
         z: MUL_64 T0.z, T2.y, T0.w
         w: MUL_64 T0.w, T2.x, T0.z
         t: SETGT_UINT T1.z, KC0[0].y, T3.x
     12  x: MUL_64 T0.x, T0.y, T0.y
         y: MUL_64 T0.y, T0.y, T0.y
         z: MUL_64 ____, T0.y, T0.y
         w: MUL_64 ____, T0.x, T0.x
     13  x: MUL_64 ____, T0.w, T0.w
         y: MUL_64 ____, T0.w, T0.w
         z: MUL_64 T0.z, T0.w, T0.w
         w: MUL_64 T0.w, T0.z, T0.z
     14  x: FMA_64 T0.x, T0.y, T3.w, (0x3FF80000, 1.9375f).x
         y: FMA_64 T0.y, T0.y, T3.w, (0x3FF80000, 1.9375f).x
         z: FMA_64 R123.z, T0.y, T3.w, (0x3FF80000, 1.9375f).x
         w: FMA_64 R123.w, T0.x, T3.z, 0.0f
     15  x: FMA_64 R123.x, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
         y: FMA_64 R123.y, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
         z: FMA_64 R123.z, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
         w: FMA_64 R123.w, T0.z, T2.z, 0.0f
         t: CNDE_INT R0.x, T1.w, 0.0f, PV14.x VEC_021
     16  y: CNDE_INT R0.y, T1.w, 0.0f, T0.y
         z: CNDE_INT R0.z, T1.z, 0.0f, PV15.z
         w: CNDE_INT R0.w, T1.z, 0.0f, PV15.w
02 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; ----------------- PS Data ------------------------
; Input Semantic Mappings
IN PARAM0 = position0 V0.xxxx DefaultVal={0,0,0,0}
NumTexStages = 0
TexCubeMaskBits = 0x00000000
GprPoolSize = 0
CodeLen = 912;Bytes
PGM_END_CF = 0; words(64 bit)
PGM_END_ALU = 0; words(64 bit)
PGM_END_FETCH = 0; words(64 bit)
MaxScratchRegsNeeded = 0
;AluPacking = 0.0
;AluClauses = 0
;PowerThrottleRate = 0.0
; texResourceUsage[0] = 0x00000000
; texResourceUsage[1] = 0x00000000
; texResourceUsage[2] = 0x00000000
; texResourceUsage[3] = 0x00000000
; fetch4ResourceUsage[0] = 0x00000000
; fetch4ResourceUsage[1] = 0x00000000
; fetch4ResourceUsage[2] = 0x00000000
; fetch4ResourceUsage[3] = 0x00000000
; texSamplerUsage = 0x00000000
; constBufUsage = 0x00000000
ResourcesAffectAlphaOutput[0] = 0x00000000
ResourcesAffectAlphaOutput[1] = 0x00000000
ResourcesAffectAlphaOutput[2] = 0x00000000
ResourcesAffectAlphaOutput[3] = 0x00000000
;SQ_PGM_RESOURCES = 0x70000002
SQ_PGM_RESOURCES:NUM_GPRS = 2
SQ_PGM_RESOURCES:STACK_SIZE = 0
SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE = 1
;SQ_PGM_RESOURCES_2 = 0x000000C0
SQ_LDS_ALLOC_PS:SIZE = 0x00000000
; SPI_PS_IN_CONTROL_0 = 0x00000100
SPI0:NUM_INTERP = 0
SPI0:POSITION_ENA = 1
SPI0:POSITION_CENTROID = 0
SPI0:POSITION_ADDR = 0
SPI0:PARAM_GEN = 0
SPI0:PERSP_GRADIENT_ENA = 0
SPI0:LINEAR_GRADIENT_ENA = 0
SPI0:POSITION_SAMPLE = 0
; SPI_PS_IN_CONTROL_1 = 0x00000000
SPI1:GEN_INDEX_PIX = 0
SPI1:FIXED_PT_POSITION_ENA = 0
SPI1:FIXED_PT_POSITION_ADDR = 0
SPI1:FRONT_FACE_ENA = 0
SPI1:FRONT_FACE_ADDR = 0
SPI1:FOG_ADDR = 0
; SPI_PS_IN_CONTROL_2 = 0x00000000
SPI2:LINE_STIPPLE_TEX_ENA = 0
SPI2:LINE_STIPPLE_TEX_ADDR = 0
; SPI_BARYC_CNTL = 0x00000001
SPI_BARYC_CNTL:PERSP_CENTER_ENA = 1
SPI_BARYC_CNTL:PERSP_CENTROID_ENA = 0
SPI_BARYC_CNTL:PERSP_SAMPLE_ENA = 0
SPI_BARYC_CNTL:PERSP_PULL_MODEL_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTER_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTROID_ENA = 0
SPI_BARYC_CNTL:LINEAR_SAMPLE_ENA = 0
; SPI_INPUT_Z
SPI:PROVIDE_Z_TO_SPI = 0
; CB_SHADER_MASK = 0x0000000F
CB:OUTPUT0_ENABLE = 15
; DB_SHADER_CONTROL = 0x00004210
DB:Z_EXPORT_ENABLE = 0
DB:STENCIL_REF_EXPORT_ENABLE = 0
DB:MASK_EXPORT_ENABLE = 0
DB:ALPHA_TO_MASK_DISABLE = 0
DB:Z_ORDER = 1
DB:KILL_ENABLE = 0
DB:DB_SOURCE_FORMAT = 2
DB:CONSERVATIVE_Z_EXPORT = 0
DB:DEPTH_BEFORE_SHADER = 0
DB:EXEC_ON_HIER_FAIL = 0
; SQ_PGM_EXPORTS_PS
SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000002 ; (1 color)
Hazeman,
Can you provide the kernel code?
You can also send a test case to streamdeveloper@amd.com
The code is available as part of the CAL++ library (download here). The file with the kernel is regression/rsqrt.cpp.
After some tests I've noticed that kernels with the LDEXPs in the same ISA instruction give the correct value, while kernels where the LDEXPs are split across 2 different instructions give a wrong value in o0.xy. Of course, this may be only a coincidence.
Driver 11.1 gives wrong results as well.