
hazeman
Adept II

IL compiler/assembler bug?

I have two IL kernels. The second differs from the first only by two extra dmul instructions. The first one works; the second one gives a wrong value in the .xy component of the output variable.

PS. My card is an ATI 5850, driver 10.12, Ubuntu 9.04.

 

 

KERNEL 1 ( WORKING )

il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_cb cb0[1]
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
dcl_literal l0, 0x0, 0x0, 0x0, 0x0
dcl_literal l4, 0x0, 0x1, 0x0, 0x0
dcl_literal l6, 0x0, 0x3ff80000, 0x0, 0x3ff80000
dcl_literal l2, 0x1, 0x0, 0x0, 0x0
dcl_literal l3, 0x8, 0x0, 0x0, 0x0
dcl_literal l5, 0xffffffff, 0xffffffff, 0x0, 0x0
dcl_literal l1, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
mov r0.xy,vWinCoord0.xy
ftou r6.xy,r0.xy
mov r9.xy,r6.xy
mov r10.x,l2.x
mov r15.x,l3.x
ishl r20.x,r9.y,r15.x
iadd r24.x,r20.x,r9.x
ishl r28.x,r24.x,r10.x
mov r1.x,r28.x
mov r3,cb[0]
mov r38.xy,l4.xy
mov r41.x,r1.x
mov r41._y__,r1.xxxx
mov r42.xy,r41.xy
mov r40.xy,r42.xy
iadd r47.xy,r40.xy,r38.xy
ult r52.xy,r47.xy,r3.xy
mov r2.xy,r52.xy
sample_resource(0)_sampler(0) r57,r0.xy
mov r56,r57
mov r60,r56
mov r63,r60
dldexp r65.xy__,r60.xyxy,l5.x
dldexp r65.__zw,r60.zwzw,l5.y
mov r68,r65_neg(yw)
mov r70,r68
mov r61,r70
mov r71,r61
drsq r72.xy__,r60.xyxy
drsq r72.__zw,r60.zwzw
mov r75,r72
mov r62,r75
mov r76,r62
mov r77,l6
dmul r79.xy__,r62.xyxy,r62.xyxy
dmul r79.__zw,r62.zwzw,r62.zwzw
dmad r84.xy__,r79.xyxy,r61.xyxy,r77.xyxy
dmad r84.__zw,r79.zwzw,r61.zwzw,r77.zwzw
dmul r91.xy__,r62.xyxy,r84.xyxy
dmul r91.__zw,r62.zwzw,r84.zwzw
mov r62,r91
mov r98,l6
dmad r100.xy__,r62.xyxy,r61.xyxy,r98.xyxy
dmad r100.__zw,r62.zwzw,r61.zwzw,r98.zwzw
mov r62,r100
mov r107,l0
cmov_logical r109,r2.xxyy,r62,r107
mov r116,r109
mov o0.xyzw,r116
end

KERNEL 2 ( invalid result on the .xy component of output )

il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_cb cb0[1]
dcl_output_generic o0
dcl_resource_id(0)_type(2d,unnorm)_fmtx(unknown)_fmty(unknown)_fmtz(unknown)_fmtw(unknown)
dcl_literal l0, 0x0, 0x0, 0x0, 0x0
dcl_literal l4, 0x0, 0x1, 0x0, 0x0
dcl_literal l6, 0x0, 0x3ff80000, 0x0, 0x3ff80000
dcl_literal l2, 0x1, 0x0, 0x0, 0x0
dcl_literal l3, 0x8, 0x0, 0x0, 0x0
dcl_literal l5, 0xffffffff, 0xffffffff, 0x0, 0x0
dcl_literal l1, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
mov r0.xy,vWinCoord0.xy
ftou r6.xy,r0.xy
mov r9.xy,r6.xy
mov r10.x,l2.x
mov r15.x,l3.x
ishl r20.x,r9.y,r15.x
iadd r24.x,r20.x,r9.x
ishl r28.x,r24.x,r10.x
mov r1.x,r28.x
mov r3,cb[0]
mov r38.xy,l4.xy
mov r41.x,r1.x
mov r41._y__,r1.xxxx
mov r42.xy,r41.xy
mov r40.xy,r42.xy
iadd r47.xy,r40.xy,r38.xy
ult r52.xy,r47.xy,r3.xy
mov r2.xy,r52.xy
sample_resource(0)_sampler(0) r57,r0.xy
mov r56,r57
mov r60,r56
mov r63,r60
dldexp r65.xy__,r60.xyxy,l5.x
dldexp r65.__zw,r60.zwzw,l5.y
mov r68,r65_neg(yw)
mov r70,r68
mov r61,r70
mov r71,r61
drsq r72.xy__,r60.xyxy
drsq r72.__zw,r60.zwzw
mov r75,r72
mov r62,r75
mov r76,r62
mov r77,l6
dmul r79.xy__,r62.xyxy,r62.xyxy
dmul r79.__zw,r62.zwzw,r62.zwzw
dmad r84.xy__,r79.xyxy,r61.xyxy,r77.xyxy
dmad r84.__zw,r79.zwzw,r61.zwzw,r77.zwzw
dmul r91.xy__,r62.xyxy,r84.xyxy
dmul r91.__zw,r62.zwzw,r84.zwzw
mov r62,r91
mov r98,l6
dmul r100.xy__,r62.xyxy,r62.xyxy <---- EXTRA INSTRUCTION
dmul r100.__zw,r62.zwzw,r62.zwzw <---- EXTRA INSTRUCTION
dmad r105.xy__,r100.xyxy,r61.xyxy,r98.xyxy
dmad r105.__zw,r100.zwzw,r61.zwzw,r98.zwzw
mov r62,r105
mov r112,l0
cmov_logical r114,r2.xxyy,r62,r112
mov r121,r114
mov o0.xyzw,r121
end
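For anyone who wants to check the GPU output against a CPU value, here is a minimal Python sketch of what the two IL listings appear to compute for a single double input. It is a reading of the listings, not code from CAL++: the names `il_double`, `ONE_AND_HALF`, `EXPONENT` and `kernel_reference` are made up for illustration, it assumes each double keeps its low dword in x (or z) and its high dword in y (or w), it uses an exact `1/sqrt(x)` as a stand-in for `drsq` (the hardware result is only an approximation), and it ignores the `ult`/`cmov_logical` bounds mask that zeroes out-of-range elements.

```python
import math
import struct

def il_double(lo, hi):
    """Rebuild a double from the (low, high) dwords of a dcl_literal pair (assumed layout)."""
    return struct.unpack("<d", struct.pack("<II", lo, hi))[0]

# l6 = 0x0, 0x3ff80000, ... : the constant added by every dmad
ONE_AND_HALF = il_double(0x0, 0x3FF80000)
# l5.x = 0xffffffff, read as a signed 32-bit exponent for dldexp
EXPONENT = struct.unpack("<i", struct.pack("<I", 0xFFFFFFFF))[0]

def kernel_reference(x, extra_dmul):
    """Value one lane should produce; extra_dmul=True models kernel 2."""
    half_neg = -math.ldexp(x, EXPONENT)         # dldexp + neg(yw): r61 = -x/2
    y = 1.0 / math.sqrt(x)                      # stand-in for drsq: r62
    y = y * (y * y * half_neg + ONE_AND_HALF)   # dmul, dmad, dmul: r91
    if extra_dmul:
        y = y * y                               # the two extra dmuls: r100
    return y * half_neg + ONE_AND_HALF          # final dmad

print(kernel_reference(4.0, extra_dmul=False))  # 0.5
print(kernel_reference(4.0, extra_dmul=True))   # 1.0
```

With x = 4.0 every intermediate value is exactly representable, so the results do not depend on whether the multiply-add is fused. For other inputs, expect small differences from the GPU because `drsq` is approximate and `dmad` may be fused.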

5 Replies
hazeman
Adept II

I'm attaching the ISA generated for both kernels. At first glance both kernels look "ok". They differ mostly in the layout of the LDEXP computations.

 

ISA KERNEL 1 - WORKING

ShaderType = IL_SHADER_PIXEL
TargetChip = c
; ------------- SC_SRCSHADER Dump ------------------
SC_SHADERSTATE: u32NumIntVSConst = 0
SC_SHADERSTATE: u32NumIntPSConst = 0
SC_SHADERSTATE: u32NumIntGSConst = 0
SC_SHADERSTATE: u32NumBoolVSConst = 0
SC_SHADERSTATE: u32NumBoolPSConst = 0
SC_SHADERSTATE: u32NumBoolGSConst = 0
SC_SHADERSTATE: u32NumFloatVSConst = 0
SC_SHADERSTATE: u32NumFloatPSConst = 0
SC_SHADERSTATE: u32NumFloatGSConst = 0
fConstantsAvailable = 1025537139
iConstantsAvailable = 2573
bConstantsAvailable = 1634494817
u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
; -------- Disassembly --------------------
00 TEX: ADDR(112) CNT(1) VALID_PIX
     0  SAMPLE R1, R0.xy0x, t0, s0 UNNORM(XYZW)
01 ALU: ADDR(32) CNT(68) KCACHE0(CB0:0-15)
     1  x: MOV ____, R1.x
        y: MOV ____, R1.y
        z: MOV T0.z, R1.z
        w: MOV T0.w, R1.w
        t: F_TO_U ____, R0.y
     2  x: MOV T0.x, 0.0f
        z: MOV T1.z, 0.0f
        w: LSHL T1.w, PS1, (0x00000008, 1.121038771e-44f).x
        t: RSQ_sat_64 T0.y, PV1.y, PV1.x
     3  x: LDEXP_64 T3.x, R1.y, (0xFFFFFFFF, 0.nanf).x
        y: LDEXP_64 T3.y, R1.x, (0xFFFFFFFF, 0.nanf).x
        z: LDEXP_64 T3.z, R1.w, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T3.w, R1.z, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T1.y, T0.w, T0.z
     4  x: MOV T2.x, T1.z
        y: MOV T2.y, PS3
        z: MOV T2.z, T0.x
        w: MOV T2.w, T0.y
        t: F_TO_U ____, R0.x
     5  y: ADD_INT ____, PS4, T1.w
        w: MOV T3.w, -T3.w
        t: MOV T3.y, -T3.y
     6  x: MUL_64 T0.x, T2.w, T2.w
        y: MUL_64 T0.y, T2.w, T2.w
        z: MUL_64 ____, T2.w, T2.w
        w: MUL_64 ____, T2.z, T2.z
        t: LSHL T1.z, PV5.y, 1
     7  x: MUL_64 ____, T2.y, T2.y
        y: MUL_64 ____, T2.y, T2.y
        z: MUL_64 T0.z, T2.y, T2.y
        w: MUL_64 T0.w, T2.x, T2.x
        t: ADD_INT T1.x, PS6, 1
     8  x: FMA_64 T0.x, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        y: FMA_64 T0.y, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        z: FMA_64 R123.z, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        w: FMA_64 R123.w, T0.x, T3.x, 0.0f
        t: SETGT_UINT T1.w, KC0[0].x, T1.z
     9  x: FMA_64 R123.x, T0.w, T3.w, (0x3FF80000, 1.9375f).x
        y: FMA_64 R123.y, T0.w, T3.w, (0x3FF80000, 1.9375f).x
        z: FMA_64 T0.z, T0.w, T3.w, (0x3FF80000, 1.9375f).x
        w: FMA_64 T0.w, T0.z, T3.z, 0.0f
        t: SETGT_UINT T1.z, KC0[0].y, T1.x
    10  x: MUL_64 T0.x, T2.w, T0.y
        y: MUL_64 T0.y, T2.w, T0.y
        z: MUL_64 ____, T2.w, T0.y
        w: MUL_64 ____, T2.z, T0.x
    11  x: MUL_64 ____, T2.y, T0.w
        y: MUL_64 ____, T2.y, T0.w
        z: MUL_64 T0.z, T2.y, T0.w
        w: MUL_64 T0.w, T2.x, T0.z
    12  x: FMA_64 T0.x, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        y: FMA_64 T0.y, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        z: FMA_64 R123.z, T0.y, T3.y, (0x3FF80000, 1.9375f).x
        w: FMA_64 R123.w, T0.x, T3.x, 0.0f
    13  x: FMA_64 R123.x, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
        y: FMA_64 R123.y, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
        z: FMA_64 R123.z, T0.w, T3.w, (0x3FF80000, 1.9375f).x VEC_021
        w: FMA_64 R123.w, T0.z, T3.z, 0.0f
        t: CNDE_INT R0.x, T1.w, 0.0f, PV12.x VEC_021
    14  y: CNDE_INT R0.y, T1.w, 0.0f, T0.y
        z: CNDE_INT R0.z, T1.z, 0.0f, PV13.z
        w: CNDE_INT R0.w, T1.z, 0.0f, PV13.w
02 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; ----------------- PS Data ------------------------
; Input Semantic Mappings
IN PARAM0 = position0 V0.xxxx DefaultVal={0,0,0,0}
NumTexStages = 0
TexCubeMaskBits = 0x00000000
GprPoolSize = 0
CodeLen = 912;Bytes
PGM_END_CF = 0; words(64 bit)
PGM_END_ALU = 0; words(64 bit)
PGM_END_FETCH = 0; words(64 bit)
MaxScratchRegsNeeded = 0
;AluPacking = 0.0
;AluClauses = 0
;PowerThrottleRate = 0.0
; texResourceUsage[0] = 0x00000000
; texResourceUsage[1] = 0x00000000
; texResourceUsage[2] = 0x00000000
; texResourceUsage[3] = 0x00000000
; fetch4ResourceUsage[0] = 0x00000000
; fetch4ResourceUsage[1] = 0x00000000
; fetch4ResourceUsage[2] = 0x00000000
; fetch4ResourceUsage[3] = 0x00000000
; texSamplerUsage = 0x00000000
; constBufUsage = 0x00000000
ResourcesAffectAlphaOutput[0] = 0x00000000
ResourcesAffectAlphaOutput[1] = 0x00000000
ResourcesAffectAlphaOutput[2] = 0x00000000
ResourcesAffectAlphaOutput[3] = 0x00000000
;SQ_PGM_RESOURCES = 0x70000002
SQ_PGM_RESOURCES:NUM_GPRS = 2
SQ_PGM_RESOURCES:STACK_SIZE = 0
SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE = 1
;SQ_PGM_RESOURCES_2 = 0x000000C0
SQ_LDS_ALLOC_PS:SIZE = 0x00000000
; SPI_PS_IN_CONTROL_0 = 0x00000100
SPI0:NUM_INTERP = 0
SPI0:POSITION_ENA = 1
SPI0:POSITION_CENTROID = 0
SPI0:POSITION_ADDR = 0
SPI0:PARAM_GEN = 0
SPI0:PERSP_GRADIENT_ENA = 0
SPI0:LINEAR_GRADIENT_ENA = 0
SPI0:POSITION_SAMPLE = 0
; SPI_PS_IN_CONTROL_1 = 0x00000000
SPI1:GEN_INDEX_PIX = 0
SPI1:FIXED_PT_POSITION_ENA = 0
SPI1:FIXED_PT_POSITION_ADDR = 0
SPI1:FRONT_FACE_ENA = 0
SPI1:FRONT_FACE_ADDR = 0
SPI1:FOG_ADDR = 0
; SPI_PS_IN_CONTROL_2 = 0x00000000
SPI2:LINE_STIPPLE_TEX_ENA = 0
SPI2:LINE_STIPPLE_TEX_ADDR = 0
; SPI_BARYC_CNTL = 0x00000001
SPI_BARYC_CNTL:PERSP_CENTER_ENA = 1
SPI_BARYC_CNTL:PERSP_CENTROID_ENA = 0
SPI_BARYC_CNTL:PERSP_SAMPLE_ENA = 0
SPI_BARYC_CNTL:PERSP_PULL_MODEL_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTER_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTROID_ENA = 0
SPI_BARYC_CNTL:LINEAR_SAMPLE_ENA = 0
; SPI_INPUT_Z
SPI:PROVIDE_Z_TO_SPI = 0
; CB_SHADER_MASK = 0x0000000F
CB:OUTPUT0_ENABLE = 15
; DB_SHADER_CONTROL = 0x00004210
DB:Z_EXPORT_ENABLE = 0
DB:STENCIL_REF_EXPORT_ENABLE = 0
DB:MASK_EXPORT_ENABLE = 0
DB:ALPHA_TO_MASK_DISABLE = 0
DB:Z_ORDER = 1
DB:KILL_ENABLE = 0
DB:DB_SOURCE_FORMAT = 2
DB:CONSERVATIVE_Z_EXPORT = 0
DB:DEPTH_BEFORE_SHADER = 0
DB:EXEC_ON_HIER_FAIL = 0
; SQ_PGM_EXPORTS_PS
SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000002 ; (1 color)

ISA KERNEL 2 - NOT WORKING

ShaderType = IL_SHADER_PIXEL
TargetChip = c
; ------------- SC_SRCSHADER Dump ------------------
SC_SHADERSTATE: u32NumIntVSConst = 0
SC_SHADERSTATE: u32NumIntPSConst = 0
SC_SHADERSTATE: u32NumIntGSConst = 0
SC_SHADERSTATE: u32NumBoolVSConst = 0
SC_SHADERSTATE: u32NumBoolPSConst = 0
SC_SHADERSTATE: u32NumBoolGSConst = 0
SC_SHADERSTATE: u32NumFloatVSConst = 0
SC_SHADERSTATE: u32NumFloatPSConst = 0
SC_SHADERSTATE: u32NumFloatGSConst = 0
fConstantsAvailable = 1025537139
iConstantsAvailable = 2573
bConstantsAvailable = 1634494817
u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00202000 SCOption_R600_ERROR_ON_DOUBLE_MEMEXP SCOption_SET_VPM_FOR_SCATTER
u32SCOptions[2] = 0x00000040 SCOption_R800_UAV_NONUAV_SYNC_WORKAROUND_BUG216513_1
; -------- Disassembly --------------------
00 TEX: ADDR(112) CNT(1) VALID_PIX
     0  SAMPLE R1, R0.xy0x, t0, s0 UNNORM(XYZW)
01 ALU: ADDR(32) CNT(77) KCACHE0(CB0:0-15)
     1  x: MOV ____, R1.x
        y: MOV ____, R1.y
        z: MOV T0.z, R1.z
        w: MOV T0.w, R1.w
        t: MOV T0.x, 0.0f
     2  x: MOV T1.x, 0.0f
        z: LDEXP_64 T3.z, R1.y, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T3.w, R1.x, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T0.y, PV1.y, PV1.x
     3  x: MOV T0.x, T0.x
        y: MOV T0.y, PS2
        z: LDEXP_64 T2.z, R1.w, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T2.w, R1.z, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T1.y, T0.w, T0.z
     4  x: MOV T2.x, T1.x
        y: MOV T2.y, PS3
        w: MOV T2.w, -PV3.w
        t: MOV T3.y, -T3.w
     5  x: MUL_64 T1.x, T0.y, T0.y
        y: MUL_64 T1.y, T0.y, T0.y
        z: MUL_64 ____, T0.y, T0.y
        w: MUL_64 ____, T0.x, T0.x
        t: F_TO_U ____, R0.y
     6  w: LSHL ____, PS5, (0x00000008, 1.121038771e-44f).x
        t: F_TO_U ____, R0.x
     7  x: MUL_64 ____, T2.y, T2.y
        y: MUL_64 ____, T2.y, T2.y
        z: MUL_64 T0.z, T2.y, T2.y
        w: MUL_64 T0.w, T2.x, T2.x
        t: ADD_INT ____, PS6, PV6.w
     8  x: FMA_64 T1.x, T1.y, T3.w, (0x3FF80000, 1.9375f).x
        y: FMA_64 T1.y, T1.y, T3.w, (0x3FF80000, 1.9375f).x
        z: FMA_64 R123.z, T1.y, T3.w, (0x3FF80000, 1.9375f).x
        w: FMA_64 R123.w, T1.x, T3.z, 0.0f
        t: LSHL T1.z, PS7, 1
     9  x: FMA_64 R123.x, T0.w, T2.w, (0x3FF80000, 1.9375f).x
        y: FMA_64 R123.y, T0.w, T2.w, (0x3FF80000, 1.9375f).x
        z: FMA_64 T0.z, T0.w, T2.w, (0x3FF80000, 1.9375f).x
        w: FMA_64 T0.w, T0.z, T2.z, 0.0f
        t: ADD_INT T3.x, PS8, 1
    10  x: MUL_64 T0.x, T0.y, T1.y
        y: MUL_64 T0.y, T0.y, T1.y
        z: MUL_64 ____, T0.y, T1.y
        w: MUL_64 ____, T0.x, T1.x
        t: SETGT_UINT T1.w, KC0[0].x, T1.z
    11  x: MUL_64 ____, T2.y, T0.w
        y: MUL_64 ____, T2.y, T0.w
        z: MUL_64 T0.z, T2.y, T0.w
        w: MUL_64 T0.w, T2.x, T0.z
        t: SETGT_UINT T1.z, KC0[0].y, T3.x
    12  x: MUL_64 T0.x, T0.y, T0.y
        y: MUL_64 T0.y, T0.y, T0.y
        z: MUL_64 ____, T0.y, T0.y
        w: MUL_64 ____, T0.x, T0.x
    13  x: MUL_64 ____, T0.w, T0.w
        y: MUL_64 ____, T0.w, T0.w
        z: MUL_64 T0.z, T0.w, T0.w
        w: MUL_64 T0.w, T0.z, T0.z
    14  x: FMA_64 T0.x, T0.y, T3.w, (0x3FF80000, 1.9375f).x
        y: FMA_64 T0.y, T0.y, T3.w, (0x3FF80000, 1.9375f).x
        z: FMA_64 R123.z, T0.y, T3.w, (0x3FF80000, 1.9375f).x
        w: FMA_64 R123.w, T0.x, T3.z, 0.0f
    15  x: FMA_64 R123.x, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
        y: FMA_64 R123.y, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
        z: FMA_64 R123.z, T0.w, T2.w, (0x3FF80000, 1.9375f).x VEC_021
        w: FMA_64 R123.w, T0.z, T2.z, 0.0f
        t: CNDE_INT R0.x, T1.w, 0.0f, PV14.x VEC_021
    16  y: CNDE_INT R0.y, T1.w, 0.0f, T0.y
        z: CNDE_INT R0.z, T1.z, 0.0f, PV15.z
        w: CNDE_INT R0.w, T1.z, 0.0f, PV15.w
02 EXP_DONE: PIX0, R0
END_OF_PROGRAM
; ----------------- PS Data ------------------------
; Input Semantic Mappings
IN PARAM0 = position0 V0.xxxx DefaultVal={0,0,0,0}
NumTexStages = 0
TexCubeMaskBits = 0x00000000
GprPoolSize = 0
CodeLen = 912;Bytes
PGM_END_CF = 0; words(64 bit)
PGM_END_ALU = 0; words(64 bit)
PGM_END_FETCH = 0; words(64 bit)
MaxScratchRegsNeeded = 0
;AluPacking = 0.0
;AluClauses = 0
;PowerThrottleRate = 0.0
; texResourceUsage[0] = 0x00000000
; texResourceUsage[1] = 0x00000000
; texResourceUsage[2] = 0x00000000
; texResourceUsage[3] = 0x00000000
; fetch4ResourceUsage[0] = 0x00000000
; fetch4ResourceUsage[1] = 0x00000000
; fetch4ResourceUsage[2] = 0x00000000
; fetch4ResourceUsage[3] = 0x00000000
; texSamplerUsage = 0x00000000
; constBufUsage = 0x00000000
ResourcesAffectAlphaOutput[0] = 0x00000000
ResourcesAffectAlphaOutput[1] = 0x00000000
ResourcesAffectAlphaOutput[2] = 0x00000000
ResourcesAffectAlphaOutput[3] = 0x00000000
;SQ_PGM_RESOURCES = 0x70000002
SQ_PGM_RESOURCES:NUM_GPRS = 2
SQ_PGM_RESOURCES:STACK_SIZE = 0
SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE = 1
;SQ_PGM_RESOURCES_2 = 0x000000C0
SQ_LDS_ALLOC_PS:SIZE = 0x00000000
; SPI_PS_IN_CONTROL_0 = 0x00000100
SPI0:NUM_INTERP = 0
SPI0:POSITION_ENA = 1
SPI0:POSITION_CENTROID = 0
SPI0:POSITION_ADDR = 0
SPI0:PARAM_GEN = 0
SPI0:PERSP_GRADIENT_ENA = 0
SPI0:LINEAR_GRADIENT_ENA = 0
SPI0:POSITION_SAMPLE = 0
; SPI_PS_IN_CONTROL_1 = 0x00000000
SPI1:GEN_INDEX_PIX = 0
SPI1:FIXED_PT_POSITION_ENA = 0
SPI1:FIXED_PT_POSITION_ADDR = 0
SPI1:FRONT_FACE_ENA = 0
SPI1:FRONT_FACE_ADDR = 0
SPI1:FOG_ADDR = 0
; SPI_PS_IN_CONTROL_2 = 0x00000000
SPI2:LINE_STIPPLE_TEX_ENA = 0
SPI2:LINE_STIPPLE_TEX_ADDR = 0
; SPI_BARYC_CNTL = 0x00000001
SPI_BARYC_CNTL:PERSP_CENTER_ENA = 1
SPI_BARYC_CNTL:PERSP_CENTROID_ENA = 0
SPI_BARYC_CNTL:PERSP_SAMPLE_ENA = 0
SPI_BARYC_CNTL:PERSP_PULL_MODEL_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTER_ENA = 0
SPI_BARYC_CNTL:LINEAR_CENTROID_ENA = 0
SPI_BARYC_CNTL:LINEAR_SAMPLE_ENA = 0
; SPI_INPUT_Z
SPI:PROVIDE_Z_TO_SPI = 0
; CB_SHADER_MASK = 0x0000000F
CB:OUTPUT0_ENABLE = 15
; DB_SHADER_CONTROL = 0x00004210
DB:Z_EXPORT_ENABLE = 0
DB:STENCIL_REF_EXPORT_ENABLE = 0
DB:MASK_EXPORT_ENABLE = 0
DB:ALPHA_TO_MASK_DISABLE = 0
DB:Z_ORDER = 1
DB:KILL_ENABLE = 0
DB:DB_SOURCE_FORMAT = 2
DB:CONSERVATIVE_Z_EXPORT = 0
DB:DEPTH_BEFORE_SHADER = 0
DB:EXEC_ON_HIER_FAIL = 0
; SQ_PGM_EXPORTS_PS
SQ_PGM_EXPORTS_PS:PS_EXPORT_MODE = 0x00000002 ; (1 color)


hazeman,

Can you provide the kernel code?

You can also send a test case to streamdeveloper@amd.com


The code is available as part of the CAL++ library (download here). The file with the kernel is regression/rsqrt.cpp.

After some tests I've noticed that kernels where all the LDEXP operations end up in the same ISA instruction give the correct value, while kernels where the LDEXP operations are split across two different instructions give a wrong value in o0.xy. Of course, this may be only a coincidence.
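The LDEXP observation can be checked mechanically against a disassembly dump. The Python sketch below is one possible way to do it, not part of CAL++ or any AMD tool: `slots_by_group` is a made-up helper that scans the ALU disassembly text and reports, per instruction-group number, which slots (x, y, z, w, t) issue a given opcode. It relies on a simple heuristic, namely that a group starts wherever a number is followed directly by a slot label, so it works on both line-per-slot and flattened dumps of the kind shown in this thread but may need adjusting for other layouts. The two excerpts are copied from the LDEXP groups of the dumps posted above.

```python
import re

# An optional group number, then a slot label ("x:" .. "t:"), then the opcode.
SLOT_RE = re.compile(r"(?:\b(\d+)\s+)?\b([xyzwt]):\s+([A-Za-z0-9_]+)")

def slots_by_group(isa_text, opcode="LDEXP_64"):
    """Return {group number: [slots]} for every ALU slot that issues `opcode`."""
    groups = {}
    current = None
    for match in SLOT_RE.finditer(isa_text):
        number, slot, op = match.groups()
        if number is not None:
            current = int(number)   # a new instruction group starts here
        if op == opcode and current is not None:
            groups.setdefault(current, []).append(slot)
    return groups

KERNEL1_EXCERPT = """
     3  x: LDEXP_64 T3.x, R1.y, (0xFFFFFFFF, 0.nanf).x
        y: LDEXP_64 T3.y, R1.x, (0xFFFFFFFF, 0.nanf).x
        z: LDEXP_64 T3.z, R1.w, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T3.w, R1.z, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T1.y, T0.w, T0.z
"""

KERNEL2_EXCERPT = """
     2  x: MOV T1.x, 0.0f
        z: LDEXP_64 T3.z, R1.y, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T3.w, R1.x, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T0.y, PV1.y, PV1.x
     3  x: MOV T0.x, T0.x
        y: MOV T0.y, PS2
        z: LDEXP_64 T2.z, R1.w, (0xFFFFFFFF, 0.nanf).x
        w: LDEXP_64 T2.w, R1.z, (0xFFFFFFFF, 0.nanf).x
        t: RSQ_sat_64 T1.y, T0.w, T0.z
"""

print(slots_by_group(KERNEL1_EXCERPT))  # {3: ['x', 'y', 'z', 'w']}
print(slots_by_group(KERNEL2_EXCERPT))  # {2: ['z', 'w'], 3: ['z', 'w']}
```

A result with a single key means all LDEXP_64 slots share one instruction group, as in the working kernel; more than one key means they were split, as in the failing one. Running this over a larger set of kernels would show whether the correlation holds beyond these two dumps.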

 

 

 


Driver 11.1 also gives wrong results.


hazeman,
Our driver release process takes about three months from development to release, so I would not expect a fix in the base driver until the March/April time frame.