2 Replies Latest reply on May 20, 2012 6:26 AM by yurtesen

    Some questions regarding AMD's OpenCL implementation on Linux

    sifff

      1.)

      Is there a time limit for the maximum execution time of a single kernel?

      2.)

      Is there a time limit after which the fglrx driver decides the GPU is hung?

      I get these hangs when my kernel runs for about 3 minutes:

       

      [ 910.658628] [fglrx] ASIC hang happened
      [ 910.658635] Pid: 4611, comm: montecarlo Tainted: P 2.6.36-gentoo-r5 #1
      [ 910.658637] Call Trace:
      [ 910.658699] [] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
      [ 910.658743] [] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
      [ 910.658784] [] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
      [ 910.658827] [] ? _ZN4Asic19PM4ElapsedTimeStampEj14_LARGE_INTEGER12_QS_CP_RING_+0xaf/0x160 [fglrx]
      [ 910.658868] [] ? _ZN15QS_PRIVATE_CORE27multiVpuPM4ElapsedTimeStampEj14_LARGE_INTEGER12_QS_CP_RING_+0x33/0x50 [fglrx]
      [ 910.658910] [] ? _Z19uQSTimeStampRetiredmjj14_LARGE_INTEGER+0x74/0x80 [fglrx]
      [ 910.658951] [] ? _Z8uCWDDEQCmjjPvjS_+0x382/0xf00 [fglrx]
      [ 910.658981] [] ? firegl_cmmqs_CWDDE_32+0x334/0x440 [fglrx]
      [ 910.659006] [] ? firegl_cmmqs_CWDDE32+0x70/0x100 [fglrx]
      [ 910.659029] [] ? __do_fault+0x20d/0x430
      [ 910.659057] [] ? firegl_cmmqs_CWDDE32+0x0/0x100 [fglrx]
      [ 910.659083] [] ? firegl_ioctl+0x1ea/0xeb0 [fglrx]
      [ 910.659089] [] ? handle_mm_fault+0x193/0x9d0
      [ 910.659094] [] ? do_vfs_ioctl+0x9f/0x570
      [ 910.659099] [] ? sys_ioctl+0x80/0xa0
      [ 910.659104] [] ? system_call_fastpath+0x16/0x1b
      [ 910.659109] pubdev:0xffffffffa021e0e0, num of device:1 , name:fglrx, major 8, minor 80.
      [ 910.659113] device 0 : 0xffff88007ddac000 .
      [ 910.659116] Asic ID:0x68b8, revision:0x15, MMIOReg:0xffffc90000d80000.
      [ 910.659120] FB phys addr: 0xe0000000, MC :0xf00000000, Total FB size :0x40000000.
      [ 910.659123] gart table MC:0xf0fb07000, Physical:0xefb07000, size:0x1f8000.
      [ 910.659127] mc_node :FB, total 1 zones
      [ 910.659130] MC start:0xf00000000, Physical:0xe0000000, size:0xfd00000.
      [ 910.659133] Mapped heap -- Offset:0x0, size:0xfb07000, reference count:45, mapping count:0,
      [ 910.659137] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
      [ 910.659141] Mapped heap -- Offset:0xfb07000, size:0x1f9000, reference count:1, mapping count:0,
      [ 910.659144] mc_node :INV_FB, total 1 zones
      [ 910.659147] MC start:0xf0fd00000, Physical:0xefd00000, size:0x30300000.
      [ 910.659151] Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
      [ 910.659154] mc_node :GART_USWC, total 2 zones
      [ 910.659157] MC start:0x279e0000, Physical:0x0, size:0x27400000.
      [ 910.659161] Mapped heap -- Offset:0x20000, size:0x2000000, reference count:14, mapping count:0,
      [ 910.659164] mc_node :GART_CACHEABLE, total 3 zones
      [ 910.659167] MC start:0x10400000, Physical:0x0, size:0x175e0000.
      [ 910.659170] Mapped heap -- Offset:0x3500000, size:0x400000, reference count:1, mapping count:0,
      [ 910.659174] Mapped heap -- Offset:0x3100000, size:0x400000, reference count:1, mapping count:0,
      [ 910.659178] Mapped heap -- Offset:0x2d00000, size:0x400000, reference count:1, mapping count:0,
      [ 910.659182] Mapped heap -- Offset:0x2900000, size:0x400000, reference count:2, mapping count:0,
      [ 910.659185] Mapped heap -- Offset:0x2400000, size:0x500000, reference count:2, mapping count:0,
      [ 910.659189] Mapped heap -- Offset:0x1d00000, size:0x500000, reference count:2, mapping count:0,
      [ 910.659193] Mapped heap -- Offset:0x1800000, size:0x500000, reference count:2, mapping count:0,
      [ 910.659196] Mapped heap -- Offset:0x1300000, size:0x500000, reference count:3, mapping count:0,
      [ 910.659200] Mapped heap -- Offset:0xe00000, size:0x500000, reference count:4, mapping count:0,
      [ 910.659204] Mapped heap -- Offset:0x700000, size:0x700000, reference count:13, mapping count:0,
      [ 910.659208] Mapped heap -- Offset:0x200000, size:0x500000, reference count:11, mapping count:0,
      [ 910.659211] Mapped heap -- Offset:0x0, size:0x200000, reference count:7, mapping count:0,
      [ 910.659215] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
      [ 910.659220] GRBM : 0xe0707828, SRBM : 0x200000c0 .
      [ 910.659225] CP_RB_BASE : 0x27a000, CP_RB_RPTR : 0x138c0 , CP_RB_WPTR :0x138c0.
      [ 910.659229] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x27c7e000.
      [ 910.659233] last submit IB buffer -- MC :0x27c7e000,phys:0x7a364000.
      [ 910.659237] Dump the trace queue.
      [ 910.659240] End of dump

       

       

       

      3.)

      When I use the flag for dumping the GPU code, I get this output:

       

      ; ----------------- CS Data ------------------------

      ; Input Semantic Mappings

      ;    No input mappings

       

      GprPoolSize = 0

      CodeLen                 = 26848;Bytes

      PGM_END_CF              = 0; words(64 bit)

      PGM_END_ALU             = 0; words(64 bit)

      PGM_END_FETCH           = 0; words(64 bit)

      MaxScratchRegsNeeded    = 4

      ;AluPacking              = 0.0

      ;AluClauses              = 0

      ;PowerThrottleRate       = 0.0

      ; texResourceUsage[0]     = 0x00000000

      ; texResourceUsage[1]     = 0x00000000

      ; texResourceUsage[2]     = 0x00000000

      ; texResourceUsage[3]     = 0x00000000

      ; fetch4ResourceUsage[0]  = 0x00000000

      ; fetch4ResourceUsage[1]  = 0x00000000

      ; fetch4ResourceUsage[2]  = 0x00000000

      ; fetch4ResourceUsage[3]  = 0x00000000

      ; texSamplerUsage         = 0x00000000

      ; constBufUsage           = 0x00000000

      ResourcesAffectAlphaOutput[0]  = 0x00000000

      ResourcesAffectAlphaOutput[1]  = 0x00000000

      ResourcesAffectAlphaOutput[2]  = 0x00000000

      ResourcesAffectAlphaOutput[3]  = 0x00000000

       

      ;SQ_PGM_RESOURCES        = 0x3000063E

      SQ_PGM_RESOURCES:NUM_GPRS     = 62

      SQ_PGM_RESOURCES:STACK_SIZE           = 6

      SQ_PGM_RESOURCES:PRIME_CACHE_ENABLE   = 1

      ;SQ_PGM_RESOURCES_2      = 0x000000C0

      SQ_LDS_ALLOC:SIZE        = 0x00000000

      ; RatOpIsUsed = 0x2

      ; NumThreadPerGroupFlattened = 256

      ; SetBufferForNumGroup = true

       

      It reports MaxScratchRegsNeeded = 4, which I suspect is bad: as far as I understand, a nonzero value means some registers are being spilled to slow scratch memory, and from looking around the forums this value is supposed to be zero.

       

      I use a couple of loops to reorder matrices on the GPU. Is there something inherently wrong with the way I implemented them?

       

       

       

      // First swap columns pos and newsize of both matrices
      // (row-major order x order layout: element (k, c) is at k*order + c)
      for(uint k = 0; k < order; ++k) {
          uint kop = k*order + pos;
          uint kon = k*order + newsize;
          swaptemp = up[kon];
          up[kon] = up[kop];
          up[kop] = swaptemp;
          swaptemp = down[kon];
          down[kon] = down[kop];
          down[kop] = swaptemp;
      }
      // now the rows
      /*
      for(uint k = 0; k < order; ++k) {
          swaptemp = up[newsize*order + k];
          up[newsize*order + k] = up[pos*order + k];
          up[pos*order + k] = swaptemp;
          swaptemp = down[newsize*order + k];
          down[newsize*order + k] = down[pos*order + k];
          down[pos*order + k] = swaptemp;
      }
      */