0 Replies Latest reply on Sep 8, 2011 3:56 PM by SiegeLord

    CPU thread scheduler

    SiegeLord

      So I've been trying to figure out the best number of workgroups to use on a CPU device. I've heard claims that it's best to use a number of workgroups that is a multiple of the core count, but when I tried that (one workgroup per core) the performance wasn't so good... and more importantly, only 2 out of 4 cores were used. So I systematically varied the number of workgroups on the two CPUs I have and counted the number of threads the process ended up using. The tables are in the format "workgroup count: number of threads/cores used":

       

      Phenom II X4 810
       1: 1
       2: 2
       3: 3
       4: 2
       5: 3
       6: 3
       7: 4
       8: 3
       9: 3
      10: 4
      11: 4
      12: 3
      13: 4
      14: 4
      15: 4
      16: 4
      17: 4
      18: 4
      19: 4
      20: 4

       

      Phenom II X6 1090T
      1: 1
      2: 2
      3: 3
      4: 4
      5: 5
      6: 3
      7: 4
      8: 4
      9: 5
      10: 5
      11: 6
      12: 4
      13: 5
      14: 5
      15: 5
      16: 6
      17: 6
      18: 5
      19: 5
      20: 5
      21: 6
      22: 6
      23: 6
      24: 5
      25: 5
      26: 6
      27: 6
      28: 6
      29: 6
      30: 5

       

      Some very bizarre numbers. A couple of additional observations:

      - Given the same number of work items, using more cores yields faster run times

      - Given the same number of cores, using fewer workgroups yields faster run times

      - When using long-running kernels, the number of threads used decreases by 1 after a few seconds of runtime

      So my question is... why do my tables have such weird numbers? Why is one core dropped after a few seconds of runtime?
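
To make the pattern in the tables easier to check, here is a small Python sketch that only re-reads the numbers posted above. It does not run any OpenCL code. The names `X4_THREADS`, `X6_THREADS`, `at_core_multiples` and `first_full_use` are made up for this illustration.

```python
# Thread counts transcribed from the two tables in this post.
# List index 0 corresponds to a workgroup count of 1.
X4_THREADS = [1, 2, 3, 2, 3, 3, 4, 3, 3, 4,
              4, 3, 4, 4, 4, 4, 4, 4, 4, 4]
X6_THREADS = [1, 2, 3, 4, 5, 3, 4, 4, 5, 5,
              6, 4, 5, 5, 5, 6, 6, 5, 5, 5,
              6, 6, 6, 5, 5, 6, 6, 6, 6, 5]


def at_core_multiples(threads, cores):
    """Map each workgroup count that is a multiple of `cores` to its thread count."""
    return {n: t for n, t in enumerate(threads, start=1) if n % cores == 0}


def first_full_use(threads, cores):
    """Smallest workgroup count at which `cores` threads were observed, or None."""
    for n, t in enumerate(threads, start=1):
        if t == cores:
            return n
    return None


if __name__ == "__main__":
    for name, threads, cores in (
        ("Phenom II X4 810", X4_THREADS, 4),
        ("Phenom II X6 1090T", X6_THREADS, 6),
    ):
        print(name)
        print("  at multiples of the core count:", at_core_multiples(threads, cores))
        print("  first workgroup count using every core:", first_full_use(threads, cores))
```

Going by these numbers, the X4 first uses all 4 threads at 7 workgroups and the X6 first uses all 6 at 11 workgroups. On the X6, none of the multiples of 6 in the range I tested (6, 12, 18, 24, 30) ever used all 6 threads.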