Archives Discussions

SiegeLord · ‎09-08-2011

So I've been trying to figure out the best number of workgroups to use on a CPU device. I've heard claims that it's best to use a number of workgroups that is a multiple of the core count, but when I tried that the performance wasn't so good (I tried one workgroup per core)... and more importantly, only 2 out of 4 cores were used. So I systematically varied the number of workgroups on the two CPUs I have and counted the number of threads the process ends up using. The tables are in the format "workgroup count: number of threads/cores used":

Phenom II X4 810
1: 1
2: 2
3: 3
4: 2
5: 3
6: 3
7: 4
8: 3
9: 3
10: 4
11: 4
12: 3
13: 4
14: 4
15: 4
16: 4
17: 4
18: 4
19: 4
20: 4

Phenom II X6 1090T
1: 1
2: 2
3: 3
4: 4
5: 5
6: 3
7: 4
8: 4
9: 5
10: 5
11: 6
12: 4
13: 5
14: 5
15: 5
16: 6
17: 6
18: 5
19: 5
20: 5
21: 6
22: 6
23: 6
24: 5
25: 5
26: 6
27: 6
28: 6
29: 6
30: 5
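
One way to read the tables is to pull out the rows where the workgroup count is a multiple of the core count, since that is the configuration the usual advice recommends. The sketch below only re-slices the numbers listed above; the helper names are made up for illustration and it says nothing about why the runtime behaves this way.

```python
# Thread counts copied from the two tables above; index 0 corresponds to 1 workgroup.
X4_THREADS = [1, 2, 3, 2, 3, 3, 4, 3, 3, 4,
              4, 3, 4, 4, 4, 4, 4, 4, 4, 4]
X6_THREADS = [1, 2, 3, 4, 5, 3, 4, 4, 5, 5,
              6, 4, 5, 5, 5, 6, 6, 5, 5, 5,
              6, 6, 6, 5, 5, 6, 6, 6, 6, 5]


def threads_at_core_multiples(threads, cores):
    """Map each workgroup count that is a multiple of `cores` to the threads used."""
    return {wg: threads[wg - 1] for wg in range(cores, len(threads) + 1, cores)}


def underused_counts(threads, cores):
    """List workgroup counts >= `cores` that still left at least one core idle."""
    return [wg for wg, used in enumerate(threads, start=1)
            if wg >= cores and used < cores]


if __name__ == "__main__":
    print("X4, multiples of 4:", threads_at_core_multiples(X4_THREADS, 4))
    print("X6, multiples of 6:", threads_at_core_multiples(X6_THREADS, 6))
    print("X4, counts that leave a core idle:", underused_counts(X4_THREADS, 4))
```

In these measurements, the rows at exact multiples of the core count are often the ones where the thread count drops relative to the row before.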

Some very bizarre numbers. A couple of additional observations:

- Given the same number of work items, using more cores yields faster run times

- Given the same number of cores, using fewer workgroups yields faster run times

- When using long-running kernels, the number of threads used decreases by 1 after a few seconds of runtime

So my question is... why do my tables have such weird numbers? Why is one core dropped after a few seconds of runtime?
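
For anyone who wants to repeat this kind of sweep: in OpenCL the number of workgroups in a 1-D range is the global work size divided by the local work size, so the workgroup count can be varied by keeping the local size fixed and scaling the global size. Below is a minimal sketch of that bookkeeping, assuming a 1-D range and a kernel in which each work item loops over its share of the data. The function name and parameters are hypothetical, and this is not necessarily how the measurements above were set up.

```python
def sweep_plan(total_items, local_size, max_workgroups):
    """Build (workgroups, global_size, items_per_work_item) for 1..max_workgroups.

    global_size is always workgroups * local_size, so it divides evenly by
    local_size. items_per_work_item is rounded up, so the kernel is assumed
    to bounds-check the last chunk.
    """
    plan = []
    for workgroups in range(1, max_workgroups + 1):
        global_size = workgroups * local_size
        # Ceiling division: every data item is covered by some work item.
        items_per_work_item = -(-total_items // global_size)
        plan.append((workgroups, global_size, items_per_work_item))
    return plan


if __name__ == "__main__":
    for workgroups, global_size, per_item in sweep_plan(1000, 64, 4):
        print(workgroups, global_size, per_item)
```

Keeping the total amount of work constant while only the workgroup count changes makes the timings comparable across rows. The local size chosen must not exceed the device's maximum work-group size.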


CPU thread scheduler