I have a question.
My graphics card is a single-GPU Radeon HD 4870, so it has 10 Compute Units with 16 Stream Cores per Compute Unit.
In the IL below,
"dcl_num_thread_per_group 64\n"
"dcl_lds_size_per_thread 16\n"
if each thread needs one Stream Core, then 64 threads per group means 4 Compute Units per group (64 / 16 = 4).
In that case, only 2 groups can run at the same time on my card. Do the remaining 2 Compute Units sit idle during execution?
(I am asking especially about the case where the LDS is used within a group.)

Should I change "dcl_num_thread_per_group 64\n" to "dcl_num_thread_per_group 32\n" so that 5 groups (using all 10 Compute Units) can run at the same time?
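
To make the arithmetic behind my question explicit, here is a small Python sketch. It only encodes my own assumption that one thread occupies one Stream Core and that a group can span several Compute Units. I am not sure this is how the hardware actually schedules a group, and the function name `occupancy` is just something I made up for illustration.

```python
# Sketch of the arithmetic in my question. ASSUMPTION: one thread needs one
# Stream Core, and a group spreads across as many Compute Units as it needs.
COMPUTE_UNITS = 10        # HD 4870, as stated above
STREAM_CORES_PER_CU = 16  # as stated above


def occupancy(threads_per_group):
    """Return (compute units per group, concurrent groups, idle compute units)."""
    # Ceiling division: Compute Units needed to hold one group.
    cus_per_group = -(-threads_per_group // STREAM_CORES_PER_CU)
    # Whole groups that fit on the card at the same time.
    concurrent_groups = COMPUTE_UNITS // cus_per_group
    # Compute Units left over with no group assigned.
    idle_cus = COMPUTE_UNITS - concurrent_groups * cus_per_group
    return cus_per_group, concurrent_groups, idle_cus


if __name__ == "__main__":
    for threads in (64, 32):
        print(threads, occupancy(threads))
```

Under that assumption, 64 threads per group gives 4 Compute Units per group, 2 concurrent groups, and 2 idle Compute Units, while 32 threads per group gives 2 Compute Units per group, 5 concurrent groups, and none idle.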