
riza_guntur
Journeyman III

How to control the number of cores/CPUs used

I need to run the kernel on only 2 or 3 of my CPU cores. How?

0 Likes
13 Replies
genaganna
Journeyman III

Use local_work_size=2 or 3

0 Likes

I don't think that will be effective. As the graph at http://bit.ly/Y0WZu shows, a small local work-group size leads to large overhead, and setting the local group size to 2 or 3 doesn't prevent two work-groups from running in parallel.

Maybe set the CPU affinity in the system?

0 Likes

Originally posted by: nou I don't think that will be effective. As the graph at http://bit.ly/Y0WZu shows, a small local work-group size leads to large overhead, and setting the local group size to 2 or 3 doesn't prevent two work-groups from running in parallel.

Maybe set the CPU affinity in the system?

How? I'm doing this manually for 2 cores, and it runs almost as fast as with 4 cores.

0 Likes

SetProcessAffinityMask() takes two parameters: a handle to the process and a bitmask.
0x0000000000000001 - first core
0x0000000000000002 - second core
0x0000000000000003 - first and second


http://msdn.microsoft.com/en-us/library/ms686223(VS.85).aspx

http://msdn.microsoft.com/en-us/library/ms686247(VS.85).aspx
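
To make the bitmask convention above concrete, here is a minimal sketch in C. The helper names `affinity_mask` and `pin_process_to_cores` are made up for illustration; only `SetProcessAffinityMask()` and `GetCurrentProcess()` are real Win32 calls. It assumes the OpenCL runtime's worker threads stay within the process affinity mask, which is worth checking on your own setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: build a bitmask with one bit set per core index.
   Core 0 -> 0x1, core 1 -> 0x2, cores 0 and 1 -> 0x3. */
uint64_t affinity_mask(const unsigned *cores, size_t n)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(cores[i] < 64);            /* a 64-bit mask covers cores 0..63 */
        mask |= (uint64_t)1 << cores[i];
    }
    return mask;
}

/* Hypothetical helper: restrict the current process to the given cores.
   Returns 0 on success, -1 on failure or on platforms not handled here. */
int pin_process_to_cores(const unsigned *cores, size_t n)
{
#ifdef _WIN32
    DWORD_PTR mask = (DWORD_PTR)affinity_mask(cores, n);
    return SetProcessAffinityMask(GetCurrentProcess(), mask) ? 0 : -1;
#else
    (void)cores;
    (void)n;
    return -1;                            /* not implemented in this sketch */
#endif
}
```

Calling `pin_process_to_cores` with the core list `{0, 1}` before creating the OpenCL context would request the first and second core, i.e. mask 0x3.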

0 Likes

It is possible to control the number of cores that are used to execute a kernel by setting the environment variable:

CPU_MAX_COMPUTE_UNITS=n

where 'n' is the number of cores to use and can range from 1 to the number of cores in the system.

Currently there is no way to set the affinity for OpenCL and tie kernel execution to particular cores, but you should look out for this in the future.
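
If you would rather set the variable from inside the host program than from the shell, something like the following might do. This is only a sketch: `limit_cpu_compute_units` is a made-up helper name, `setenv()` is POSIX (on Windows `_putenv_s()` would be the counterpart), and it assumes the runtime reads CPU_MAX_COMPUTE_UNITS when it initializes, so the call has to happen before the first OpenCL API call.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: export CPU_MAX_COMPUTE_UNITS=n for this process.
   Assumption: the OpenCL runtime reads the variable at initialization,
   so call this before any OpenCL function. Returns 0 on success. */
int limit_cpu_compute_units(unsigned n)
{
    char value[16];
    assert(n >= 1);                       /* valid range starts at 1 */
    snprintf(value, sizeof value, "%u", n);
    return setenv("CPU_MAX_COMPUTE_UNITS", value, 1);  /* 1 = overwrite */
}
```

For example, `limit_cpu_compute_units(2)` at the top of `main()` would ask for two cores.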

0 Likes
jasno
Journeyman III

Originally posted by: brg It is possible to control the number of cores that are used to execute a kernel by setting the environment variable:

CPU_MAX_COMPUTE_UNITS=n

where 'n' is the number of cores to use and can range from 1 to the number of cores in the system.

Currently there is no way to set the affinity for OpenCL and tie kernel execution to particular cores, but you should look out for this in the future.

So, much time has gone by and we are now at Stream SDK 2.3. Does this env var still work (did it ever work)? I am trying to run OpenCL codes on a 24-core machine but don't have exclusive access to it, so I need to limit my job to a subset of the available cores. I have tried setting this env var but it seems to have no effect. Are there other alternatives to constrain the resources used?

--

jason

0 Likes

It works for me. Or look at the cl_amd_device_fission extension.

0 Likes

Originally posted by: genaganna Use local_work_size=2 or 3

Where? Env variables?

0 Likes

It is the sixth parameter of clEnqueueNDRangeKernel(), but it didn't work. I set local_work_size to 1 and it still dispatched 4 threads, with enormous overhead compared with local_work_size > 100.

Try searching for thread/processor affinity.
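
A bit of arithmetic shows why local_work_size does not cap the core count: it sets the size of each work-group, not how many groups there are or where they run. The sketch below assumes a one-dimensional range under the OpenCL 1.x rule that the global size must be a multiple of the local size; `work_group_count` is a made-up helper and the sizes are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups in a 1-D NDRange.
   OpenCL 1.x requires global_size to be a multiple of local_size;
   this sketch returns 0 when that rule is violated. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    if (global_size % local_size != 0)
        return 0;                         /* invalid combination */
    return global_size / local_size;
}
```

With 1024 work-items, a local size of 1 yields 1024 tiny work-groups while a local size of 256 yields only 4. The runtime is still free to spread those groups over every compute unit, which fits the overhead and the 4 threads reported above.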

0 Likes

As nou mentioned, please use the cl_amd_device_fussion extension to do what you need to do.
0 Likes

Originally posted by: MicahVillmow as nou mentioned, please use the cl_amd_device_fussion extension to do what you need to do.


 

Hi thanks for replying, but I am struggling to apply either suggestion. I can find no reference to either cl_amd_device_fussion or cl_amd_device_fission nor can I compile any code containing this extension. I have tried variants such as

#pragma OPENCL EXTENSION  cl_amd_device_fussion : enable

or

#pragma OPENCL EXTENSION  cl_amd_device_fission : enable

 

or

#pragma OPENCL EXTENSION  cl_amd_device_fussion : 4

or

#pragma OPENCL EXTENSION  cl_amd_device_fission : 4

None of which will compile. I have tried it as an environment variable

export cl_amd_device_fussion=4

export cl_amd_device_fission=4

 

with seemingly no effect.

 

Can you point me to documentation for this, or to an example of how it is used?

 

TIA

 

--

jason

 

p.s. This is a list of extensions that seem to be in libatiocl64.so

cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf

 

 

p.p.s

Ahhhh, OK, probably you mean cl_ext_device_fission . . .?

 

--

jason
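
Rather than guessing at the name, the host program can look for it in the device's extension string. The sketch below is a plain-C token search; `has_extension` is a made-up helper, and the sample string in the usage note is shortened from the list above. In a real program the string would presumably come from clGetDeviceInfo() with CL_DEVICE_EXTENSIONS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` appears as a whole
   space-separated token in `ext_list`, 0 otherwise. A plain strstr()
   is not enough because one name can be a prefix of another. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        ++p;                              /* keep searching after this hit */
    }
    return 0;
}
```

Against the string "cl_amd_fp64 cl_khr_gl_sharing cl_ext_device_fission cl_amd_printf", a lookup of cl_ext_device_fission succeeds while cl_amd_device_fission does not.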

0 Likes

Here you can find a webinar about device fission: http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx

Sorry about the misleading name.

0 Likes
jasno
Journeyman III

OK, so now I have some code working using the cl_ext_device_fission extension, but it only works for CL_DEVICE_PARTITION_BY_COUNTS_EXT and CL_DEVICE_PARTITION_EQUALLY_EXT.

When I try to use CL_DEVICE_PARTITION_BY_NAMES_EXT or CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT, I just get CL_INVALID_VALUE from clCreateSubDevicesEXT().

Should all of these options work on a machine that has a pair of AMD Opteron(tm) Processor 6174 (x86_64) CPUs in it (running SUSE 11.3)?

Thanks for any advice

jason

0 Likes