I need to run the kernel on only 2 or 3 of my CPU cores. How?
Use local_work_size=2 or 3
I don't think that will be effective. http://bit.ly/Y0WZu As you can see in the graph, a small local work-group size leads to big overhead. And setting the local group size to 2-3 doesn't prevent two work-groups from running in parallel.
Maybe set the CPU affinity in the system?
Originally posted by: nou I don't think that will be effective. http://bit.ly/Y0WZu As you can see in the graph, a small local work-group size leads to big overhead. And setting the local group size to 2-3 doesn't prevent two work-groups from running in parallel.
Maybe set the CPU affinity in the system?
How? I'm doing this manually with 2 cores and it runs almost as fast as with 4 cores.
SetProcessAffinityMask() takes two parameters: a handle to the process and a bitmask.
0x0000000000000001 - first core
0x0000000000000002 - second core
0x0000000000000003 - first and second
http://msdn.microsoft.com/en-us/library/ms686223(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms686247(VS.85).aspx
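As a rough illustration of the bitmask values listed above, here is a minimal C sketch that builds the mask from zero-based core indices and, on Windows only, passes it to SetProcessAffinityMask(). The helper names affinity_mask_for_cores and restrict_process_to_cores are made up for this example. Whether the OpenCL runtime's worker threads actually stay within the process mask is an assumption that is not confirmed here.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Build an affinity bitmask from a list of zero-based core indices.
 * Bit i set means "core i is allowed". Indices >= 64 are ignored. */
uint64_t affinity_mask_for_cores(const unsigned *cores, size_t count)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < count; ++i) {
        if (cores[i] < 64)
            mask |= (uint64_t)1 << cores[i];
    }
    return mask;
}

/* Restrict the current process to the given cores (hypothetical helper).
 * Returns 0 on success, -1 on failure or on platforms this sketch
 * does not cover. */
int restrict_process_to_cores(const unsigned *cores, size_t count)
{
    uint64_t mask = affinity_mask_for_cores(cores, count);
    if (mask == 0)
        return -1;
#ifdef _WIN32
    /* DWORD_PTR is 32 bits wide in a 32-bit build, so higher bits are dropped. */
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask) ? 0 : -1;
#else
    (void)mask; /* non-Windows platforms are left out of this sketch */
    return -1;
#endif
}
```

Calling restrict_process_to_cores with the indices {0, 1} corresponds to the 0x3 mask above. It would need to run before the first OpenCL call so that threads created later inherit the restriction.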
It is possible to control the number of cores that are used to execute a kernel by setting the environment variable:
CPU_MAX_COMPUTE_UNITS=n
where 'n' is the number of cores to use and can range from 1 to the number of cores in the system.
Currently there is no way to set the affinity for OpenCL and tie kernel execution to particular cores, but you should look out for this in the future.
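If the variable has to be set from inside the program rather than from the shell, a minimal C sketch could look like the following. The function name limit_cpu_compute_units is invented for this example. It also assumes that the runtime reads CPU_MAX_COMPUTE_UNITS when it initializes, so the call would have to happen before the first OpenCL call in the process.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set CPU_MAX_COMPUTE_UNITS for the current process (hypothetical helper).
 * Assumption: the OpenCL runtime reads the variable at initialization,
 * so this must run before any OpenCL call.
 * Returns 0 on success, -1 on invalid input or failure. */
int limit_cpu_compute_units(unsigned n)
{
    char value[16];

    if (n == 0)
        return -1; /* the documented range starts at 1 */

    snprintf(value, sizeof value, "%u", n);
#ifdef _WIN32
    return _putenv_s("CPU_MAX_COMPUTE_UNITS", value) == 0 ? 0 : -1;
#else
    /* Third argument 1 = overwrite an existing value. */
    return setenv("CPU_MAX_COMPUTE_UNITS", value, 1) == 0 ? 0 : -1;
#endif
}
```

Querying CL_DEVICE_MAX_COMPUTE_UNITS with clGetDeviceInfo() afterwards is one way to check whether the limit took effect.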
Originally posted by: brg It is possible to control the number of cores that are used to execute a kernel by setting the environment variable:
CPU_MAX_COMPUTE_UNITS=n
where 'n' is the number of cores to use and can range from 1 to the number of cores in the system.
Currently there is no way to set the affinity for OpenCL and tie kernel execution to particular cores, but you should look out for this in the future.
So, much time has gone by and we are now at Stream SDK 2.3. Does this env var still work (did it ever work)? I am trying to run OpenCL codes on a 24-core machine but don't have exclusive access to it, so I need to limit my job to a subset of the available cores. I have tried setting this env var but it seems to have no effect. Are there other alternatives to constrain the resources used?
--
jason
It works for me. Or look at the cl_amd_device_fission extension.
Originally posted by: genaganna Use local_work_size=2 or 3
Where? Env variables?
It is the sixth parameter of clEnqueueNDRangeKernel(). But it didn't work: I set local_work_size to 1 and it still dispatched 4 threads, with enormous overhead compared with local_work_size > 100.
Try searching for thread/processor affinity.
Originally posted by: MicahVillmow as nou mentioned, please use the cl_amd_device_fussion extension to do what you need to do.
Hi, thanks for replying, but I am struggling to apply either suggestion. I can find no reference to either cl_amd_device_fussion or cl_amd_device_fission, nor can I compile any code containing this extension. I have tried variants such as:
#pragma OPENCL EXTENSION cl_amd_device_fussion : enable
or
#pragma OPENCL EXTENSION cl_amd_device_fission : enable
or
#pragma OPENCL EXTENSION cl_amd_device_fussion : 4
or
#pragma OPENCL EXTENSION cl_amd_device_fission : 4
None of these will compile. I have also tried it as an environment variable:
export cl_amd_device_fussion=4
export cl_amd_device_fission=4
with seemingly no effect.
Can you point me to documentation for this or an example of how it is used?
TIA
--
jason
p.s. This is a list of extensions that seem to be in libatiocl64.so:
cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_media_ops cl_amd_popcnt cl_amd_printf
p.p.s
Ahhhh, OK, probably you mean cl_ext_device_fission . . .?
--
jason
Here you can find a webinar about device fission: http://developer.amd.com/zones/OpenCLZone/Events/pages/OpenCLWebinars.aspx
Sorry about the misleading name.
OK, so now I have some code working using the cl_ext_device_fission extension, but it only works for CL_DEVICE_PARTITION_BY_COUNTS_EXT and CL_DEVICE_PARTITION_EQUALLY_EXT.
When I try to use CL_DEVICE_PARTITION_BY_NAMES_EXT or CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT, I just get CL_INVALID_VALUE from clCreateSubDevicesEXT().
Should all of these options work on a machine that has a pair of AMD Opteron(tm) Processor 6174 (x86_64) CPUs in it (running SUSE 11.3)?
Thanks for any advice
jason
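
For anyone following along, the CL_DEVICE_PARTITION_BY_COUNTS_EXT case that does work can be sketched roughly as below. This only builds the property list that clCreateSubDevicesEXT() expects. The helper name build_by_counts_properties and the partition_property typedef are invented for this example. The fallback constant values are what I recall from CL/cl_ext.h and should be treated as an assumption; a real build should include that header instead.

```c
#include <stddef.h>
#include <stdint.h>

/* In a real build these come from <CL/cl_ext.h>. The fallbacks below are
 * recalled from that header (assumption: verify against your SDK). */
#ifndef CL_DEVICE_PARTITION_BY_COUNTS_EXT
#define CL_DEVICE_PARTITION_BY_COUNTS_EXT   0x4051
#endif
#ifndef CL_PARTITION_BY_COUNTS_LIST_END_EXT
#define CL_PARTITION_BY_COUNTS_LIST_END_EXT 0
#endif
#ifndef CL_PROPERTIES_LIST_END_EXT
#define CL_PROPERTIES_LIST_END_EXT          0
#endif

/* Stand-in for cl_device_partition_property_ext, a 64-bit unsigned value. */
typedef uint64_t partition_property;

/* Fill `out` with a property list asking for ONE sub-device that has
 * `cores` compute units. Returns the number of entries written,
 * or 0 if the request is invalid or `out` is too small. */
size_t build_by_counts_properties(unsigned cores,
                                  partition_property *out,
                                  size_t capacity)
{
    if (cores == 0 || capacity < 4)
        return 0;

    out[0] = CL_DEVICE_PARTITION_BY_COUNTS_EXT;   /* partition type */
    out[1] = cores;                               /* size of the only sub-device */
    out[2] = CL_PARTITION_BY_COUNTS_LIST_END_EXT; /* ends the list of counts */
    out[3] = CL_PROPERTIES_LIST_END_EXT;          /* ends the property list */
    return 4;
}
```

The resulting array would then be passed as the properties argument of clCreateSubDevicesEXT(), and a context and command queue created on the returned sub-device, so that kernels enqueued there use only that many cores.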