I would suggest you look more closely at how OpenCL works (the specification is available on the Khronos website). The gist of OpenCL is to make parallelization seamless to the developer, or at least to have it specified at a higher level. You do not manually assign work items to specific threads; instead, your workspace is divided into work-groups, which are in turn divided into work-items. The driver manages kernel execution and work-item scheduling. Why do you need to map a specific work item to a specific thread? OpenCL can run on many different platforms, whose underlying memory subsystems and processing cores cannot be explicitly accessed.
Here's something I managed to figure out. I used a feature called inline PTX to return details about the executing thread. In this case I used it to get the warp ID, the warp lane ID and the streaming multiprocessor (SM) ID. The warp ID and lane ID come out as expected (i.e. lane IDs run from 0 to 31 and each warp gets executed), but the SM ID is 0 (zero) for all threads. When I checked clinfo, it reports Max compute units: 1. So does this mean the SM ID is zero for all threads because of that?
Also, how can my NVIDIA Quadro 410 report Max compute units as 1 when it has 192 CUDA cores?
Edit: apparently inline PTX doesn't work on AMD GPUs, so gathering per-core information is still unsolved there. FYI: I have another system with an AMD R9 290X GPU, on which I run code in parallel with my NVIDIA card.