It says that clEnqueueTask is equivalent to clEnqueueNDRangeKernel with work_dim set to 1 and the global and local work sizes both set to 1.
But current GPUs can't execute two kernels at once, and to fully utilize a GPU you need many hundreds of work-items, not just one.
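To make that equivalence concrete, here is a small CPU-side model in plain C. It is not the OpenCL API: `ndrange_1d`, `task` and `double_elem` are hypothetical names that only mimic the launch semantics. A task runs exactly one work-item (global id 0), so only the first element is touched, while an NDRange of N work-items processes every element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: one call per work-item, given its global id. */
typedef void (*kernel_fn)(size_t gid, float *data);

/* CPU model of a 1D NDRange launch (NOT the OpenCL API).
 * As in OpenCL 1.x, global_size must be a multiple of local_size. */
static void ndrange_1d(size_t global_size, size_t local_size,
                       kernel_fn kernel, float *data)
{
    assert(local_size > 0 && global_size % local_size == 0);
    for (size_t group = 0; group < global_size / local_size; ++group)
        for (size_t lid = 0; lid < local_size; ++lid)
            kernel(group * local_size + lid, data);
}

/* A "task" is an NDRange with one dimension and global = local = 1,
 * so exactly one work-item (gid 0) runs. */
static void task(kernel_fn kernel, float *data)
{
    ndrange_1d(1, 1, kernel, data);
}

/* Example kernel: each work-item doubles the element at its global id. */
static void double_elem(size_t gid, float *data)
{
    data[gid] *= 2.0f;
}
```

With `float a[4] = {1, 2, 3, 4}`, `task(double_elem, a)` leaves `{2, 2, 3, 4}`, whereas `ndrange_1d(4, 2, double_elem, a)` on the original array gives `{2, 4, 6, 8}`. On a real GPU the single-work-item launch leaves almost all of the hardware idle.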
The dimension you use depends on the problem you are solving. For example, if you are doing a reduction over an array, you use 1D; if you are programming a matrix operation, you use 2D.
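As an illustration of why a reduction maps naturally onto a 1D range, here is a CPU model of a typical work-group tree reduction. `reduce_1d` is a hypothetical name; each pass of the outer loop stands for one step between barriers, and the inner loop plays the role of the work-items whose local id is below the current stride. It assumes the input length is a power of two, which keeps the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* CPU model of a 1D work-group tree reduction (sum).
 * Assumes n is a power of two; the result ends up in data[0]. */
static float reduce_1d(float *data, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        /* "Work-items" lid < stride each add one partner element.
         * Reads come from indices >= stride and writes go to indices
         * < stride, so the order of this loop does not matter. */
        for (size_t lid = 0; lid < stride; ++lid)
            data[lid] += data[lid + stride];
        /* On a GPU, a barrier would separate each stride step. */
    }
    return data[0];
}
```

For `{1, 2, 3, 4, 5, 6, 7, 8}` the passes produce partial sums `{6, 8, 10, 12}`, then `{16, 20}`, then `36`.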
I think nou explained it correctly, but let me put it in my own terms, which might make it clearer for you.
First, the NDRange. I don't think there are any practical restrictions on which NDRange dimensionality you use for a particular problem. The different options are there to make things easier to understand. For example, there is nothing wrong with implementing 2D matrix addition as a 1D NDRange; using a 2D NDRange only makes the logic clearer.
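A minimal sketch of that point, again as a plain-C CPU model rather than real OpenCL code: `add_2d` and `add_1d` are hypothetical names, and the matrices are assumed to be stored row-major. The 2D version indexes work-items by (column, row), the way a kernel would use `get_global_id(0)` and `get_global_id(1)`; the 1D version uses a single flat id over `rows * cols` work-items and can recover (row, col) with a division and a modulo if it needs them. Both compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* 2D model: one work-item per (x, y) = (col, row). */
static void add_2d(const float *a, const float *b, float *c,
                   size_t rows, size_t cols)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t y = 0; y < rows; ++y)        /* like get_global_id(1) */
        for (size_t x = 0; x < cols; ++x)    /* like get_global_id(0) */
            c[y * cols + x] = a[y * cols + x] + b[y * cols + x];
}

/* 1D model: one work-item per flat index over rows * cols elements. */
static void add_1d(const float *a, const float *b, float *c,
                   size_t rows, size_t cols)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < rows * cols; ++gid) {
        /* (row, col) can be recovered if the kernel needs them;
         * for plain addition row * cols + col is simply gid. */
        size_t row = gid / cols;
        size_t col = gid % cols;
        c[row * cols + col] = a[gid] + b[gid];
    }
}
```

For element-wise addition the 1D form is arguably simpler; the 2D form pays off when the kernel genuinely needs row and column, as in matrix multiplication or stencils.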
Regarding task parallelism, it is not possible on current GPUs to run two different kernels concurrently, so task parallelism on GPUs is not something to be encouraged for now. GPUs are, however, extremely effective at data-parallel operations.