I always see CL_DEVICE_MAX_WORK_GROUP_SIZE reported as 256 on Southern Islands devices (e.g. Tahiti, Pitcairn), but according to section 4.3 of the AMD Southern Islands ISA doc, "Up to 16 wavefronts (1024 work-items) can be combined into a work-group."
Is it possible to enable large work group sizes up to 1024 in the driver?
A larger work-group size helps image processing with overlapping tile borders. For example, I have one filter with a 4-pixel border. With 1024 work-items, I can use 32*32 tiles with a 24*24 payload (56% efficiency), whereas with 256 work-items it would be 16*16 tiles with an 8*8 payload (25% efficiency).
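To make the arithmetic explicit, here is a minimal sketch of how I get those efficiency numbers. It assumes square tiles with one work-item per tile pixel and the same border on all four sides; `tile_efficiency` is just a helper name for illustration, not part of any OpenCL API.

```c
/* Fraction of work-items in a square tile that produce output pixels.
 * Assumes one work-item per tile pixel and a border of `border` pixels
 * on every side, so the payload side is tile_side - 2 * border.
 * Returns 0.0 if the border leaves no payload. */
static double tile_efficiency(int tile_side, int border)
{
    int payload_side = tile_side - 2 * border;

    if (tile_side <= 0 || payload_side <= 0)
        return 0.0;

    return (double)(payload_side * payload_side) /
           (double)(tile_side * tile_side);
}
```

With a 4-pixel border, `tile_efficiency(32, 4)` is 576/1024 = 0.5625 and `tile_efficiency(16, 4)` is 64/256 = 0.25, which are the 56% and 25% figures above.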