The "GCN Architecture whitepaper" says that the CU front-end can issue a mix of scalar and vector instructions in the same cycle, as long as they satisfy the stated restrictions, so those operations can run in parallel. Below is the relevant description from the whitepaper:
The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share, global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions for execution.
The CU front-end can issue five instructions every cycle, to a mix of six vector and scalar execution pipelines using two register files. The vector units provide the computational power that is critical for graphics shaders as well as general purpose applications. Together with the special instructions that are handled in the instruction buffers, the two scalar units are responsible for all control flow in the GCN Architecture.
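To make the two restrictions concrete, here is a rough Python sketch of the issue rule as the quoted text describes it: at most one instruction per type, each from a different wavefront, and no more than five per cycle. The function name `select_issue`, the instruction type labels, and the greedy first-come ordering are my own assumptions for illustration only; the real arbitration is done in hardware and the whitepaper does not describe its policy.

```python
from typing import List, Tuple

# From the quoted text: up to five instructions can be issued per cycle.
MAX_ISSUE_PER_CYCLE = 5


def select_issue(ready: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Pick instructions to issue this cycle for one SIMD.

    `ready` holds (wavefront_id, instruction_type) pairs, one per candidate.
    The greedy in-order scan is an assumption, not the hardware policy.
    """
    issued: List[Tuple[int, str]] = []
    used_types = set()  # restriction 1: one instruction of each type
    used_waves = set()  # restriction 2: each from a different wavefront

    for wave, itype in ready:
        if len(issued) == MAX_ISSUE_PER_CYCLE:
            break
        if itype in used_types or wave in used_waves:
            continue  # would violate one of the two restrictions
        issued.append((wave, itype))
        used_types.add(itype)
        used_waves.add(wave)
    return issued


# Hypothetical candidates: a scalar and a vector ALU instruction from
# different wavefronts can go out together in the same cycle.
ready = [
    (0, "vector_alu"),
    (1, "scalar"),
    (2, "vector_alu"),  # skipped: a vector ALU instruction was already picked
    (1, "vector_mem"),  # skipped: wavefront 1 already issued this cycle
    (3, "lds"),
]
print(select_issue(ready))
```

In this sketch the scalar instruction from wavefront 1 and the vector ALU instruction from wavefront 0 are selected in the same cycle, which is the parallelism the whitepaper describes. A scalar and a vector instruction from the same wavefront would not be issued together.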
Thanks.