The question title already says most of what I would like to know. Since the vector and scalar ALUs are separate pieces of hardware, I wondered whether the two ALUs can be active in parallel when the running instructions have no data dependencies.
I would like to know whether this works at all, works only conditionally (e.g. when both are operations with at most 32-bit encodings), or whether the two cannot run in parallel at all. Sadly, I was not able to find any information about parallelism within a CU, unlike the old TeraScale architecture, where some instruction sorting may have helped performance.
The "GCN Architecture whitepaper" says that the CU front-end can issue a mix of scalar and vector instructions in the same cycle, as long as they follow the required restrictions. So scalar and vector operations can run in parallel, provided they come from different wavefronts. Below is the related description from the "GCN Architecture whitepaper":
The CU front-end can decode and issue seven different types of instructions: branches, scalar ALU or memory, vector ALU, vector memory, local data share, global data share or export, and special instructions. Only one instruction of each type can be issued at a time per SIMD, to avoid oversubscribing the execution pipelines. To preserve in-order execution, each instruction must also come from a different wavefront; with 10 wavefronts for each SIMD, there are typically many available to choose from. Beyond these two restrictions, any mix is allowed, giving the compiler plenty of freedom to issue instructions for execution.
The CU front-end can issue five instructions every cycle, to a mix of six vector and scalar execution pipelines using two register files. The vector units provide the computational power that is critical for graphics shaders as well as general purpose applications. Together with the special instructions that are handled in the instruction buffers, the two scalar units are responsible for all control flow in the GCN Architecture.
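To make the two restrictions from the quote concrete, here is a minimal toy model in Python. It is only a sketch, not how the hardware arbiter actually works: the function name pick_issue_set, the type labels, and the "lowest wavefront ID first" priority are all assumptions made for illustration, and the model simply counts every instruction type toward the five-per-cycle limit.

```python
from typing import Dict, List, Tuple

# The seven instruction categories named in the whitepaper excerpt.
# The string labels themselves are hypothetical, chosen for this sketch.
INSTRUCTION_TYPES = {
    "branch", "scalar", "vector_alu", "vector_mem",
    "lds", "gds_export", "special",
}

MAX_ISSUE_PER_CYCLE = 5  # "can issue five instructions every cycle"


def pick_issue_set(next_instr: Dict[int, str]) -> List[Tuple[int, str]]:
    """Choose which wavefronts issue this cycle on one SIMD.

    next_instr maps a wavefront ID to the type of its next instruction.
    Because each wavefront appears once, every issued instruction comes
    from a different wavefront. Priority (lowest ID first) is an assumption.
    """
    issued: List[Tuple[int, str]] = []
    used_types = set()
    for wave_id in sorted(next_instr):
        itype = next_instr[wave_id]
        if itype not in INSTRUCTION_TYPES:
            raise ValueError(f"unknown instruction type: {itype}")
        if itype in used_types:
            continue  # only one instruction of each type per cycle
        issued.append((wave_id, itype))
        used_types.add(itype)
        if len(issued) == MAX_ISSUE_PER_CYCLE:
            break
    return issued


if __name__ == "__main__":
    # Wavefronts 0 and 2 both want the vector ALU, so wavefront 2 waits,
    # while the scalar instruction from wavefront 1 issues alongside.
    print(pick_issue_set({0: "vector_alu", 1: "scalar", 2: "vector_alu", 3: "lds"}))
```

In this model a scalar and a vector ALU instruction issue in the same cycle only when they belong to different wavefronts, which is why having several wavefronts resident on a SIMD matters for overlapping the two.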