
Newcomers Start Here

phridrich
Adept I

[ABUSE] By: phridrich / Board: newcomer-forum (806)

Hi, I know I've posted the same question in another section, but I just noticed that the original section was wrong, and I couldn't find a way to move the question to another section. I edited the old question so that it now redirects to the new one I created. Could you please tell me how to remove questions or move them to other sections?


Link to post: (DirectX 12 per instance data fetch)
by phridrich


https://community.amd.com/t5/newcomers-start-here/directx-12-per-instance-data-fetch/m-p/437120#M806


Hi, I'm creating a DirectX 12 application and profiling it with Radeon GPU Profiler on a 5700 XT card. I'm using indirect drawing to rasterize a scene, and per-instance vertex buffers to provide mesh-related data to shaders. Here is one of the vertex shaders that uses this approach:

    void main(
        in float3 in_position : POSITION,       // per-vertex
        in float4x4 in_transform : TRANSFORM,   // per-instance
        out float4 out_position : SV_Position )
    {
        float4 hdc_position = mul( float4( in_position, 1.0f ), in_transform );
        out_position = float4( hdc_position.xyz, 1.0f );
    }

According to RGP, this results in the following Radeon ISA code:

    s_inst_prefetch 0x3                                                              // 000000000000: BFA00003
    s_getpc_b64 s[0:1]                                                               // 000000000004: BE801F80
    s_mov_b32 s0, s5                                                                 // 000000000008: BE800305
    s_load_dwordx8 s[4:11], s[0:1], 0x0                                              // 00000000000C: F40C0100 FA000000
    v_add_nc_u32_e32 v0, s2, v0                                                      // 000000000014: 4A000002
    v_add_nc_u32_e32 v1, s3, v3                                                      // 000000000018: 4A020603
    s_waitcnt lgkmcnt(0)                                                             // 00000000001C: BF8CC07F
    tbuffer_load_format_xyz v[2:4], v0, s[4:7], format:74, 0 idxen                   // 000000000020: EA522000 80010200
    s_clause 0x3                                                                     // 000000000028: BFA10003
    tbuffer_load_format_xyz v[5:7], v1, s[8:11], format:74, 0 idxen offset:16        // 00000000002C: EA522010 80020501
    tbuffer_load_format_xyz v[8:10], v1, s[8:11], format:74, 0 idxen                 // 000000000034: EA522000 80020801
    tbuffer_load_format_xyz v[11:13], v1, s[8:11], format:74, 0 idxen offset:32      // 00000000003C: EA522020 80020B01
    tbuffer_load_format_xyz v[14:16], v1, s[8:11], format:74, 0 idxen offset:48      // 000000000044: EA522030 80020E01
    s_waitcnt vmcnt(3)                                                               // 00000000004C: BF8C3F73
    v_mul_f32_e32 v1, v3, v5                                                         // 000000000050: 10020B03
    v_mul_f32_e32 v5, v3, v6                                                         // 000000000054: 100A0D03
    v_mul_f32_e32 v3, v3, v7                                                         // 000000000058: 10060F03
    s_waitcnt vmcnt(2)                                                               // 00000000005C: BF8C3F72
    v_mac_f32_e32 v1, v2, v8                                                         // 000000000060: 3E021102
    v_mac_f32_e32 v5, v2, v9                                                         // 000000000064: 3E0A1302
    v_mac_f32_e32 v3, v2, v10                                                        // 000000000068: 3E061502
    s_waitcnt vmcnt(1)                                                               // 00000000006C: BF8C3F71
    v_mac_f32_e32 v1, v4, v11                                                        // 000000000070: 3E021704
    v_mac_f32_e32 v5, v4, v12                                                        // 000000000074: 3E0A1904
    v_mac_f32_e32 v3, v4, v13                                                        // 000000000078: 3E061B04
    s_waitcnt vmcnt(0)                                                               // 00000000007C: BF8C3F70
    v_add_f32_e32 v0, v14, v1                                                        // 000000000080: 0600030E
    v_add_f32_e32 v1, v15, v5                                                        // 000000000084: 06020B0F
    v_add_f32_e32 v2, v16, v3                                                        // 000000000088: 06040710
    v_mov_b32_e32 v3, 1.0                                                            // 00000000008C: 7E0602F2
    exp pos0 v0, v1, v2, v3 done                                                     // 000000000090: F80008CF 03020100
    s_endpgm                                                                         // 000000000098: BF810000

What caught my attention is that vector memory load instructions are used to load the per-instance data. As I understand it, vertex shader groups always process vertices from a single instance, so it should be possible to use scalar memory loads here. So here are my questions:

1. Is my assumption that a vertex shader group only covers a single instance valid? If not, scalar memory loads cannot be used here, and everything is fine.
2. If my assumption is valid, are these vector memory loads actually a big problem? I assume scalar loads would be better, but the difference may be hardly visible thanks to memory caching.
3. Are there other limitations that prevent the compiler from using scalar loads here? Maybe I'm not providing some critical information on the CPU side? Or is it just a matter of driver implementation?
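
For readers following the listing, here is a minimal Python sketch (not part of the original post) that mirrors the arithmetic of the ISA for a single lane and compares it with the HLSL expression. The function names `hlsl_reference`, `isa_emulation` and `is_wave_uniform` are hypothetical. The sketch assumes that the four 16-byte fetches at offsets 0, 16, 32 and 48 act as rows 0 to 3 of `in_transform`, which is what the mul/mac/add chain in the listing suggests. The last helper only restates the condition behind question 1: a scalar load would need the instance index to be identical across all lanes of a wave. Whether the hardware guarantees that is exactly what the post asks, and the sketch does not answer it.

```python
def hlsl_reference(position, transform):
    """float4(mul(float4(position, 1), transform).xyz, 1), row-vector convention."""
    v = (position[0], position[1], position[2], 1.0)
    hdc = [sum(v[i] * transform[i][j] for i in range(4)) for j in range(4)]
    return (hdc[0], hdc[1], hdc[2], 1.0)


def isa_emulation(position, transform):
    """Per-lane arithmetic of the listing, keeping its register names.

    Assumption: transform[k] is the 16-byte slice fetched at offset 16 * k.
    Only xyz of each slice is read, matching tbuffer_load_format_xyz.
    """
    v2, v3, v4 = position              # v[2:4]:   POSITION xyz
    v5, v6, v7 = transform[1][:3]      # v[5:7]:   offset:16
    v8, v9, v10 = transform[0][:3]     # v[8:10]:  no offset
    v11, v12, v13 = transform[2][:3]   # v[11:13]: offset:32
    v14, v15, v16 = transform[3][:3]   # v[14:16]: offset:48

    # v_mul_f32: position.y times the slice at offset 16
    v1 = v3 * v5
    v5 = v3 * v6
    v3 = v3 * v7
    # v_mac_f32: accumulate position.x times the slice at offset 0
    v1 = v1 + v2 * v8
    v5 = v5 + v2 * v9
    v3 = v3 + v2 * v10
    # v_mac_f32: accumulate position.z times the slice at offset 32
    v1 = v1 + v4 * v11
    v5 = v5 + v4 * v12
    v3 = v3 + v4 * v13
    # v_add_f32: add the slice at offset 48 (implicit w = 1)
    v0 = v14 + v1
    v1 = v15 + v5
    v2 = v16 + v3
    v3 = 1.0                           # v_mov_b32 v3, 1.0
    return (v0, v1, v2, v3)            # exp pos0 v0, v1, v2, v3


def is_wave_uniform(per_lane_values):
    """True when every lane holds the same value, the precondition for a scalar load."""
    return len(set(per_lane_values)) <= 1


if __name__ == "__main__":
    # Scale by 2 and translate by (10, 20, 30); values chosen to be exact in floating point.
    m = [[2.0, 0.0, 0.0, 0.0],
         [0.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 0.0],
         [10.0, 20.0, 30.0, 1.0]]
    print(hlsl_reference((1.0, 2.0, 3.0), m))
    print(isa_emulation((1.0, 2.0, 3.0), m))
```

The fourth component of each slice is never fetched because the shader discards `hdc_position.w`. In the listing, the index used for the per-instance fetches is `v1 = s3 + v3`, a per-lane vector register, which is why `is_wave_uniform` is phrased over a list of per-lane values.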


This message has 0 replies


2 Replies
dipak
Big Boss

I have whitelisted you and moved the post to the appropriate forum.

Thanks.

Thank you very much!
