Hi,
I am still getting familiar with the HSA runtime programming environment, so this may be a simple question. I am developing a small networking application (an IPv4 router) on a Kaveri machine that uses a GPU module for IPv4 route lookups. My GPU module is written in OpenCL, and I use cloc.sh to compile the kernel code to the HSA code object (hsaco) format. I am using DPDK as my network I/O driver for receiving and sending traffic.
I first tried passing an array of pointers to rte_mbuf structures (rte_mbuf *) to the GPU kernel, so that the GPU itself reads the Ethernet frame (and the IPv4 header) and the CPU does not waste cycles parsing packet header fields (and avoids unnecessary cache misses). Unfortunately, my program crashes as soon as the GPU tries to access a packet's payload, and I get the following messages in my dmesg log:
[33138.018390] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.019049] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0xFFFF91CD5E0D7000 flags 0x104
[33138.019050] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200
[33138.019052] kfd kfd: HSA Process (PID 11200) got unhandled exception
[33138.019718] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.020395] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0x35827A1F3000 flags 0x104
[33138.020397] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200
[33138.020398] kfd kfd: HSA Process (PID 11200) got unhandled exception
On closer analysis, I found that the pointers are passed to the kernel correctly; the crash happens when the kernel dereferences them.
I then tried passing an array of pointers to the Ethernet frames instead (the CPU retrieves the packet data pointer from each rte_mbuf structure), but this setup triggered exactly the same crash.
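A minimal sketch of the CPU-side step in this second attempt, under stated assumptions: `struct mbuf_stub`, `frame_ptr` and `collect_frame_ptrs` are hypothetical names, and the stub struct only mirrors the two rte_mbuf fields (buf_addr and data_off) that DPDK's rte_pktmbuf_mtod() macro uses, so that the snippet compiles without DPDK. It is an illustration, not the actual application code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for DPDK's struct rte_mbuf: only the two fields
 * needed to locate the start of the packet data. The real struct has
 * many more fields. */
struct mbuf_stub {
    void    *buf_addr;  /* start of the buffer backing this mbuf */
    uint16_t data_off;  /* offset of the packet data inside that buffer */
};

/* Same pointer arithmetic as rte_pktmbuf_mtod(m, uint8_t *). */
uint8_t *frame_ptr(const struct mbuf_stub *m)
{
    return (uint8_t *)m->buf_addr + m->data_off;
}

/* Fill frames[i] with the address of the Ethernet frame held by pkts[i].
 * The frames array is what would then be handed to the GPU kernel. */
void collect_frame_ptrs(struct mbuf_stub *const *pkts,
                        uint8_t **frames, size_t n)
{
    assert(pkts != NULL && frames != NULL);
    for (size_t i = 0; i < n; i++)
        frames[i] = frame_ptr(pkts[i]);
}
```

With real DPDK types, the loop body would simply be `frames[i] = rte_pktmbuf_mtod(pkts[i], uint8_t *);` for each mbuf in the received burst.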
I also tried calling hsa_memory_assign_agent() and hsa_memory_register() on the array of packet structures (both the rte_mbuf * and the uint8_t * variants), but that did not fix the problem. Any idea what I am doing wrong here?
H/W Specs:
model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
cpu MHz : 4000.000
cache size : 2048 KB
S/W Specs:
Linux kernel version: 4.4.0-kfd-compute-rocm-rel-1.1.1-10
Intel dpdk-16.04
CLOC 1.0.11 (April 2016 update)
HSA Runtime v1.6
amdkfd v1.6.1
Thanks!