Archives Discussions

ajamshed · ‎06-28-2016

Hi,

I am still familiarizing myself with the HSA runtime programming environment, so this may sound like a simple question. I am developing a small networking application (an IPv4 router) on a Kaveri machine that uses a GPU module for IPv4 route lookups. My GPU module is written in OpenCL, and I use cloc.sh to compile the kernel code to the HSA code object (hsaco) format. I am using DPDK as my network I/O driver for receiving and sending traffic.

I first tried to pass an array of (rte_mbuf *) pointers to the GPU kernel, so that the GPU alone retrieves the Ethernet frame (and the IPv4 header) and the CPU does not waste any cycles parsing the packet header fields (and avoids unnecessary cache misses). Unfortunately, my program crashes as soon as the GPU tries to access a packet's payload, and I get the following messages in my dmesg log:

[33138.018390] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.019049] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0xFFFF91CD5E0D7000 flags 0x104
[33138.019050] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200
[33138.019052] kfd kfd: HSA Process (PID 11200) got unhandled exception
[33138.019718] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:01.0 domain=0x0001 address=0x0000000000000f80 flags=0x0005]
[33138.020395] kfd kfd: Invalid PPR device 0:1.0 pasid 1 address 0x35827A1F3000 flags 0x104
[33138.020397] kfd kfd: Sending SIGSEGV to HSA Process with PID 11200
[33138.020398] kfd kfd: HSA Process (PID 11200) got unhandled exception

On closer analysis, I found that I am passing the pointers correctly, but the kernel crashes as soon as it tries to dereference them.

I then tried to pass an array of pointers to the Ethernet frames to the GPU (the CPU retrieves the packet pointer from each rte_mbuf structure), but this setup triggered exactly the same crash as above.
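A minimal sketch of this second setup in plain C, under stated assumptions: `struct mbuf_stub`, `stub_mtod()` and `gather_frames()` are hypothetical names used only for illustration. The stub mirrors just the `buf_addr` and `data_off` fields of DPDK's `struct rte_mbuf`; in a real application the frame pointer would presumably come from `rte_pktmbuf_mtod(m, uint8_t *)` instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf: only the two fields that
 * rte_pktmbuf_mtod() uses to locate the start of the packet data. */
struct mbuf_stub {
    void    *buf_addr;  /* start of the buffer backing this packet */
    uint16_t data_off;  /* offset of the first data byte (headroom) */
};

/* Equivalent of rte_pktmbuf_mtod(m, uint8_t *) for the stub. */
static uint8_t *stub_mtod(const struct mbuf_stub *m)
{
    return (uint8_t *)m->buf_addr + m->data_off;
}

/* Fill frames[] with the Ethernet frame pointer of each packet.
 * NULL mbufs (or mbufs without a buffer) are skipped, so the array handed
 * to the GPU contains no entry that is known to be invalid.
 * Returns the number of entries written. */
static size_t gather_frames(struct mbuf_stub *const *pkts, size_t n,
                            uint8_t **frames)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (pkts[i] == NULL || pkts[i]->buf_addr == NULL)
            continue;
        frames[out++] = stub_mtod(pkts[i]);
    }
    return out;
}
```

The resulting `frames` array (and its entry count) is what would be passed as a kernel argument; the sketch says nothing about whether the GPU can actually translate those addresses.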

I tried using the hsa_memory_assign_agent() and hsa_memory_register() functions on the array of packet structures (both rte_mbuf * and uint8_t *), but I could not fix this problem. Any idea what I am doing wrong here?

H/W Specs:

model name : AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G

cpu MHz : 4000.000

cache size : 2048 KB

S/W Specs:

Linux kernel version: 4.4.0-kfd-compute-rocm-rel-1.1.1-10

Intel dpdk-16.04

CLOC 1.0.11 (April 2016 update)

HSA Runtime v1.6

amdkfd v1.6.1

Thanks!

bridgman · ‎06-29-2016

Sounds like the buffers you are trying to access were not mapped to userspace in the first place, i.e. they are only accessible by kernel code at the moment.

Are you able to access the buffers from the CPU via the pointers you provide? If so, then there may be something odd about the way the buffers were mapped to userspace, but in general anything that is accessible by the CPU from userspace should also be accessible by the GPU.
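One possible way to run that CPU-side check without risking a crash, sketched under assumptions: this is Linux-specific, and `user_readable()` is a hypothetical helper, not part of HSA or DPDK. It asks the kernel to copy the bytes into a pipe with `write()`, which fails with `EFAULT` when the address is not mapped into the process, instead of dereferencing the pointer directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Probe whether [p, p + len) is readable from user space.
 * Returns 1 if every byte could be read, 0 if the kernel reported
 * EFAULT somewhere in the range, and -1 on any other error. */
static int user_readable(const void *p, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int ok = 1;
    const uint8_t *cur = p;
    uint8_t sink[4096];

    while (len > 0) {
        size_t chunk = len < sizeof sink ? len : sizeof sink;

        /* The kernel copies from our address space; a bad address
         * yields EFAULT rather than a SIGSEGV in this process. */
        ssize_t n = write(fds[1], cur, chunk);
        if (n < 0) {
            ok = (errno == EFAULT) ? 0 : -1;
            break;
        }

        /* Drain what was just written so the pipe never fills up. */
        if (read(fds[0], sink, (size_t)n) < 0) {
            ok = -1;
            break;
        }
        cur += n;
        len -= (size_t)n;
    }

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Calling this on each frame pointer before launching the kernel only confirms the CPU's userspace view. It does not prove that the GPU, going through the IOMMU, sees the same mapping.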

ajamshed · ‎07-07-2016

Thanks for the suggestion. Actually, I had a bug in my GPU kernel code: it was trying to access an out-of-bounds memory region. After fixing that bug, my program no longer crashes. However, whenever my kernel code dereferences any packet pointer, it only reads fields with zero values (whether it is an Ethernet MAC address (00:00:00:00:00:00), an IP src addr (0x00), etc.). I am sure that the CPU part of my code is not bzero-ing the packet contents. When the kernel execution finishes and I retrieve the packet contents from the CPU side, I see the right values.
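For illustration, here is a plain C stand-in for what one work-item of such a lookup kernel might do, with the bounds checks made explicit. It is a sketch under assumptions, not the poster's kernel: `lookup_key()`, the `lens` array, and the constants are hypothetical, and in OpenCL the index `gid` would come from `get_global_id(0)`. A global size padded beyond the packet count is one common way to get out-of-bounds accesses, though the thread does not say that this was the actual bug.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14u     /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define IP4_MIN_HDR   20u     /* IPv4 header without options */
#define ETHERTYPE_IP4 0x0800u

/* Emulates one work-item handling packet `gid`.
 * frames[] holds Ethernet frame pointers, lens[] the frame lengths
 * (hypothetical: the thread does not say lengths are passed).
 * On success stores the IPv4 destination address (host byte order)
 * in *dst and returns 0; returns -1 if the packet is skipped. */
static int lookup_key(uint8_t *const *frames, const uint16_t *lens,
                      size_t n_pkts, size_t gid, uint32_t *dst)
{
    /* Work-items beyond the batch must not touch frames[]. */
    if (gid >= n_pkts)
        return -1;

    const uint8_t *f = frames[gid];
    if (f == NULL || lens[gid] < ETH_HDR_LEN + IP4_MIN_HDR)
        return -1;

    /* EtherType is big-endian at bytes 12..13 of the frame. */
    uint16_t ethertype = (uint16_t)((f[12] << 8) | f[13]);
    if (ethertype != ETHERTYPE_IP4)
        return -1;

    /* Destination address sits at bytes 16..19 of the IPv4 header. */
    const uint8_t *ip = f + ETH_HDR_LEN;
    *dst = ((uint32_t)ip[16] << 24) | ((uint32_t)ip[17] << 16) |
           ((uint32_t)ip[18] << 8)  |  (uint32_t)ip[19];
    return 0;
}
```

Running the same function on the CPU over the same `frames` array gives a reference result to compare against what the GPU reports, which may help narrow down where the all-zero fields come from.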


HSA runtime programming question