First of all, I'd like to thank you for releasing the APU (gpu-compute) model for the gem5 simulator.
I've been trying to run some benchmarks with it, following the MICRO slides made available on the GPU Models - gem5 page, but I am having problems getting the kernels to read memory locations allocated by the host.
I have used the compiler toolchain and the simplified OpenCL 2.0 runtime API you provide to create the binary.
I've compiled the runtime with debug symbols and everything appears to be going OK, but the kernel always receives the arguments that point to host-allocated memory as null pointers:
1042136000: system.cpu1.CUs-port0: Wave 5 couldn't tranlate vaddr 0
gem5.opt: build/HSAIL_X86/gpu-compute/compute_unit.cc:1152: virtual bool ComputeUnit::DTLBPort::recvTimingResp(PacketPtr): Assertion `translation_state->tlbEntry->valid' failed.
Program aborted at tick 1042136000
The reduced API you provide has no implementation of clSVMAlloc or clSetKernelArgSVMPointer, so I am allocating the memory on the host with a plain malloc call and passing the argument with clSetKernelArg.
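To check my understanding of what clSetKernelArg should do with a pointer argument, I sketched the packing step in plain C. This is only an illustration of the mechanism (the helper name and offsets are mine, not the gem5 runtime's code): for a global pointer argument, the runtime copies the 8-byte pointer value into the kernarg buffer at the argument's offset, so the GPU-visible slot should hold the host vaddr rather than 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's argument packing:
 * copy `size` bytes of the argument value into the kernarg
 * buffer at the given offset. For a global pointer argument,
 * the value passed in is the ADDRESS of the host pointer, so
 * the pointer value itself lands in the buffer. */
static void set_kernel_arg(void *kernarg, size_t offset,
                           size_t size, const void *value)
{
    memcpy((char *)kernarg + offset, value, size);
}

/* Usage sketch:
 *   char kernarg[512] = {0};
 *   int *host_buf = malloc(64 * sizeof(int));
 *   set_kernel_arg(kernarg, 0, sizeof(host_buf), &host_buf);
 * After this, the first 8 bytes of kernarg hold host_buf's
 * virtual address, which is what the GPU should translate. */
```

If I read the runtime's "Offset 0 / 8 / 16 ..." debug lines this way, my pointer should end up at offset 0 of the kernarg region, yet the wavefront still tries to translate vaddr 0.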
Checking the debug output of the OpenCL runtime, it seems like it's getting everything OK:
hsaDriverInit()hsaDriverInit(): found 1 kernels
1 31 144
0x7a5f80 0x7a5fb0 0x2aaaaaacb000
clSetKernelArg(0x7a8630, 0, 512, 0x7a59f0)
HSA runtime: Offset 0
HSA runtime: Offset 8
HSA runtime: Offset 16
HSA runtime: Offset 24
HSA runtime: Offset 32
HSA runtime: Offset 40
HSA runtime: Offset 512
regs: s 8 d 1 c 1
or at least the clSetKernelArg call receives the correct size and pointer.
Using gem5's debug flags, I can see that the kernel starts executing, and the error occurs the first time the kernel parameter (declared as global) is read.
Any idea about what I may be doing wrong?