0 Replies Latest reply on Apr 1, 2016 10:11 AM by vgarcia

    Problems with AMD's gem5 APU model




      First of all I'd like to thank you for releasing the APU (gpu-compute) model for the gem5 simulator.

      I've been trying to run some benchmarks with it, following the MICRO slides you made available on the "GPU Models - gem5" page, but I am having problems getting the kernels to read memory allocated by the host.


      I have used the compiler toolchain and the simplified OpenCL 2.0 runtime API you provide to create the binary.

      I've compiled the runtime with debug symbols and everything appears to be working, but the kernel always seems to receive null pointers for the arguments that point to host-allocated memory:


      1042136000: system.cpu1.CUs-port0: Wave 5 couldn't tranlate vaddr 0
      gem5.opt: build/HSAIL_X86/gpu-compute/compute_unit.cc:1152: virtual bool ComputeUnit::DTLBPort::recvTimingResp(PacketPtr): Assertion `translation_state->tlbEntry->valid' failed.
      Program aborted at tick 1042136000


      The reduced API you provide has no implementation of clSVMAlloc or clSetKernelArgSVMPointer, so I am allocating the memory on the host with a plain malloc call and passing the argument with clSetKernelArg.

      Checking the debug output of the OpenCL runtime, it seems to be receiving everything correctly:


      hsaDriverInit()hsaDriverInit(): found 1 kernels
              1 31 144
              0x7a5f80 0x7a5fb0 0x2aaaaaacb000
      HSAIL-GPU       clCreateCommandQueue()
      clCreateKernel() __OpenCL_test_kernel
      clSetKernelArg(0x7a8630, 0, 512, 0x7a59f0)
      Launching kernel
      launching test
      HSA runtime: Offset 0
      HSA runtime: Offset 8
      HSA runtime: Offset 16
      HSA runtime: Offset 24
      HSA runtime: Offset 32
      HSA runtime: Offset 40
      HSA runtime: Offset 512
      regs: s 8 d 1 c 1

      or at least the clSetKernelArg call is getting the size and pointer right.
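
      To make the host-side call concrete, here is a minimal sketch of two ways a malloc'd buffer could be handed to clSetKernelArg. It does not use the real runtime: mock_set_kernel_arg and MockKernArgs are hypothetical stand-ins, and the mock assumes the standard OpenCL behaviour of copying arg_size bytes from arg_value into the kernel's argument storage. Whether the reduced runtime behaves the same way is an assumption I have not verified. Variant A has the same shape as the call in the log above (buffer size plus the buffer itself), and variant B passes the pointer variable by address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel-argument storage filled by the runtime. */
typedef struct {
    unsigned char bytes[1024];
    size_t used;
} MockKernArgs;

/* Mock of clSetKernelArg, ASSUMING standard OpenCL semantics:
 * arg_size bytes are copied from the location arg_value points to. */
static void mock_set_kernel_arg(MockKernArgs *ka, size_t arg_size,
                                const void *arg_value)
{
    assert(arg_size <= sizeof ka->bytes);
    memcpy(ka->bytes, arg_value, arg_size);
    ka->used = arg_size;
}

/* What a kernel with one global-pointer parameter would load:
 * the first pointer-sized word of the argument storage. */
static void *kernel_sees_pointer(const MockKernArgs *ka)
{
    void *p = NULL;
    memcpy(&p, ka->bytes, sizeof p);
    return p;
}

/* Variant A: pass the buffer size and the buffer itself.
 * The buffer's CONTENTS end up in the argument storage. */
static void *pass_buffer_contents(void *buf, size_t buf_size)
{
    MockKernArgs ka = {0};
    mock_set_kernel_arg(&ka, buf_size, buf);
    return kernel_sees_pointer(&ka);
}

/* Variant B: pass the address of the pointer variable.
 * The pointer VALUE ends up in the argument storage. */
static void *pass_pointer_value(void *buf)
{
    MockKernArgs ka = {0};
    mock_set_kernel_arg(&ka, sizeof buf, &buf);
    return kernel_sees_pointer(&ka);
}
```

      Under these assumptions, calling pass_buffer_contents on a zero-filled 512-byte buffer makes the kernel load address 0, while pass_pointer_value makes it load the buffer's host address. This is only a sketch of one possible reading of the log, not a confirmed description of the runtime.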

      Using gem5's debug flags, I can see that the kernel starts executing and that the error occurs the first time the kernel parameter (declared as global) is read.


      Any idea about what I may be doing wrong?