Using the 0.8 release, built with gcc 4.7.4 and again with gcc 4.8.3, I get the same output in both cases:
# time ./vector_copy
Initializing the hsa runtime succeeded.
Calling hsa_iterate_agents succeeded.
Checking if the GPU device is non-zero succeeded.
Querying the device name succeeded.
The device name is Spectre.
Querying the device maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
Creating the brig module from vector_copy.brig succeeded.
Creating the hsa program succeeded.
Adding the brig module to the program succeeded.
Finding the symbol offset for the kernel succeeded.
Finalizing the program succeeded.
Querying the kernel descriptor address succeeded.
Creating a HSA signal succeeded.
Registering argument memory for input parameter succeeded.
Registering argument memory for output parameter succeeded.
Finding a kernarg memory region succeeded.
Allocating kernel argument memory buffer succeeded.
Registering the argument buffer succeeded.
Dispatching the kernel succeeded.
The program printed nothing further after the dispatch message. I aborted it manually after nearly 2 hours of wall-clock time, during which it kept one CPU core at 100% usage the whole time.