    Performance timing in Brook+


      I want to measure the wall-clock time taken by a compute kernel, excluding the time spent transferring data to and from the GPU. Is there an easy way to do this? I could estimate the transfer time heuristically and subtract it from the total time (which includes the transfer), but I don't expect that to be very accurate.
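
To make the question concrete, here is a rough sketch of the measurement I have in mind, written in plain C++ with `std::chrono` rather than any Brook+ timing facility. The names `time_phase`, `measure_phases`, and `PhaseTimes` are hypothetical, and the three callables stand in for whatever code does the upload, the kernel call, and the read-back. It assumes the kernel callable blocks until the GPU has actually finished; if the runtime launches kernels asynchronously, this would only time the launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Wall-clock seconds taken by a single call to f.
template <typename F>
double time_phase(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Per-phase timings, all in seconds.
struct PhaseTimes {
    double upload_s;
    double kernel_s;    // mean time per kernel run
    double download_s;
};

// upload, kernel, and download are hypothetical callables supplied by the
// caller; they are placeholders, not Brook+ API calls.
// Assumption: kernel() returns only after the GPU work has completed.
PhaseTimes measure_phases(const std::function<void()>& upload,
                          const std::function<void()>& kernel,
                          const std::function<void()>& download,
                          int repeats = 1) {
    assert(repeats > 0);
    PhaseTimes t{};
    t.upload_s = time_phase(upload);
    // Run the kernel several times and average to reduce timer noise.
    t.kernel_s = time_phase([&] {
        for (int i = 0; i < repeats; ++i) kernel();
    }) / repeats;
    t.download_s = time_phase(download);
    return t;
}
```

Calling `measure_phases(upload, kernel, download, 10)` would then report the kernel time separately from both transfers instead of subtracting an estimate from the total. Whether the kernel call can be made to block in this way is exactly what I am unsure about.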