OpenCL

dipak
Staff

Re: OpenCL compiler bug

Yes, I agree with you. I also thought it was just an optimization, one that the optimization guide describes as "this can help keep the GPU busy with kernel execution and DMA transfers".

Anyway, let me check with the OpenCL team. I believe they can provide more insight into this.

Thanks again for providing these valuable inputs.

Thanks.

dipak
Staff

Re: OpenCL compiler bug

I ran the latest attached code on my setup and got findings similar to those you mentioned above. It does seem that synchronization using events has no effect on the ordering.

Also, I checked with the OpenCL team. The code looks good to them, and they have asked me to create a ticket to investigate the issue in detail. I'll create a ticket and include these test results. I'll let you know if I have any update on this.

Thanks.

neworderofjamie
Adept I

Re: OpenCL compiler bug

Good to hear you can reproduce. Does that mean you also require more than the single flush before reading to see correct results? Do you have any idea of a timescale on that ticket?

Thanks for all your continuing help with this!

dipak
Staff

Re: OpenCL compiler bug

 Does that mean you also require more than the single flush before reading to see correct results?

In my case, a single flush before the reading is enough to produce the correct result.

When I tried the macros, I observed the following outputs and event orders:

  1. Default: (0, 0, 0, 0) -> readXPostEvent occurs before updatePresynapticEvent and updateNeuronsEvent 
  2. with FLUSH_BEFORE_READ: (1, 1, 0, 0) -> readXPostEvent occurs after updatePresynapticEvent and updateNeuronsEvent 
  3. with FLUSH_BETWEEN_KERNELS: (0, 0, 0, 0) -> readXPostEvent occurs after updatePresynapticEvent, but before updateNeuronsEvent 
  4. with WAIT_BEFORE_READ: (0, 0, 0, 0) -> readXPostEvent occurs before updatePresynapticEvent and updateNeuronsEvent 
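One way to derive orderings like those in the list above is to compare each command's start/end profiling timestamps. A minimal sketch, assuming the timestamps (in ns) have already been fetched on the host, e.g. with clGetEventProfilingInfo and CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END; the Interval and classify_read names and the timestamps in the usage lines are hypothetical:

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Start/end timestamps (ns) of one queued command (hypothetical helper)."""
    start: int
    end: int


def classify_read(read: Interval, k1: Interval, k2: Interval) -> str:
    """Describe where a read command ran relative to two kernels.

    Returns "after both", "before both", "between" or "overlapping".
    """
    # For each kernel: did the read start after it ended / end before it started?
    after = [read.start >= k.end for k in (k1, k2)]
    before = [read.end <= k.start for k in (k1, k2)]
    if all(after):
        return "after both"
    if all(before):
        return "before both"
    # Fully separated from each kernel, but on different sides of them.
    if all(a or b for a, b in zip(after, before)):
        return "between"
    return "overlapping"


# Hypothetical timestamps for the two kernels.
pre = Interval(100, 150)   # stands in for updatePresynapticEvent
neu = Interval(160, 170)   # stands in for updateNeuronsEvent
print(classify_read(Interval(10, 40), pre, neu))    # before both
print(classify_read(Interval(155, 158), pre, neu))  # between
print(classify_read(Interval(200, 230), pre, neu))  # after both
```

With an in-order queue, or with the read waiting on the kernels' events, only "after both" should ever appear, which is why cases 1, 3 and 4 point to a problem.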

I believe a clFinish before the read should work without any other clFlush. In that case, passing CL_TRUE (a blocking read) to enqueueReadBuffer would effectively be a no-wait operation, since all earlier commands would already have completed. I know these approaches may not be as efficient as event- or barrier-based synchronization, but they can be used as a workaround until a fix is available.

Do you have any idea of a timescale on that ticket?

Sorry, it's difficult to provide any timeline at this moment. 

neworderofjamie
Adept I

Re: OpenCL compiler bug

Sadly, a finish before the read results in correct ordering but incorrect output (like the single flush):

1007736281121096(mapXPostEventStart)
1007736281153370(mapSpkCntPreEventStart)
1007736281307106(fillXPostEventStart)
1007736281312386(fillXPostEventEnd)
1007736281312746(fillInSynEventStart)
1007736281313066(fillInSynEventEnd)
1007736281317066(buildNeuronKernelEventStart)
1007736281318546(buildNeuronKernelEventEnd)
1007736281322146(buildPresynapticKernelEventStart)
1007736281323546(buildPresynapticKernelEventEnd)
1007736281497466(writeSpkCntPreEventStart)
1007736281502786(writeSpkCntPreEventEnd)
1007736281599546(updatePresynapticEventStart)
1007736281604866(updatePresynapticEventEnd)
1007736281605226(updateNeuronsEventStart)
1007736281605586(updateNeuronsEventEnd)
1007736281726036(readXPostEventStart)
1007736281729036(readXPostEventEnd)
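Because every log line has the form `timestamp(label)`, the ordering check can be automated. A minimal sketch, assuming the values are nanosecond profiling counters and that labels follow the `<name>Start` / `<name>End` pattern seen above; parse_log and gap_ns are hypothetical helpers, and only the last six log lines are copied in:

```python
import re

# Last six lines of the log above.
LOG = """\
1007736281599546(updatePresynapticEventStart)
1007736281604866(updatePresynapticEventEnd)
1007736281605226(updateNeuronsEventStart)
1007736281605586(updateNeuronsEventEnd)
1007736281726036(readXPostEventStart)
1007736281729036(readXPostEventEnd)
"""

LINE_RE = re.compile(r"^(\d+)\((\w+)\)$")


def parse_log(text: str) -> dict:
    """Map each event label to its timestamp (assumed ns)."""
    stamps = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stamps[m.group(2)] = int(m.group(1))
    return stamps


def gap_ns(stamps: dict, first: str, second: str) -> int:
    """Time from the end of `first` to the start of `second`.

    A negative value means `second` started before `first` finished.
    """
    return stamps[second + "Start"] - stamps[first + "End"]


stamps = parse_log(LOG)
print(gap_ns(stamps, "updatePresynapticEvent", "updateNeuronsEvent"))  # 360
print(gap_ns(stamps, "updateNeuronsEvent", "readXPostEvent"))          # 120450
```

Both gaps are non-negative, which matches the observation that, with the finish, the ordering itself is correct even though the output is not.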

So the only workaround we currently have is to flush between every kernel launch, which is very detrimental to performance. However, I completely understand regarding the timeline; if you could keep me updated via this thread, that would be great.

dipak
Staff

Re: OpenCL compiler bug

Sure, I'll let you know if I get any update about this issue.

a finish before the read results in correct ordering but incorrect output (like the single flush)

This is another unexpected behavior. Did you observe it on Windows or Linux? Please let me know your setup details. I'll mention this information in the related ticket.

I think it would be really helpful if you could provide profiler reports for these cases, i.e. with a single clFlush or clFinish.

Thanks.

neworderofjamie
Adept I

Re: OpenCL compiler bug

We can reproduce this on both a Linux system with a Radeon RX 5700 XT and AMDGPU-PRO 20.30 drivers, and a Windows system with a Radeon RX 580 and 20.5.1 drivers. If I can get the profiler to work, I'll post the results here.

dipak
Staff

Re: OpenCL compiler bug

Thanks for the information.

dipak
Staff

Re: OpenCL compiler bug

Just FYI.

It looks like more recent drivers are available for both Windows (Adrenalin 20.9.1 WHQL and 20.9.2 Optional) and Linux (AMDGPU-Pro 20.40). Since it is always recommended to verify an issue with the latest drivers, I would suggest trying them to see whether you observe any difference.

Please note, I tested with Adrenalin 20.9.1.

Thanks.

neworderofjamie
Adept I

Re: OpenCL compiler bug

It's going to take us a little longer to get our Linux machine upgraded but, on Windows with an RX 580 and 20.9.2 drivers, the behaviour we see is unchanged, i.e. a single flush or finish before the read does not produce correct results. What GPU are you testing on?

Thanks
