    Memory Benchmarks, Multiple Component Instances

      Hi Experts,

      I am a newbie to GPU programming. Are there any benchmarks for CPU<->GPU memory transfers and for memory transfers within the device itself?

      Also, I would appreciate it if someone could point out the impact of running multiple instances of a data-intensive component, e.g. video encoders, on a single GPU card. Is any slowdown expected, and is there a cap on the number of such instances?

      Thank you in anticipation; I greatly appreciate your input.