diapolo
Adept I

How to "memcpy" on GPU with strings?

I'm an OpenCL beginner, and I'm currently trying to work with strings in kernels. I need some kind of memcpy, and I have read that no such function is available in OpenCL, so I wrote one myself.

I know that my function needs cl_khr_byte_addressable_store, but that extension is not reported for my HD5870; it is only available on the CPU device.

So how could I implement this kind of function on a GPU? Or are there better ways to do it? Help is really appreciated!

 

Thanks,

Dia

void memcpy(uchar *dest, __constant uchar *source, uint Len)
{
    for (uint i = 0; i < Len; ++i) {
        dest[i] = source[i]; /* copy one byte per iteration */
    }
}

0 Likes
6 Replies
genaganna
Journeyman III

diapolo,

There are many inefficient ways.

1. Make dest an int*.
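
For readers who want to see what this suggestion might look like: without cl_khr_byte_addressable_store, OpenCL 1.0 kernels may not write through pointers to types smaller than 32 bits, so one option is to keep the string packed in 32-bit words and copy whole words. The following is a minimal sketch in plain C, not code from this thread. The function name `copy_bytes_as_words` and the masking of the last partial word are assumptions made for illustration, and the tail handling assumes a little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies len_bytes bytes using only 32-bit stores.
 * Assumes both buffers are 4-byte aligned and hold at least
 * (len_bytes + 3) / 4 words. */
void copy_bytes_as_words(uint32_t *dest, const uint32_t *source, size_t len_bytes)
{
    size_t full_words = len_bytes / 4;
    size_t tail_bytes = len_bytes % 4;

    /* Copy all complete 32-bit words. */
    for (size_t i = 0; i < full_words; ++i) {
        dest[i] = source[i];
    }

    /* Merge the remaining 1..3 bytes into the last word without touching
     * the other bytes of that word. On a little-endian machine the first
     * bytes in memory are the low-order bits, which is assumed here. */
    if (tail_bytes != 0) {
        uint32_t mask = (1u << (8 * tail_bytes)) - 1u;
        dest[full_words] = (dest[full_words] & ~mask) | (source[full_words] & mask);
    }
}
```

In a kernel the same idea would use `uint` pointers with the appropriate address space qualifiers. If the destination buffer can simply be padded to a multiple of four bytes, the tail handling can be dropped.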

0 Likes

Originally posted by: genaganna

diapolo,

There are many inefficient ways.

1. Make dest an int*.

So for integer arrays I don't need the mentioned extension? As I said, I'm currently learning, and I know that my first steps won't be efficient, but I'm happy if they work in the first place.

 

Thanks,

Dia

0 Likes

 

So for integer arrays I don't need the mentioned extension?

No, there is no need to mention the extension.

0 Likes

Originally posted by: genaganna

diapolo,

There are many inefficient ways.

1. Make dest an int*.

Your suggestion worked, but what about an efficient way? You don't need to post code, but a few hints would be very nice.

Thanks,

Dia

0 Likes

Just for academic reasons, I'd love to know whether this scheme works with OpenCL on a GPU:

 

unsigned int get_you_fired_copy(unsigned int* dest, const unsigned int* src, int len)
{
    int unrolled_loops;
    int leftover;

    leftover = len & 0x07;                     /* len % 8 */
    unrolled_loops = ((unsigned int)len) >> 3; /* len / 8 */

    switch (leftover) {
    case 0: /* added: without it, a len divisible by 8 copies nothing */
        while (unrolled_loops--) {
            *dest++ = *src++;
    case 7: *dest++ = *src++;
    case 6: *dest++ = *src++;
    case 5: *dest++ = *src++;
    case 4: *dest++ = *src++;
    case 3: *dest++ = *src++;
    case 2: *dest++ = *src++;
    case 1: *dest++ = *src++;
        }
    }
    return 0;
}
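
Before trying a scheme like this in a kernel, its control flow can be checked on the host. The sketch below is a plain C variant with a hypothetical name, `unrolled_copy`, not the poster's exact code: it uses unsigned counters and an explicit `case 0:` label so that lengths divisible by 8 also enter the loop. It says nothing about whether a given OpenCL GPU compiler accepts case labels inside a loop body; that question is not settled in this thread.

```c
#include <stddef.h>

/* Hypothetical host-side variant of the unrolled copy (a Duff's device).
 * Copies len words from src to dest and returns the number copied. */
unsigned int unrolled_copy(unsigned int *dest, const unsigned int *src, unsigned int len)
{
    unsigned int leftover = len & 0x07u;     /* len % 8 */
    unsigned int unrolled_loops = len >> 3;  /* len / 8 */

    /* The switch jumps into the middle of the loop body to copy the
     * leftover words first; each later pass copies 8 words. */
    switch (leftover) {
    case 0:
        while (unrolled_loops--) {
            *dest++ = *src++;
    case 7: *dest++ = *src++;
    case 6: *dest++ = *src++;
    case 5: *dest++ = *src++;
    case 4: *dest++ = *src++;
    case 3: *dest++ = *src++;
    case 2: *dest++ = *src++;
    case 1: *dest++ = *src++;
        }
    }
    return len;
}
```

Calling it for every length from 0 up to a few multiples of 8 and comparing against a simple loop exercises each entry point of the switch.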

0 Likes

 

Your suggestion worked, but what about an efficient way? You don't need to post code, but a few hints would be very nice.

Diapolo,

Please read ATI_Stream_SDK_Performance_Notes.pdf for more hints on performance. This document is available at http://developer.amd.com/gpu/ATIStreamSDK/pages/Documentation.aspx

0 Likes