Sadikuz
Adept I

char array manipulation

Hi, all.

 

I'm a complete newbie to OpenCL and I need some help with a piece of code I'm working on for practice. What I need to do is load an entire text file (~19 MB) into the device's global memory (GPU), copy blocks of 32 or 64 lines (this value can change) to local memory, and then search each block for specific terms (a customer list with ~14000 names). So I'll have a lot of threads, each one looking for a specific name. My main problem is that I don't know how to copy my text file (which is actually stored in a 2D char array in C) to the device's memory. Every time, my access is restricted to the first element of the array. Can you give me some advice on how to do this?

0 Likes
4 Replies
tonyo_au
Journeyman III

One way would be to create a 2D integer texture, copy the data to the image, and then access it as a texture.

Another is to create a buffer (which is a 1D array) and copy the names to it with a stride as wide as the longest name. Each name is then one stride apart in the buffer.

A third option is to have two buffers: a 1D array with the names tightly packed, and an index array. This may be more memory-efficient, but I don't know whether it is as good performance-wise.

From what I have read, data in multiples of 16 is best. Someone may be able to comment more on the efficiency of these methods.
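The second option (one flat buffer with a fixed stride) can be sketched on the host side as below. This is only an illustration under assumptions: `packWithStride` is a hypothetical helper name, the stride is assumed to be the longest line plus one byte for the terminator, and lines longer than that are simply truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy every line into one contiguous buffer in which
// line i starts at byte i * stride. Unused bytes stay '\0'.
std::vector<char> packWithStride(const std::vector<std::string>& lines,
                                 std::size_t stride)
{
    assert(stride > 0);
    std::vector<char> flat(lines.size() * stride, '\0');
    for (std::size_t i = 0; i < lines.size(); ++i) {
        // Leave room for the terminating '\0'; longer lines are truncated.
        const std::size_t n = std::min(lines[i].size(), stride - 1);
        std::copy(lines[i].begin(), lines[i].begin() + n,
                  flat.begin() + i * stride);
    }
    return flat;
}
```

With a layout like this, `flat.data()` and `flat.size()` would be the pointer and byte count handed to `clEnqueueWriteBuffer`, and a kernel would find line `i` at `aInFile + i * stride`.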

0 Likes

The first method will definitely allow you to allocate enough memory, but you might have to get clever with indexing if you have more than 8192 strings (the max height of an image).

The second option is probably the best since you'll need arbitrary byte addressing, which I don't think you can get with a texture. Unfortunately, if I recall, you can only allocate 128MB in a single go with AMD GPUs. This may cause problems if the 19MB of data has a large disparity between the max string length and the average string length; if the max string is 7-8x longer than the average string, you'll increase your memory needs roughly 7-8x, blowing your 128MB budget.

The third option will be more of a hassle than a performance hit, since GPUs can perform scan operations fairly efficiently and it sounds like you'll get tons of data reuse when you search the customer list against your text file. If option 2 doesn't work, option 3 definitely will.
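For the third option, a rough host-side sketch of the two buffers might look like this. The names `PackedLines`, `packTight`, and `lineLength` are made up for illustration, and storing one extra trailing offset (so that every line length is a simple difference) is an assumption, not something the thread specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the two buffers of the "packed + index" layout.
struct PackedLines {
    std::vector<char> chars;            // all lines back to back, no padding
    std::vector<unsigned int> offsets;  // offsets[i] = start of line i;
                                        // the last entry is the total size
};

PackedLines packTight(const std::vector<std::string>& lines)
{
    PackedLines p;
    p.offsets.reserve(lines.size() + 1);
    for (const std::string& s : lines) {
        p.offsets.push_back(static_cast<unsigned int>(p.chars.size()));
        p.chars.insert(p.chars.end(), s.begin(), s.end());
    }
    p.offsets.push_back(static_cast<unsigned int>(p.chars.size()));
    return p;
}

// Length of line i, recovered from two neighbouring offsets.
std::size_t lineLength(const PackedLines& p, std::size_t i)
{
    assert(i + 1 < p.offsets.size());
    return p.offsets[i + 1] - p.offsets[i];
}
```

Both vectors would then go to the device as separate buffers; the total size is the text itself plus four bytes per line, regardless of how long the longest line is.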

In any case, the compiler is going to generate a bunch of shift and mask operations on AMD GPUs since local memory uses word addressing, so you may want to cast your strings to ints after loading them from global memory in bulk and store them in a local memory integer array to avoid this.
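One reading of this advice is to keep four characters in each 32-bit word and extract them with an explicit shift and mask. The sketch below shows that layout on the host only; `packWords` and `byteAt` are hypothetical names, and placing byte 0 in the low 8 bits is an assumption chosen to match a little-endian reinterpretation of the char data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 4 consecutive bytes into each 32-bit word, byte 0 in the low 8 bits.
// A trailing partial word is padded with zero bytes.
std::vector<std::uint32_t> packWords(const std::vector<char>& bytes)
{
    std::vector<std::uint32_t> words((bytes.size() + 3) / 4, 0u);
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const std::uint32_t b = static_cast<unsigned char>(bytes[i]);
        words[i / 4] |= b << (8 * (i % 4));
    }
    return words;
}

// Extract byte i again: the same kind of shift and mask the compiler
// would otherwise generate for a byte access into word-addressed memory.
char byteAt(const std::vector<std::uint32_t>& words, std::size_t i)
{
    assert(i / 4 < words.size());
    return static_cast<char>((words[i / 4] >> (8 * (i % 4))) & 0xFFu);
}
```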

0 Likes

Thank you guys for your attention. After your explanations, everything is pretty clear. I rewrote my code trying to follow your suggestions, but I still can't access more than one line of my text file once the data is stored in the device's memory. I'll attach the relevant part of my algorithm at the end of this message, so you'll be able to read it and tell me what I'm doing wrong.

[]s

Walid



 //At this point, I have the entire text file loaded
 //in a C++ string vector (slInFile), and I know the
 //max string length (iTMS: no more than 100 chars).
 unsigned  int  iTotalBytesInFile;
           char inFile[slInFile.size()][iTMS];
 std::string sAux("");
 //Copy the entire file to char buffer
 for (unsigned int i=0; i<slInFile.size(); i++)
 {
  sAux.assign(slInFile.at(i));
  trim(sAux);
  if (sAux.compare("") != 0)
  {
   sAux.assign(toLowerCase(internationalToANSI(sAux)));
   removeGarbage(sAux);
   if (sAux.compare("") != 0)
   { strcpy(inFile[i],sAux.c_str()); }
   else
   { inFile[i][0]='\0'; }
  }
  else
  { inFile[i][0]='\0'; }
 }
 iTotalBytesInFile = sizeof(char)*sizeof(inFile);

 (…)

 //Create memory buffer on the device
 cl_mem memoryObjectInFile = clCreateBuffer(context,CL_MEM_READ_ONLY,iTotalBytesInFile,NULL,&ret);

 (…)

 //Copy char buffer to their respective memory buffer
 ret = clEnqueueWriteBuffer(command_queue,memoryObjectInFile,CL_TRUE,0,iTotalBytesInFile,&inFile,0,NULL,NULL);

 (…)

 //Set the arguments of the kernel
 ret = clSetKernelArg(kernel, 1, sizeof(memoryObjectInFile), (void *)&memoryObjectInFile);

 (…)

//My kernel code. Of course, this is an embryonic version
__kernel void processTextFile(unsigned __global const int  *iTLA,
                                       __global const char* aInFile,
                                       __global const char* aClientList,
                              unsigned __global       int  *iResults)
{
 unsigned __local int iBlockStart,
                      iBlockEnd,
                      iCurrentBlock,
                      iBlockSize,
                      iTotalLinesInFile;
 iCurrentBlock = 0;
 iBlockSize = 128;
 iTotalLinesInFile = *iTLA;
 do
 {
  iBlockStart = (iCurrentBlock++)*iBlockSize;
  iBlockEnd   = iBlockStart+iBlockSize;
  if (iBlockEnd > iTotalLinesInFile-1)
  { iBlockEnd = iTotalLinesInFile-1; }
  //Copy block to a __local char array
  //Here is my problem. How can I do this?
  __local theBuffer = (…)
  //Process the text block
  processClients(iBlockSize,theBuffer,aClientList);
 } while (iBlockEnd<iTotalLinesInFile-1);
 (…)
 //Return a list of integers in iResults
 (…)
}

0 Likes

If I understand what you are asking, you can do the following:

__local int localBuffer[256];

for (int i = 0; i < 256; i++)
  localBuffer[i] = (int)(aInFile[someoffset+i]);

where someoffset depends on the work-group/work-item.
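To make the role of someoffset concrete, here is a CPU-side emulation of that copy for one work-group. It is a sketch under assumptions, not kernel code: `copyBlock` is a hypothetical name, the offset is assumed to be the group index times the block size in bytes, and the copy is split across work-items in a strided fashion, which is a variation on the single loop shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates what one work-group would do: copy its own block of the global
// buffer into a "local" int array, one char per int.
std::vector<int> copyBlock(const std::vector<char>& global,
                           std::size_t groupId,    // stands in for get_group_id(0)
                           std::size_t localSize,  // stands in for get_local_size(0)
                           std::size_t blockBytes)
{
    assert(localSize > 0);
    const std::size_t someoffset = groupId * blockBytes;
    std::vector<int> local(blockBytes, 0);
    // Each simulated work-item lid copies elements lid, lid + localSize, ...
    for (std::size_t lid = 0; lid < localSize; ++lid) {
        for (std::size_t i = lid; i < blockBytes; i += localSize) {
            if (someoffset + i < global.size()) {  // guard the last, partial block
                local[i] = static_cast<int>(global[someoffset + i]);
            }
        }
    }
    return local;
}
```

In an actual kernel the outer loop disappears (each work-item runs the inner loop with its own `get_local_id(0)`), and a `barrier(CLK_LOCAL_MEM_FENCE)` would be needed after the copy before any work-item reads the local array.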

0 Likes