4 Replies Latest reply on Jul 11, 2011 12:52 PM by tonyo_au

    char array manipulation

    Sadikuz

      Hi, all.

       

      I'm a complete newbie in OpenCL and I need some help with a piece of code I'm working on for practice. What I need to do is load an entire text file (~19M) into the device's global memory (GPU), copy blocks of 32 or 64 lines (this value can change) to local memory, and then search each block for specific terms (a customer list with ~14000 names). So I'll have a lot of threads, each one looking for a specific name. My main problem is that I don't know how to copy my text file (which is actually stored in a 2D char array in C) to the device's memory. Every time, I can only access the first element of the array. Can you give me some advice on how to do this?

        • char array manipulation
          tonyo_au

          One way would be to create a 2D integer texture, copy the data to the image and then access it as a texture.

          Another is to create a buffer (which is a 1D array) and copy the names into it with a stride that is as wide as the longest name. Each name then starts one stride after the previous one, so name i begins at offset i * stride in the buffer.
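The strided layout can be sketched on the host side roughly as follows. This is only a sketch under assumptions: `pack_strided` and its parameters are names made up for illustration, and the stride is assumed to be at least one byte longer than the longest line so that every row keeps its terminator.

```c
#include <stdlib.h>
#include <string.h>

/* Pack `count` NUL-terminated lines into one flat buffer in which line i
 * starts at byte i * stride. Rows are zero-padded, so every row stays
 * NUL-terminated as long as stride > length of the longest line.
 * Returns a malloc'd buffer of count * stride bytes (caller frees),
 * or NULL if allocation fails or a line does not fit in one row.
 * (pack_strided is a hypothetical helper, not an OpenCL function.) */
char *pack_strided(const char *const *lines, size_t count, size_t stride)
{
    if (count == 0 || stride == 0)
        return NULL;

    char *flat = calloc(count, stride);   /* zero-filled rows */
    if (flat == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        if (len >= stride) {              /* no room for the terminator */
            free(flat);
            return NULL;
        }
        memcpy(flat + i * stride, lines[i], len);
    }
    return flat;
}
```

The resulting buffer is one contiguous block of count * stride bytes, so it can go to the device in a single clEnqueueWriteBuffer call. The kernel would then need the stride as well (for example as an extra argument) to find line i at aInFile + i * stride.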

          A third option is to have 2 buffers: one is a 1D array with the names tightly packed, and the other is an index array holding the offset at which each name starts. This may be more efficient on memory, but I don't know whether it is as good performance-wise.
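A rough host-side sketch of the two-buffer layout, for comparison. The names `PackedLines` and `pack_tight` are invented for this example, and the offsets are assumed to fit in 32 bits (real host code would probably use cl_uint for them).

```c
#include <stdlib.h>
#include <string.h>

/* Tightly packed text plus an index of where each line starts.
 * offsets has count + 1 entries: line i occupies
 * chars[offsets[i]] .. chars[offsets[i + 1] - 1] (no terminators stored).
 * (PackedLines and pack_tight are hypothetical names.) */
typedef struct {
    char         *chars;    /* all lines back to back, no padding */
    unsigned int *offsets;  /* count + 1 start positions          */
    size_t        count;
} PackedLines;

/* Returns 0 on success, -1 on allocation failure.
 * On success the caller frees out->chars and out->offsets. */
int pack_tight(const char *const *lines, size_t count, PackedLines *out)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(lines[i]);

    out->chars   = malloc(total ? total : 1);   /* avoid malloc(0) */
    out->offsets = malloc((count + 1) * sizeof *out->offsets);
    out->count   = count;
    if (out->chars == NULL || out->offsets == NULL) {
        free(out->chars);
        free(out->offsets);
        out->chars = NULL;
        out->offsets = NULL;
        return -1;
    }

    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        out->offsets[i] = (unsigned int)pos;
        memcpy(out->chars + pos, lines[i], len);
        pos += len;
    }
    out->offsets[count] = (unsigned int)pos;    /* end sentinel */
    return 0;
}
```

Both arrays would be uploaded as separate buffers. The length of line i is offsets[i + 1] - offsets[i], which is why the sketch stores one extra sentinel entry instead of terminators.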

          From what I have read, data in multiples of 16 is best. Someone may be able to comment more on the efficiency of these methods.

            • char array manipulation
              rick.weber

              The first method will definitely allow you to allocate enough memory, but you might have to get clever with indexing if you have more than 8192 strings (the max height of an image).

              The second option is probably the best since you'll need arbitrary byte addressing, which I don't think you can get with a texture. Unfortunately, if I recall, you can only allocate 128MB in a single go with AMD GPUs. This may cause problems if the 19MB of data has a large disparity between the max string length and the average string length; if the max string is 7-8x longer than the average string, your memory needs grow by roughly 7-8x, which blows your 128MB budget.
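To make the memory arithmetic concrete, here is a small sketch that estimates the footprint of the strided layout against the packed-plus-index layout. The function names are made up for illustration, and the numbers in the note below are hypothetical, not measurements of the actual file.

```c
#include <stddef.h>

/* Strided layout: every line costs a full row, however short it is.
 * (strided_bytes and packed_bytes are hypothetical helper names.) */
size_t strided_bytes(size_t line_count, size_t max_len)
{
    return line_count * (max_len + 1);   /* +1 for the '\0' terminator */
}

/* Packed layout: the characters themselves plus one offset per line
 * and one end sentinel in the index array. */
size_t packed_bytes(size_t total_chars, size_t line_count)
{
    return total_chars + (line_count + 1) * sizeof(unsigned int);
}
```

As an illustration only: 1,000,000 lines with a longest line of 100 characters need 101,000,000 bytes strided, while the same lines with 19,000,000 characters in total need about 23,000,000 bytes packed (assuming 4-byte offsets).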

              The third option will be more of a hassle than a performance hit, since GPUs can perform scan operations fairly efficiently and it sounds like you'll get tons of data reuse when you search the customer list against your text file. If option 2 doesn't work, option 3 definitely will.

              In any case, the compiler is going to generate a bunch of shift and mask operations on AMD GPUs since local memory uses word addressing, so you may want to cast your strings to ints after loading them from global memory in bulk and store them in a local memory integer array to avoid this.
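The idea of handling the text as ints instead of chars can be sketched in plain C like this. It is only an illustration of the data layout: `chars_to_words` and `words_equal` are invented names, and in a real kernel the words would live in a __local integer array.

```c
#include <stdint.h>
#include <string.h>

/* Copy `nbytes` chars into 32-bit words, 4 chars per word, zero-padding
 * the last word. `words` must have room for (nbytes + 3) / 4 entries.
 * Returns the number of words written.
 * (chars_to_words and words_equal are hypothetical helpers.) */
size_t chars_to_words(const char *src, size_t nbytes, uint32_t *words)
{
    size_t nwords = (nbytes + 3) / 4;
    memset(words, 0, nwords * sizeof *words);
    memcpy(words, src, nbytes);        /* keeps the in-memory byte order */
    return nwords;
}

/* Compare two word arrays, four characters per comparison.
 * Returns 1 if all words match, 0 otherwise. */
int words_equal(const uint32_t *a, const uint32_t *b, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Because both sides are padded with zeros to a whole number of words, equal strings give equal word arrays, and each comparison covers four characters with no per-byte shifting or masking. Reading the text as words does assume it starts on a 4-byte boundary, which a stride that is a multiple of 4 would guarantee.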

                • char array manipulation
                  Sadikuz

                  Thank you guys for your attention. After your explanations, everything is pretty clear. I rewrote my code trying to follow your suggestions, but I still can't access more than 1 line of my text file once the data is stored in the device's memory. I've attached the relevant part of my algorithm at the end of this message, so you can read it and tell me what I'm doing wrong.

                  []s

                  Walid



                   //At this point, I have the entire text file loaded
                   //in a C++ string vector (slInFile), and I know the
                   //max string length (iTMS: no more than 100 chars).
                   //NOTE: each row is iTMS bytes wide, so iTMS has to leave
                   //room for the terminating '\0' of the longest line.
                   unsigned  int  iTotalBytesInFile;
                             char inFile[slInFile.size()][iTMS];
                   std::string sAux("");
                   //Copy the entire file to char buffer, one line per row
                   for (unsigned int i=0; i<slInFile.size(); i++)
                   {
                    sAux.assign(slInFile.at(i));
                    trim(sAux);
                    if (sAux.compare("") != 0)
                    {
                     sAux.assign(toLowerCase(internationalToANSI(sAux)));
                     removeGarbage(sAux);
                     if (sAux.compare("") != 0)
                     { strcpy(inFile[i],sAux.c_str()); }
                     else
                     { inFile[i][0]='\0'; }
                    }
                    else
                    { inFile[i][0]='\0'; }
                   }
                   //sizeof(inFile) is already the size of the whole 2D array in bytes
                   iTotalBytesInFile = sizeof(inFile);

                   (…)

                   //Create memory buffer on the device
                   cl_mem memoryObjectInFile = clCreateBuffer(context,CL_MEM_READ_ONLY,iTotalBytesInFile,NULL,&ret);

                   (…)

                   //Copy char buffer to their respective memory buffer
                   ret = clEnqueueWriteBuffer(command_queue,memoryObjectInFile,CL_TRUE,0,iTotalBytesInFile,&inFile,0,NULL,NULL);

                   (…)

                   //Set the arguments of the kernel
                   ret = clSetKernelArg(kernel, 1, sizeof(memoryObjectInFile), (void *)&memoryObjectInFile);

                   (…)

                  //My kernel code. Of course, an embryonic version
                  __kernel void processTextFile(__global const unsigned int  *iTLA,
                                                __global const char *aInFile,
                                                __global const char *aClientList,
                                                __global       unsigned int  *iResults)
                  {
                   //Loop state is private to each work-item; declared __local,
                   //every work-item in the group would write the same variables
                   unsigned int iBlockStart,
                                iBlockEnd,
                                iCurrentBlock,
                                iBlockSize,
                                iTotalLinesInFile;
                   iCurrentBlock = 0;
                   iBlockSize = 128;
                   iTotalLinesInFile = *iTLA;
                   do
                   {
                    iBlockStart = (iCurrentBlock++)*iBlockSize;
                    iBlockEnd   = iBlockStart+iBlockSize;
                    if (iBlockEnd > iTotalLinesInFile-1)
                    { iBlockEnd = iTotalLinesInFile-1; }
                    //Copy block to a __local char array
                    //Here is my problem. How can I do this?
                    __local theBuffer = (…)
                    //Process the text block
                    processClients(iBlockSize,theBuffer,aClientList);
                   } while (iBlockEnd<iTotalLinesInFile-1);
                   (…)
                   //Return an list of integers in iResults
                   (…)
                  }

                    • char array manipulation
                      tonyo_au

                      If I understand what you are asking, you can do the following:

                      __local int localBuffer[256];

                      for (int i = 0; i < 256; i++)

                        localBuffer[i] = (int)(aInFile[someoffset + i]);

                      where someoffset depends on the work-group/work-item.
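One possible refinement, sketched here in plain C so it can be run on the host: instead of every work-item copying all 256 elements, the copy can be split across the work-group. In this emulation, `lid` stands in for get_local_id(0) and `lsz` for get_local_size(0); `copy_slice` and `copy_block` are names invented for the example.

```c
#include <stddef.h>

/* What ONE work-item does: it copies elements lid, lid + lsz,
 * lid + 2*lsz, ... so neighbouring work-items touch neighbouring
 * addresses. (copy_slice is a hypothetical name; lid and lsz emulate
 * get_local_id(0) and get_local_size(0).) */
void copy_slice(int *local_buf, const char *global_src, size_t offset,
                size_t count, size_t lid, size_t lsz)
{
    for (size_t i = lid; i < count; i += lsz)
        local_buf[i] = (int)global_src[offset + i];
}

/* Emulate the whole work-group by running every work-item in turn.
 * In a real kernel the work-items run concurrently, and a
 * barrier(CLK_LOCAL_MEM_FENCE) must follow the copy before any
 * work-item reads local_buf. */
void copy_block(int *local_buf, const char *global_src, size_t offset,
                size_t count, size_t lsz)
{
    for (size_t lid = 0; lid < lsz; lid++)
        copy_slice(local_buf, global_src, offset, count, lid, lsz);
}
```

In the kernel itself, the body of copy_slice would sit directly in the kernel function, with the offset typically derived from the block being processed (for example block index times block size in bytes, which is an assumption about the layout).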