10 Replies Latest reply on Jan 6, 2013 11:28 PM by heman

    Little fraction of allocated buffer correct initialized?

    cguenther

      Hi there.

       

      I have heard that OpenCL buffers can only be allocated in chunks of limited size:

      http://devgurus.amd.com/thread/158397

      http://devgurus.amd.com/message/1282922#1282922

       

      I am using the C++ bindings from the Khronos Group with the compiler option "-DCL_USE_DEPRECATED_OPENCL_1_1_APIS" to get it to build and link correctly. The buffer is allocated as follows:

       

      cl::Buffer tempBuffer(m_context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,size,VBOpos,&errret);


      This buffer holds the 3D positions of the points, stored as 3 integers per point, and is about 120 MB (10 million points). The call returns CL_SUCCESS. Can I assume this means the buffer is small enough not to be affected by the problem described in the two links above?

       

      My kernel reads the data as follows:

       

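      __kernel
      void testKernel(__global int* VBOpos)
      {
          size_t gid = get_global_id(0);   /* gid must be declared */
          __private int4 WorldSpacePointPosInt;
          /* Reads 4 ints starting at point gid: x, y, z plus the next point's x
             (one int past the end of the buffer for the last point). */
          WorldSpacePointPosInt = vload4(0, &(VBOpos[3*gid]));
      }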

       

      The values of roughly the first 8 million points are correct, but after that I get only nonsense. It seems that the allocation succeeds but the data are not filled in correctly. Before this I also tried a shared GL VBO. The GL rendering of the VBO is correct, but when I read the shared GL VBO from CL, again only about 8 million points are correct and the rest is garbage.

       

      My Nvidia notebook does not have this problem.

       

      I would like to split the data into several buffers, but I do not know up to which exact size a buffer is guaranteed to be initialized correctly.

       

      Test system:
      AMD 7970, Win7 x64, Ubuntu 12.04 x64, Catalyst 12.8, AMD APP SDK 2.7

       

      Sorry for my bad English, and please reply; this would help me a lot.

        • Re: Little fraction of allocated buffer correct initialized?
          binying

          "Can i assume that this behaviour indicates that the buffer is small enough to have not the problem of the 2 links above?"

          Well, I think you can use the printf extension to confirm or double-check.

          Hope this answer will be helpful to you.

            • Re: Little fraction of allocated buffer correct initialized?
              yurtesen

              In the past, when I exceeded the buffer limits, I got an error from the allocation. You should be able to see the maximum object size in the clinfo output. Why don't you try it with vload3 and see whether you get the same values? You could also try running it on the CPU.

                • Re: Little fraction of allocated buffer correct initialized?
                  cguenther

                  vload3 gives the same result. I also tried accessing the data with the [] operator, with the same result. I also tried to query CL_DEVICE_MAX_MEM_ALLOC_SIZE, but printing the result as a string gave only two binary zeros. The query returns a cl_ulong, not a string, so it has to be read into a cl_ulong and printed as a number:

                      /* needs <CL/cl.h> and <stdio.h>; DumpDetailedCLError is my own error helper */
                      cl_platform_id platformId;
                      cl_device_id deviceId;
                      cl_ulong maxAllocSize = 0; /* CL_DEVICE_MAX_MEM_ALLOC_SIZE is a cl_ulong */

                      clGetPlatformIDs(1, &platformId, NULL);
                      clGetDeviceIDs(platformId, CL_DEVICE_TYPE_GPU, 1, &deviceId, NULL);
                      DumpDetailedCLError(clGetDeviceInfo(deviceId, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                                                          sizeof(maxAllocSize), &maxAllocSize, NULL), false);
                      printf("CL_DEVICE_MAX_MEM_ALLOC_SIZE: %llu bytes\n", (unsigned long long)maxAllocSize);
                      fflush(stdout);

                   

                  One experienced user wrote the following about the AMD 7970, in reply to someone who claimed "Currently OpenCL users are limited to 25% of device memory":

                   

                  I don't know where you get this from, perhaps it's a rumor, but it's certainly not correct.

                  (there is a 512MB limit per allocation call but you can allocate as much as you like)

                   

                  I do predominately scientific computing and often need very large and fast memory so I am mostly using the 7970. On the 7970, I often allocate a single contiguous buffer that uses just shy of 3GB, the device limit. It's very simple, all you do is allocate in chunks of 512MB or less and make sure the chunks are rounded to about 0x4000 bytes, then they will be placed contiguously.

                  So I don't think the problem is the allocation, because my buffer is only a fraction of these sizes on the same hardware. It seems rather to be a problem with reading from this address range. It occurs with a plain OpenCL buffer AND with a shared GL VBO.

                   

                  So I am now trying to split the data into chunks of 8 million points, but I don't know at exactly which position the reads become faulty, or why this happens so much earlier than the 512 MB limit (8 million points are only about 96 MB).

                   

                  Thanks for your response so far.

                • Re: Little fraction of allocated buffer correct initialized?
                  cguenther

                  printf also shows the same faulty values. However, by writing the data with enqueueWriteBuffer and then reading it back with enqueueReadBuffer, I found that the values are written and read back correctly compared to the host source.

                   

                  So I concluded that the problem is not wrong values in memory, but the way I read them. I then found that reads go wrong for global IDs above about 8 million, which I used to index the memory.

                   

                  I have now reduced the size of my global work space and run the kernel in a loop so that it does the same total work. This solved my problem.

                   

                  Does anyone know what the maximum global ID can be?