2 Replies Latest reply on Aug 17, 2017 3:28 AM by dipak

    Buffers with CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR always have write-combined host allocation

    bcaf01

      Dear fellow developers,

       

       

      It seems that creating an OpenCL buffer with both CL_MEM_READ_ONLY and CL_MEM_ALLOC_HOST_PTR results in the AMD platform allocating write-combined host memory. A simple example that reproduces this behavior is posted below. (I am using a Radeon Pro WX 5100 on Windows 10 with the latest Radeon Pro driver. The OS is 64-bit, but I am compiling the example as a 32-bit application.)

       

      One thing that is rather curious is that when not passing CL_MEM_ALLOC_HOST_PTR but calling the map-command directly instead, the host allocation made available by the runtime is not allocated as write-combined (cf. the output generated by the program.)

       

      #define __CL_ENABLE_EXCEPTIONS
      
      // C++ includes
      #include <iostream>
      #include <string>
      #include <vector>
      
      // Windows API
      #include <Windows.h>
      
      // OpenCL includes
      #include <CL/cl.hpp>
      
      
      int main( void ) {
          try {
              std::vector< cl::Platform > platforms;
              std::vector< cl::Device > devices;
      
              // Platform selection
              cl::Platform::get( &platforms );
              const cl::Platform &platform = platforms[ 0 ];
      
              // Device selection
              platform.getDevices( CL_DEVICE_TYPE_GPU, &devices );
              const cl::Device &device = devices[ 0 ];
      
              // Print platform information
              std::string name;
              std::string version;
              platform.getInfo( CL_PLATFORM_NAME, &name );
              platform.getInfo( CL_PLATFORM_VERSION, &version );
              std::cout << "(Using the platform " << name << " at version " << version << ")" << std::endl;
      
              cl_context_properties props[ 3 ] = { CL_CONTEXT_PLATFORM, (cl_context_properties) (platform) (), 0 };
              cl::Context ctx( device, props );
              cl::CommandQueue queue( ctx, device );
      
              size_t bufferSize = 2048 * 1024 * sizeof( float );
      
              {
                  cl::Buffer buffer = cl::Buffer( ctx, CL_MEM_READ_ONLY, bufferSize );
      
                  float *bufferHost = static_cast<float*>(queue.enqueueMapBuffer( buffer, CL_TRUE, CL_MAP_READ, 0, bufferSize ));
      
                  MEMORY_BASIC_INFORMATION memInfo;
                  if ( VirtualQuery( reinterpret_cast<void*>(bufferHost), &memInfo, sizeof( memInfo ) ) )
                  {
                      std::cout << "Host allocation as write-combined: " << ((memInfo.AllocationProtect & PAGE_WRITECOMBINE) ? "Yes" : "No") << std::endl;
                      std::cout << "Host memory is write-combined: " << ((memInfo.Protect & PAGE_WRITECOMBINE) ? "Yes" : "No") << std::endl;
                  }
      
                  queue.enqueueUnmapMemObject( buffer, bufferHost );
              }
              {
                  cl::Buffer buffer = cl::Buffer( ctx, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, bufferSize );
      
                  float *bufferHost = static_cast<float*>(queue.enqueueMapBuffer( buffer, CL_TRUE, CL_MAP_READ, 0, bufferSize ));
      
                  MEMORY_BASIC_INFORMATION memInfo;
                  if ( VirtualQuery( reinterpret_cast<void*>(bufferHost), &memInfo, sizeof( memInfo ) ) )
                  {
                      std::cout << "Host allocation as write-combined: " << ((memInfo.AllocationProtect & PAGE_WRITECOMBINE) ? "Yes" : "No") << std::endl;
                      std::cout << "Host memory is write-combined: " << ((memInfo.Protect & PAGE_WRITECOMBINE) ? "Yes" : "No") << std::endl;
                  }
      
                  queue.enqueueUnmapMemObject( buffer, bufferHost );
              }
      
              queue.finish();
          } catch ( cl::Error &error ) {
              std::cerr << "OpenCL C++ API Exception during " << error.what() << ": " << error.err() << std::endl;
          }
      
          return 0;
      }
      

       

      I would argue that automatically allocating the host memory of a CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR buffer as write-combined is not a good idea, and I believe it should be classified as a bug. For example, one thread could fill a buffer that is then used in a computation, while another thread reads that same buffer in order to save it to a log file. With a write-combined allocation, reading the buffer takes a long time (up to 26 times slower on my system). I would instead suggest that the runtime allocate the host memory as write-combined only if CL_MEM_HOST_WRITE_ONLY is also specified (as the OpenCL specification more or less suggests).
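      A slowdown like the one mentioned above can be measured with a small timing helper run over the mapped pointer. This is only a minimal sketch, not the code behind the 26x figure; the names `ReadTiming` and `timeSequentialRead` are hypothetical and introduced here for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Result of one timed pass over a host buffer (hypothetical helper type).
struct ReadTiming {
    double sum;      // sum of all elements; returning it keeps the loop from being optimized away
    double seconds;  // wall-clock duration of the pass
};

// Reads every element of `data` once, in order, and reports how long that took.
// `data` can be any readable host pointer, e.g. one returned by enqueueMapBuffer.
ReadTiming timeSequentialRead( const float *data, std::size_t count ) {
    assert( data != nullptr || count == 0 );

    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for ( std::size_t i = 0; i < count; ++i ) {
        sum += data[ i ];
    }
    const auto stop = std::chrono::steady_clock::now();

    return { sum, std::chrono::duration<double>( stop - start ).count() };
}
```

      Calling `timeSequentialRead( bufferHost, bufferSize / sizeof( float ) )` between the map and unmap calls in each of the two blocks of the example would give one timing per allocation type. A single pass is noisy, so repeating the measurement a few times is advisable.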

       

      Any comments on this observation? Thanks in advance for your replies.

       

       

      Kind regards

      bcaf01

       

      PS: I would appreciate it if someone could add me to the white-list and move this topic to the appropriate developer forum!

       

      Changed the title to give a better description of the issue.

        • Re: Strange behavior when allocating an OpenCL buffer using CL_MEM_READ_ONLY
          dipak

          You've been whitelisted now.

           

          Regards,

          • Re: Buffers with CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR always have write-combined host allocation
            dipak

            Hi,

            Usually, a buffer created with CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR indicates that the programmer wants a pre-pinned (zero-copy) buffer for passing data from the host to the kernel: the host writes the data and the kernel reads it. Because the buffer is read-only on the kernel side, there is little reason to read it back on the host side. In general, the data flow is one-directional.

            Also note that CL_MEM_HOST_WRITE_ONLY was added to the spec later (in OpenCL 1.2).

             

            "One thing that is rather curious is that when not passing CL_MEM_ALLOC_HOST_PTR but calling the map-command directly instead, the host allocation made available by the runtime is not allocated as write-combined (cf. the output generated by the program.)"

            Not passing CL_MEM_ALLOC_HOST_PTR makes it a regular device buffer. Mapping it in read-only mode indicates that the host only wants to read the buffer, not write to it. So it is not the same as allocating a pre-pinned buffer with CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR.
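
            If the host still needs to read the data it uploaded (for logging, as in the original post), one possible workaround is to keep a copy in ordinary cached memory and only ever write to the mapped pointer. This is a sketch under that assumption, not something confirmed in this thread; `StagedUpload` is a hypothetical helper, and `mapped` stands for whatever pointer the map call returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: holds a cached host-side copy ("shadow") of data that is
// destined for a mapping the host should only write to, such as write-combined memory.
class StagedUpload {
public:
    explicit StagedUpload( std::size_t count ) : shadow_( count, 0.0f ) {}

    // Ordinary cached memory: fill it, and read it back later (e.g. for a log file).
    std::vector<float> &shadow() { return shadow_; }
    const std::vector<float> &shadow() const { return shadow_; }

    // Copies the shadow into `mapped` in one sequential pass.
    // The mapped memory is only written, never read.
    void flushTo( float *mapped ) const {
        assert( mapped != nullptr || shadow_.empty() );
        if ( !shadow_.empty() ) {
            std::memcpy( mapped, shadow_.data(), shadow_.size() * sizeof( float ) );
        }
    }

private:
    std::vector<float> shadow_;
};
```

            The trade-off is one extra host-side copy and twice the host memory, in exchange for keeping all host reads on cached memory.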

             

            Regards,