5 Replies Latest reply on Mar 14, 2013 12:26 AM by himanshu.gautam

    Device Fission/Partition

    cconti

      Dear All,

       

      I'm trying to use device partitioning to evaluate OpenCL as an alternative to OpenMP with vectorization on NUMA architectures.

      However, I'm unable to partition by affinity domain using the NUMA domain.

       

      I'm working mainly on two types of cluster nodes:

      - 4P Magny-Cours with AMD APP 2.8 (OpenCL 1.2)

      - 1P Interlagos (Cray XK7 node) with AMD APP 2.5 (OpenCL 1.1 - device fission extension)

       

      The query for the available affinity domains only returns L1, L2, L3 and "next partitionable", but not NUMA (which is the one I need).

      Is there some special requirement for partitioning by NUMA affinity, or is it just not supported yet?

        • Re: Device Fission/Partition
          himanshu.gautam

          Hi,

          I will check with AMD engineers whether NUMA affinity is supported. Meanwhile, could you share a small code snippet that shows the issue?

            • Re: Device Fission/Partition
              cconti

              Hi,

               

              I'm using the cl.hpp header from AMD APP 2.8 on the Magny-Cours (on the Cray XK7 I use the OpenCL 1.1 device fission extension in the same manner, just with different naming).

               

              Here's a simplified snippet I use to check which properties are available.

              As mentioned before, its output shows that partitioning by L1, L2 and L3 is supported, as well as "next partitionable" (the value of affinity is 60).

               

              cl_device_affinity_domain affinity = device.getInfo<CL_DEVICE_PARTITION_AFFINITY_DOMAIN>();

              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_NUMA)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_NUMA partitioning properties\n";
              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_L1_CACHE partitioning properties\n";
              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_L2_CACHE partitioning properties\n";
              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_L3_CACHE partitioning properties\n";
              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_L4_CACHE partitioning properties\n";
              if (affinity & CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE)
                cout << "CL_DEVICE_AFFINITY_DOMAIN_NEXT_PARTITIONABLE partitioning properties\n";
              if (affinity == 0)
                cout << "no affinity domains supported\n";
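For readers who want to check the reported value of 60 by hand, here is a minimal sketch that decodes an affinity bitmask without needing an OpenCL installation. It assumes the bit positions that the OpenCL 1.2 cl.h header assigns to the CL_DEVICE_AFFINITY_DOMAIN_* flags (NUMA is bit 0, NEXT_PARTITIONABLE is bit 5). The K_AFFINITY_* constants and the decode_affinity helper are hypothetical names introduced only for this illustration and are not part of the poster's code or of the OpenCL API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local copies of the cl_device_affinity_domain bit values from the
// OpenCL 1.2 cl.h header (assumption: redefined here so the sketch
// compiles without OpenCL headers; names are hypothetical).
constexpr std::uint64_t K_AFFINITY_NUMA               = 1u << 0;
constexpr std::uint64_t K_AFFINITY_L4_CACHE           = 1u << 1;
constexpr std::uint64_t K_AFFINITY_L3_CACHE           = 1u << 2;
constexpr std::uint64_t K_AFFINITY_L2_CACHE           = 1u << 3;
constexpr std::uint64_t K_AFFINITY_L1_CACHE           = 1u << 4;
constexpr std::uint64_t K_AFFINITY_NEXT_PARTITIONABLE = 1u << 5;

// Returns the names of all affinity domains whose bit is set in `mask`,
// ordered from the lowest bit to the highest.
std::vector<std::string> decode_affinity(std::uint64_t mask) {
    struct Entry { std::uint64_t bit; const char* name; };
    static const Entry table[] = {
        {K_AFFINITY_NUMA,               "NUMA"},
        {K_AFFINITY_L4_CACHE,           "L4_CACHE"},
        {K_AFFINITY_L3_CACHE,           "L3_CACHE"},
        {K_AFFINITY_L2_CACHE,           "L2_CACHE"},
        {K_AFFINITY_L1_CACHE,           "L1_CACHE"},
        {K_AFFINITY_NEXT_PARTITIONABLE, "NEXT_PARTITIONABLE"},
    };
    std::vector<std::string> names;
    for (const Entry& e : table) {
        if (mask & e.bit) names.push_back(e.name);
    }
    return names;
}
```

Under these assumptions, decode_affinity(60) yields L3_CACHE, L2_CACHE, L1_CACHE and NEXT_PARTITIONABLE, since 60 = 4 + 8 + 16 + 32. The NUMA bit (value 1) is not set, which agrees with the output described above.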

                • Re: Device Fission/Partition
                  himanshu.gautam

                  Hi cconti,

                  NUMA affinity is disabled in the runtime by default.

                  Anyway, could you share your specific motivation for using it?

                    • Re: Device Fission/Partition
                      cconti

                      Hi,

                       

                      Would it be possible to activate it in some way?

                       

                      I'm trying to evaluate OpenCL on CPU architectures.

                      I tried this once, when OpenCL 1.0 came out, with a GEMM implementation, and compared it against an SSE version (same code structure, except that the SSE version also applied the NUMA first-touch policy).

                      At the time I observed that on architectures with a single NUMA node OpenCL outperformed our SSE implementation, whereas on NUMA architectures SSE was better.

                       

                      My main focus is to port a high-performance compressible flow solver to GPUs with OpenCL, but I think it would be interesting to take this chance to see whether, with OpenCL, I can develop something that competes with code optimized for NUMA architectures and SSE/AVX instructions.

                      Evaluating OpenCL on CPUs might become important at a later stage, when I work on a fully heterogeneous code.