18 Replies Latest reply on Feb 22, 2012 2:14 PM by viscocoa

    local memory conflict?

    viscocoa

      Just experienced a bizarre problem. I can't past the entire code to confuse you. The program is like this conceptually:

       

      struct st_a{

          ...

          int3 vInA;

          ...

          int num;

      };

       

      struct st_b{

          ...

          float3 vInB;

          ...

      }

       

      __kernel void MyKernel()

      {

          __local struct st_a A;

          __local struct st_b B;

          A.vInA = (int3)0;

          B.vInB = (float3).6f;

          // OMG, A.vInA is now very big numbers!!!

          ...

      }

       

      I only launched one work item, so there is no write conflict.

       

      Does anybody have any suggestions?

       

      Thank you in advance!

       

      VisCocoa

        • Re: local memory conflict?
          jeff_golds

          Can you give the size of the structs st_a and st_b?  Maybe we can reproduce this in-house with that info.

            • Re: local memory conflict?
              viscocoa

              Hi Jeff,

               

              It's not a question on bank conflict.

               

              I know that when passing a structure from the host to a kernel, the structure must be padded.

               

              Now, it seems that a structure in local memory has to be padded to be the multiples of 16 bytes? Otherwise, the next structure will overlap the previous one?

               

              I have worked around the overwrite by adding a float4 in the end of the first structure:

               

              struct st_a{

                  ...

                  int3 vInA;

                  ...

                  int num;

                  int4 padding;

              };

               

              I don't think this comply with OpenCL standard.

               

              Thank you for you answer.

                • Re: local memory conflict?
                  jeff_golds

                  No, of course it's not complying with the standard.  I just want to get a test case that we can reproduce in-house. 

                  • Re: local memory conflict?
                    viscocoa

                    Only aligning the first structure does not solve the problem. However, if I keep the second variable far away from the first by declaring two local variables based on st_a, they don't conflict at all. To make things clear:

                     

                    struct st_a{

                        ...

                        int3 vInA;

                        ...

                        int num;

                    };

                     

                    struct st_b{

                        ...

                        float3 vInB;

                        ...

                    }

                     

                    __kernel void MyKernel()

                    {

                        __local struct st_a A;

                        __local struct st_a padding; // to keep B far away from A in local memory

                        __local struct st_b B;

                        A.vInA = (int3)0;

                        B.vInB = (float3).6f;

                        // Now A is intact

                        ...

                    }

                      • Re: local memory conflict?
                        jeff_golds

                        Can you give the size of st_a and st_b and the offset of vInA and vInB within each struct?  Thanks.

                        • Re: local memory conflict?
                          viscocoa
                          Below is a runable kernel that demonstrates the problem. I appreciate if you can confirm that the kernel reproduces the problem on your computer, so that I am confident that my software environment is all right.
                          typedef struct _C {
                              int m0;
                              float3 m1;
                              float m2;
                              float3 m3;
                          } st_C;
                          typedef struct _D {
                              float3 m0;
                              float3 m1;
                          } st_D;
                          typedef struct _A {
                              int m0;
                              st_C m1[10];
                              __global float3* m2;
                              int m3;
                              __global float3 *m4;
                              __global float2 *m5;
                              __global int4 *m6;
                              int m7;
                              st_D m8;
                              int3 m9; // focus on this
                              __global int2 *m10;
                              __global int *m11;
                              int m12;
                          } st_A;

                          typedef struct _B {
                              int m0;
                              float3 m1; // focus on this
                              float3 m2;
                          } st_B;

                          __kernel void experiment()
                          {
                              __local st_A foo;
                              __local st_B bar;
                              foo.m9 = (int3)0;
                              printf("foo.m9: (%d, %d, %d)\n", foo.m9.x, foo.m9.y, foo.m9.z);
                              // output: (0, 0, 0)
                              bar.m1 = (float3).6f;
                              printf("foo.m9: (%d, %d, %d)\n", foo.m9.x, foo.m9.y, foo.m9.z);
                              // output: (1058642330, 1058642330, 1058642330)
                          }
                            • Re: local memory conflict?
                              jeff_golds

                              Thanks, that helped.  I am able to see the issue here.  CPU device works fine, so it's probably a real bug on our side.

                              1 of 1 people found this helpful
                                • Re: local memory conflict?
                                  viscocoa

                                  Thank you Jeff!

                                    • Re: local memory conflict?
                                      viscocoa

                                      On my computer:

                                       

                                      &foo = 0

                                      sizeof(stA) = 752

                                      &bar = 704

                                       

                                      There are 48 bytes overlapping.

                                        • Re: local memory conflict?
                                          jeff_golds

                                          Yep, seems to be something going on with the alignment of the pointers within the struct.  For a workaround, sort your structs from small to large. I.e.

                                              __local st_B bar;

                                              __local st_A foo;

                                           

                                          That worked for me and Micah, who is one of our compiler engineers who posts here regularly, thinks that is a safe workaround until he gets the bug(s) sorted out.

                                           

                                          Thanks again for reporting the issue.

                                          1 of 1 people found this helpful
                                            • Re: local memory conflict?
                                              viscocoa

                                              I actually used:

                                               

                                              __local st_A foo;

                                              __local char padding[48];

                                              __local st_C bar;

                                               

                                              In my program, there are other structures following the two.

                                               

                                              Thank you and have a good night!

                                              • Re: local memory conflict?
                                                jeff_golds

                                                Ack, I just found that while foo may come out okay, bar isn't correct, at least according to printf.  I changed the code as follows:

                                                    foo.m9 = (int3)0;

                                                    bar.m1 = (float3).6f;

                                                    barrier(CLK_LOCAL_MEM_FENCE);

                                                    printf("bar.m1: (%f, %f, %f)\n", bar.m1.x, bar.m1.y, bar.m1.z);

                                                    // output: (0.600000, 0.600000, 0.600000)

                                                    out[0] = foo.m9.x;

                                                 

                                                CPU device:

                                                bar.m1: (0.600000, 0.600000, 0.600000)

                                                GPU device:

                                                bar.m1: (0.600000, 0.000000, 0.000000)

                                                 

                                                Hopefully we can get a proper fix soon.  I didn't check whether your padding would fix this case, just whether reordering the structs would.

                                                  • Re: local memory conflict?
                                                    viscocoa

                                                    Thanks Jeff. The padding worked well. I would suggest, for now, that all variables declared in local memory should be checked for possible overlaps.

                                                      • Re: local memory conflict?
                                                        MicahVillmow

                                                        viscocoa,

                                                        I've found the problem. The reason this is occurring is that the code in question was written a long time ago, before OpenCL 1.1 and vec3 existed. In OpenCL, vec3 types actually store the same amount of memory as a vec4 type and this section of code wasn't taking that into account, so the offsets into the structure were based on the wrong size. The only work-around that is guaranteed to work is to use vec4 types instead of vec3 types. This will provide the correct size calculation and will fix all the issues.

                                                         

                                                        Expect this fix to make it into catalyst in the April or May driver.