5 Replies Latest reply on Nov 19, 2009 6:00 PM by omkaranathan

    #include in kernel and structure alignment

    nou

      I found a great feature that the specification doesn't make clear. Section 5.4.3.1 lists the option -I. I call clBuildProgram(.., "-I.", ); and have #include "header.cl" in the kernel, and it works.
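A minimal sketch of how the options string for that call could be put together on the host. The helper name `make_include_option` is hypothetical, and the `-I dir` spelling with a space follows the form in the spec's list of build options, whereas the call above passes "-I." without a space. The OpenCL call itself appears only in a comment, so the snippet compiles without an SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "-I <dir>" into buf so it can be passed as
 * the `options` argument of clBuildProgram.
 * Returns 0 on success, -1 if the buffer is too small. */
int make_include_option(char *buf, size_t size, const char *dir)
{
    assert(buf != NULL && dir != NULL);
    int n = snprintf(buf, size, "-I %s", dir);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Intended use on the host (not compiled here, needs an OpenCL SDK):
 *
 *   char opts[256];
 *   if (make_include_option(opts, sizeof opts, ".") == 0)
 *       err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *
 * and in the kernel source:
 *
 *   #include "header.cl"
 */
```

Whether the include path is resolved relative to the working directory of the host process is implementation-dependent, so an absolute path may be the safer choice.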

      Another thing: the release notes state that every element of a structure must be aligned to float4. But I wrote a struct, and sizeof() reports that the structure is 324 bytes, which is exactly the sum of the sizes of the data types in the struct. I tested it only on the CPU. Is there some alignment restriction on the GPU?

      The second thing in the release notes is that arrays in structs are not supported, but for me it worked.

        • #include in kernel and structure alignment
          omkaranathan

           

          Originally posted by: nou

          Another thing: the release notes state that every element of a structure must be aligned to float4. But I wrote a struct, and sizeof() reports that the structure is 324 bytes, which is exactly the sum of the sizes of the data types in the struct. I tested it only on the CPU. Is there some alignment restriction on the GPU?

           

          As the release notes say, struct packing (and alignment) is the user's responsibility.



           

          • #include in kernel and structure alignment
            AndreasStahl

             

            Originally posted by: nou I found a great feature that the specification doesn't make clear. Section 5.4.3.1 lists the option -I. I call clBuildProgram(.., "-I.", ); and have #include "header.cl" in the kernel, and it works.


            That is awesome, I didn't know that, thanks!

             

             

            Originally posted by: nou Another thing: the release notes state that every element of a structure must be aligned to float4. But I wrote a struct, and sizeof() reports that the structure is 324 bytes, which is exactly the sum of the sizes of the data types in the struct. I tested it only on the CPU. Is there some alignment restriction on the GPU?


            The way I see it, when you use a type such as float4, you have to make sure it lies on an N*16-byte address.

            Let's say we have a struct with two single floats A and B, one float4 C, and another float D.

            The memory layout of one instance is then (byte offsets in decimal notation):

            00 A
            04 B
            08 C
            12 C
            16 C
            20 C
            24 D

            sizeof() will tell you 28 bytes, which is correct and expected, but OpenCL will not be able to access C at address 8, because it cannot load a float4 from an address that is not aligned to 16 bytes. This is why you need to introduce 8 bytes of padding (P) between B and C.
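The unpadded layout can be reproduced on the host as a rough sketch. It uses plain C stand-ins (`float` and `float[4]`) instead of `cl_float` and `cl_float4` so that it compiles without an OpenCL SDK; the struct name `unpadded` and the helper `is_16_byte_aligned` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the unpadded struct. float[4] plays the role of float4,
 * but on the host it only requires 4-byte alignment, so the compiler
 * inserts no padding. */
struct unpadded {
    float A;
    float B;
    float C[4];
    float D;
};

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(offsetof(struct unpadded, C) == 8, "C starts at byte 8");
static_assert(sizeof(struct unpadded) == 28, "no padding anywhere");

/* Returns nonzero when a byte offset is a multiple of 16. */
int is_16_byte_aligned(size_t offset)
{
    return offset % 16 == 0;
}
```

Here `is_16_byte_aligned(offsetof(struct unpadded, C))` is false, which is exactly the problem described above.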

             

            00 A
            04 B
            08 P
            12 P
            16 C
            20 C
            24 C
            28 C
            32 D

            But now, when two structs are in an array, the alignment is shifted again!



             

             

            00 A
            04 B
            08 P
            12 P
            16 C
            20 C
            24 C
            28 C
            32 D

            36 A
            40 B
            44 P
            48 P
            52 C
            56 C
            60 C
            64 C
            68 D





             

            OpenCL will not be able to access [1].C either, because it sits at address 52 and thus not on a 16-byte boundary. So you need to add another 12 bytes of padding at the end of the struct. (That is for this example; in reality you would just exchange D and C and remove one of the paddings.)
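The reordering mentioned in the parenthesis can be sketched like this, again with plain C stand-ins for the OpenCL types. The struct name `reordered` and the helper `reordered_c_offset` are hypothetical. In this variant 4 bytes of padding remain between D and C, while the padding at the end disappears, so an element takes 32 bytes instead of 48.

```c
#include <assert.h>
#include <stddef.h>

/* Reordered stand-in: the three single floats first, then 4 bytes of
 * padding, then the float4-like member. */
struct reordered {
    float A;
    float B;
    float D;
    char  pad[4];   /* fills bytes 12..15 */
    float C[4];     /* occupies bytes 16..31 */
};

static_assert(offsetof(struct reordered, C) == 16, "C is 16-byte aligned");
static_assert(sizeof(struct reordered) == 32, "size is a multiple of 16");

/* Byte offset of member C inside element `index` of an array. */
size_t reordered_c_offset(size_t index)
{
    return index * sizeof(struct reordered) + offsetof(struct reordered, C);
}
```

Because the element size is a multiple of 16, C stays on a 16-byte boundary in every array element (16, 48, 80, ...).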

             

            00 A
            04 B
            08 P
            12 P
            16 C
            20 C
            24 C
            28 C
            32 D
            36 P
            40 P
            44 P

            sizeof is now 48 bytes, which is a multiple of the largest required alignment, 16, so all alignment issues should now be resolved:



             

            00 A
            04 B
            08 P
            12 P
            16 C
            20 C
            24 C
            28 C
            32 D
            36 P
            40 P
            44 P

            48 A
            52 B
            56 P
            60 P
            64 C
            68 C
            72 C
            76 C
            80 D
            84 P
            88 P
            92 P

            Et voilà: [1].C lies at address 64, perfectly addressable by OpenCL!









            struct {
                cl_float A;
                cl_float B;
                cl_float4 C;
                cl_float D;
            };

            // with one padding
            struct {
                cl_float A;
                cl_float B;
                cl_char P[8];
                cl_float4 C;
                cl_float D;
            };

            // fully padded
            struct {
                cl_float A;
                cl_float B;
                cl_char Padding[8];
                cl_float4 C;
                cl_float D;
                cl_char morePadding[12];
            };
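As a quick host-side check of the one-padding and fully padded variants, here is a sketch that again swaps in plain C types for `cl_float`, `cl_char` and `cl_float4`. The struct names and the helper `c_offset_in_array` are assumptions made for this example, not part of any OpenCL header.

```c
#include <assert.h>
#include <stddef.h>

/* Padding only between B and C: C is aligned in element 0 only. */
struct one_padding {
    float A;
    float B;
    char  P[8];
    float C[4];
    float D;
};

/* Padding between B and C and at the end: size becomes 48. */
struct fully_padded {
    float A;
    float B;
    char  padding[8];
    float C[4];
    float D;
    char  more_padding[12];
};

static_assert(sizeof(struct one_padding) == 36, "36 is not a multiple of 16");
static_assert(sizeof(struct fully_padded) == 48, "48 is a multiple of 16");

/* Byte offset of a member inside element `index` of an array, given the
 * element size and the member's offset within one element. */
size_t c_offset_in_array(size_t elem_size, size_t member_offset, size_t index)
{
    return index * elem_size + member_offset;
}
```

With one padding, C of element 1 lands at 36 + 16 = 52; fully padded, it lands at 48 + 16 = 64, matching the two layouts listed above.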

              • #include in kernel and structure alignment
                nou

                Now it makes sense to me. What staggers me is that it was sizeof() in the kernel code, not in the host code, that returned 324 bytes. I will have to look into it in more detail and try reading from an unaligned struct. The spec states that the OpenCL compiler is responsible for aligning.

                  • #include in kernel and structure alignment
                    nou

                    I made some experiments with structs. In my first struct I used only float, not vector types.

                    Another thing: is CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE the size of the largest vector type in OpenCL?

                    // my first struct (offsets are counted in ints, i.e. units of 4 bytes)
                    typedef struct {
                        int a;   // this has offset 0
                        int2 b;  // this has 2
                        int4 c;  // 4
                        int8 d;  // 8
                    } test_struct;

                    typedef struct {
                        int a;
                        int2 b;
                        int2 e;  // insert this one
                        int4 c;  // moves this to offset 8
                        int8 d;  // and this even to offset 16
                    } test_struct;

                    typedef struct {
                        int a;
                        int2 b;
                        int2 e;
                        int4 c;
                        int8 d;
                        int f;   // inserting this one grows sizeof() from 96 to 128, because sizeof() must also be a multiple of the biggest vector type
                    } test_struct;
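The three measurements can be reproduced with a small layout calculator. This is only a sketch under the assumption that each member is aligned to its own size, which holds for int, int2, int4 and int8; the function names `align_up` and `struct_layout` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Computes the byte offset of every member and returns the total size of
 * a struct whose members are each aligned to their own size.
 * sizes[i] is the byte size of member i; offsets[i] receives its offset. */
size_t struct_layout(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; ++i) {
        offset = align_up(offset, sizes[i]);  /* pad up to the member's alignment */
        offsets[i] = offset;
        offset += sizes[i];
        if (sizes[i] > max_align)
            max_align = sizes[i];
    }
    /* The total size is rounded up to the largest member alignment. */
    return align_up(offset, max_align);
}
```

Feeding it the member sizes {4, 8, 16, 32}, {4, 8, 8, 16, 32} and {4, 8, 8, 16, 32, 4} gives total sizes of 64, 96 and 128 bytes, with int4 c moving from byte 16 to byte 32 and int8 d from byte 32 to byte 64 in the second struct.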

                      • #include in kernel and structure alignment
                        omkaranathan

                        As per OpenCL spec 6.10.1

                        The alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question and must also be a power of two. That explains the growth in size from 96 to 128: int8 needs 32-byte alignment, so adding the 4-byte f takes the struct from 96 to 100 bytes, which is then rounded up to the next multiple of 32.