34 Replies Latest reply on Jul 18, 2011 11:18 PM by luluxiu

    The RANLUX pseudorandom number generator implemented in OpenCL (updated)

    dravisher
      Easy to use (copy-paste) code for anyone interested

      Edit march 2011: I've updated the code quite a bit. The generator now works correctly with SDK 2.3 and SDK 2.4 rc1, and it should be much simpler to use. Newest version can be found on bitbucket, just grab the newest zip-file. As before there's also a program that can verify that the generator is producing correct sequences, measures performance and shows how to use the generator.

       

      Hello. This is a piece of code I've developed for my own use, because I couldn't really find any good pseudorandom number generators for OpenCL that were simple to implement and understand. I'm sharing it so that others who just want a good PRNG can use it, without having to spend time porting a generator themselves. The generator is meant to be used with one copy per work-item. State data is kept in 7 float4 vectors.

      The algorithm was developed by Martin Lüscher, while the Fortran 77 implementation this code is based on was developed by Fred James. The two relevant papers are:

       Martin Lüscher, A portable high-quality random number generator for lattice field theory simulations, Computer Physics Communications 79 (1994) 100-110


      F. James, RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher, Computer Physics Communications 79 (1994) 111-114

      The code is licensed with an MIT license. If you really do find it usefull or interesting I'd love to hear about it :-). Also, if anyone thinks there is a problem with it please let me know, as I'm planning to make use of it myself for Monte Carlo simulations.

      Usage of the generator is described as comments in the code, and I think the example code should show pretty clearly how to use the generator. The example code can also check the correctness of the implementation agains values generated by the original Fortran 77 code.

      To see a list of the arguments just run prngtest.exe. To check performance and correctness of a GPU at luxury level 4 you'd run "prngtest.exe 4 1".

      About RANLUX
      RANLUX is an "old" (proposed in 1994) PRNG. It has been used extensively, and to the best of my knowledge no defects have been found at luxury setting 2 and above (which perfectly lines up with the inventors predictions).

       The generator has a few notable features:


      All calculations are in 32-bit floating point. The algorithm generates a number between 0 and 1 directly.

      It is possible to select a "luxury level", where higher luxuries mean better but slower numbers. Luxury levels 0-4 are presented here, 4 being best.

      RANLUX is also one of very few (the only one I know of actually) PRNGs that has a underlying theory explaining why it generates "random" numbers, and why they are good. Indeed, if the theory is correct (and I don't know of anyone who has disputed it), RANLUX at the highest luxury level produces completely decorrelated numbers down to the last bit, with no long-range correlations as long as we stay well below the period (10^171). Most other generators can say very little about their quality (like Mersenne Twister, KISS etc.) They must rely on passing statistical tests.

      Performance
      At luxury level 0, more than 20*10^9 numbers per second on a Cypress. At luxury level 4, more than 4*10^9 numbers per second.

       

        • The RANLUX pseudorandom number generator implemented in OpenCL
          drstrip

          While this is a useful and generous contribution, a word of caution: Using ostensibly independent RNGs for each element (work item) in a massively parallel computation is frowned on in the MP community, as it creates the opportunity for very hard to detect correlations between streams of RNGs. Using fewer generators is much preferred.

           

          If one makes some assumptions about the architecture of the target GPU and how threads are assigned and processed, one can reduce the number of RNGs. (Ideally, we should have an OpenCL call which tells us which SIMD we're on.) If the code surrounding utilizing the RNG has constant execution time across all threads and you assume round-robin scheduling (which I've been told is currently true for AMD OpenCL), then you can determine which SIMD you're on. In this case you can reduce the number of RNGs to wavefront size * number of SIMDs and be assured of no read/write inconsistencies. For problems with very large numbers of work items (my case was millions), this is a huge reduction in the size of the state space, as well as better practice in terms of using shared RNGs.

            • The RANLUX pseudorandom number generator implemented in OpenCL
              dravisher

              Hello and thanks for the comment drstrip :-)

              I do know that PRNGs in parallel still remains a bit of a problem even today. Indeed as far as I know there are no PRNGs that can actually guarantee independent sequences in parallel. For instance the usual method for Mersenne Twister, called Dynamic Creator for Mersenne Twister (DCMT) can not be proven to result in uncorrelated parallel sequences, it is only considered likely that it does.

              I haven't really been able to find any solid documents expaining why RANLUX is safe in parallel, however I believe it stems from the underlying theory (that the generator is connected to a classical chaotic system). As such RANLUX is probably the generator which has the stronges evidence for being safe in parallel applications. If I understand it correctly, there should be no correlations unless we have overlapping sequences (which is a practical impossibility considering the period of 10^171). The other usual suspects (like Mersenne Twister) has no similar theory, so it is much more of a black art in those cases.

              While the approach you mention would certainly be more elegant if performance is maintaned, it really is too bad that it can't be done in a portable manner.

            • The RANLUX pseudorandom number generator implemented in OpenCL
              joohongyee

              I have tried to compile this file using below command

              g++ -framework opencl -o prog.exe PRNGTest_1.9.cpp 

              However, prog.exe 1 1 1 1 1 showed me an error message as below.

              Can you tell me how can i compile this code without error?

              =============================================

               

              s1f1:test 1294566$ ./prog.exe 0 1 1 1 1 

              PRNGTest 1.9

              Using GPU

              RANLUX illegal initialization seed: 0. RANLUX initialized using default seed instead: 314159265

              Num platforms:       1

              Platform Vendor:     Apple

              Platform profile:    FULL_PROFILE

              Platform version:    OpenCL 1.0 (Apr  7 2010 19:04:28)

              Platform name:       Apple

              Platform extensions: 

              Device name:         GeForce 9400

              Device type:         GPU 

              Device vendor ID:    16918016

              Max compute units:   2

              Max work item dim:   3

              Max work group size: 512

              Max mem alloc size:  128 MiB

              Global memory size:  256 MiB

              Global memory cache: None

              Local memory type:   Dedicated

              Prof Timer Res:      1000 ns

              OpenCL Driver ver:   CLH 1.0

              Building OpenCL program: Done

               

              Build log: 

              :1:10: fatal error: 'ranlux_unified_1.0.0.cl' file not found

               #include "ranlux_unified_1.0.0.cl"

                        ^

              ERROR (-45): Kernel::Kernel(): Invalid program executable

              ==============================================

               

                • The RANLUX pseudorandom number generator implemented in OpenCL
                  dravisher

                  Seems it's not finding ranlux_unified_1.0.0.cl. The line:

                  BuildOptions += "-I . "; //Search for include files in current directory

                  In the .cpp file should tell the OpenCL implementation to look for it in the working directory, are you sure it's there?

                  You could also just paste the entire contents of the ranlux_unified_1.0.0.cl file into the PRNGTest_kernels_1.9.cl file, since that's all the include statement is supposed to be doing anyway.

                    • The RANLUX pseudorandom number generator implemented in OpenCL
                      joohongyee

                      I have downloaded cl.hpp file from "http://www.khronos.org/registry/cl/api/1.1/cl.hpp"

                      then stored in the same directory with 3 downloaded files.

                       

                      First I have compile with -I. option like below.

                       g++ -framework opencl -I. -o test PRNGTest_1.9.cpp

                      However, it showed the same error as above. 

                       

                      Second, as you have indicated I have included head file in PRNGTest_1.9.cpp.

                      #include "cl.hpp"

                      Actually, it was already included before getting above error sign.

                       

                      Third, I didn't include "#include "ranlux_unified_1.0.0.cl"" then copied in PRNGTest_kernels._1.9.cl then compiled like above " g++ -framework opencl -I. -o test PRNGTest_1.9.cpp"

                      Then i got an error message as below.

                      ============================================

                       

                      ipswitch:test innovationspace$ g++ -framework opencl -I. -o test PRNGTest_1.9.cpp

                      ipswitch:test innovationspace$ ./test 0 1 1 1 1 

                      PRNGTest 1.9

                      Using GPU

                      RANLUX illegal initialization seed: 0. RANLUX initialized using default seed instead: 314159265

                      Num platforms:       1

                      Platform Vendor:     Apple

                      Platform profile:    FULL_PROFILE

                      Platform version:    OpenCL 1.0 (Apr  7 2010 19:04:28)

                      Platform name:       Apple

                      Platform extensions: 

                      Device name:         GeForce 9400

                      Device type:         GPU 

                      Device vendor ID:    16918016

                      Max compute units:   2

                      Max work item dim:   3

                      Max work group size: 512

                      Max mem alloc size:  128 MiB

                      Global memory size:  256 MiB

                      Global memory cache: None

                      Local memory type:   Dedicated

                      Prof Timer Res:      1000 ns

                      OpenCL Driver ver:   CLH 1.0

                      Building OpenCL program: Done

                       

                      Build log: 

                      :124:6: error: conflicting types for 'ranlux_download_seed'

                       void ranlux_download_seed(      float4 *s01to04, 

                            ^

                      :14:2: note: previous implicit declaration is here

                               ranlux_download_seed(&s01to04, &s05to08, &s09to12, &s13to16, &s17to20, &s21to24, &in24, &stepnr, &carry, PRNGTab, gid, global_size);

                               ^

                      :153:6: error: conflicting types for 'ranlux_upload_seed'

                       void ranlux_upload_seed(float4 *s01to04, 

                            ^

                      :44:2: note: previous implicit declaration is here

                               ranlux_upload_seed(&s01to04, &s05to08, &s09to12, &s13to16, &s17to20, &s21to24, &in24, &stepnr, &carry, PRNGTab, gid, global_size);

                               ^

                      :239:8: error: conflicting types for 'ranlux'

                       float4 ranlux(float4 *s01to04, 

                              ^

                      :22:14: note: previous implicit declaration is here

                                       randomnr = ranlux(&s01to04, &s05to08, &s09to12, &s13to16, &s17to20, &s21to24, &in24, &stepnr, &carry);

                                                  ^

                      ERROR (-45): Kernel::Kernel(): Invalid program executable

                      ipswitch:test innovationspace$ 

                      ==============================================
                      I am running this program on Mac computer. That's is why i am using command like compiling method. Thanks for your quick reply for the posting. 
                        • The RANLUX pseudorandom number generator implemented in OpenCL
                          dravisher

                          I am not familiar with Mac, but it seems the problem is only with the OpenCL (.cl) files, since the C++ is compiling and running as expected. The errors are a bit confusing though. What changes did you do here? Judging by the errors it's like the contents of ranlux_unified_1.0.0.cl is being included at the end of the code instead of the beginning.

                          So just to clarify so we're on the same page, you should either leave all the codes as they are, or you can try to past the contents of ranlux_unified_1.0.0.cl into the top of the PRNGTest_kernels_1.9.cl file (so replace the first line, which is "#include "ranlux_unified_1.0.0.cl"" with the contents of the ranlux_unified_1.0.0.cl file).

                    • The RANLUX pseudorandom number generator implemented in OpenCL
                      MicahVillmow
                      joohongyee,
                      We do not support the apple platform. Please contact apple with any issues with their OpenCL implementation.
                      • The RANLUX pseudorandom number generator implemented in OpenCL
                        MicahVillmow
                        dravisher,
                        If you can provide a small test case that hits the issue, we can get it fixed for SDK 2.4.
                        • The RANLUX pseudorandom number generator implemented in OpenCL (updated)
                          spectral

                          Thanks for your open source project,

                           

                          But why using a RNG and not a QMRNG like halton sequence, van der corput ... ?

                          They have lower discrepency than any RNG ? And they are very fast and simple ! Right ?