7 Replies Latest reply on Mar 26, 2015 6:30 AM by dns.on.gpu

    sys freeze for large(?) array

    dns.on.gpu

      Hello.

       

      I am running various FFT tests on a 3 GB HD 7950. The speeds I get seem impressive, provided
      gettimeofday() can be relied upon to measure elapsed times.

       

      When I try to run 100000 x 1536 double-precision in-place r2c transforms, my PC seizes up.
      The half-size case (50000 x 1536) is fine. I am running this on an 8 GB PC with Ubuntu 14.04,
      using the GUI with 4 windows open. Before I post any code, is there anything I should be
      checking first?

       

      thanks.

      --
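As a rough check on the sizes involved, the sketch below computes the device buffer size for the two cases. It assumes 8-byte doubles and the N + 2 padding per transform that the program posted later in this thread uses for in-place r2c data. The helper names are hypothetical. The full case comes to about 1.15 GiB in a single buffer, which may be worth comparing with the value the device reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE via clGetDeviceInfo; whether that limit is what triggers the freeze is not established here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for `batch` in-place real-to-complex transforms of length n.
 * Each transform is padded to n + 2 reals so that the n/2 + 1 complex
 * outputs fit in the same storage. Hypothetical helper, not part of clFFT. */
uint64_t r2c_inplace_bytes(uint64_t batch, uint64_t n, uint64_t elem_size)
{
    assert(n % 2 == 0);  /* the n + 2 padding assumes an even length */
    return batch * (n + 2) * elem_size;
}

/* Convert a byte count to GiB (2^30 bytes) for comparison with device limits. */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With these helpers, r2c_inplace_bytes(100000, 1536, 8) is 1230400000 bytes and the half case is 615200000 bytes. The posted program also allocates two host arrays of this size (X and SAVE), so roughly 2.3 GiB of the 8 GB of host memory is in use in the full case.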

        • Re: sys freeze for large(?) array
          ravkum

          Are you using clFFT library or is it your own fft kernel?

            • Re: sys freeze for large(?) array
              dns.on.gpu

              I use the standard clFFT library from GitHub. I am not certain it fails in the clFFT
              calls; I cannot determine where it fails. As I said, smaller problems are fine.

               

              By the way, I have now done some tests incorporating clFFT into my own code, and the
              execution speed of the FFTs themselves in double precision seems fantastic.
              However, it takes 30 times as long to send the data and get it back over PCIe 2.

              --

                • Re: sys freeze for large(?) array
                  ravkum

                  You could try putting a clFinish call after every non-blocking OpenCL (or clFFT) call and then use print statements to narrow down where the program crashes.

                   

                  Have you used CodeXL to measure the clFFT kernel time? The memory transfer does take a long time. Are you using clEnqueueMapBuffer/clEnqueueUnmapMemObject calls or clEnqueueWriteBuffer calls? Also, is the data transfer time you quote from CodeXL or from gettimeofday in the host code?
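One detail that may matter when narrowing things down with print statements: stdio output is buffered, so a label printed without a trailing newline can still be sitting in the buffer when the machine freezes, and the last label visible on screen may be one step behind the call that actually hangs. A minimal sketch of a flushing progress marker follows; the name `checkpoint` is hypothetical and not part of OpenCL or clFFT.

```c
#include <assert.h>
#include <stdio.h>

/* Print a progress label and flush it immediately, so the message is not
 * left in the stdio buffer if the system hangs in the next call.
 * Returns 0 on success, -1 if writing or flushing failed. */
int checkpoint(FILE *out, const char *label)
{
    assert(out != NULL && label != NULL);
    if (fprintf(out, " -- %s\n", label) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A call such as checkpoint(stderr, "clfftBakePlan") placed before each OpenCL or clFFT call, combined with clFinish after it, should make the last visible label correspond to the call in progress.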

                    • Re: sys freeze for large(?) array
                      dns.on.gpu

                      I have now bought and installed an XFX 280X. I reformatted the disk and
                      reinstalled the OS and driver. The freeze problem persists.

                      The simple program is a modified version of the clFFT usage example on GitHub.
                      It is given below.

                       

                      The last statement to print before everything freezes is the printf preceding clfftSetPlanDistance.

                       

                      While the system is frozen, its power consumption is ~100W above idle.

                       

                      I use gettimeofday on the host to measure the elapsed time of the various calls.

                      I also use clEnqueueWriteBuffer / clEnqueueReadBuffer for transferring data,
                      as is evident in the code.

                       

                      Speeding up the host-GPU data transfers of these interesting GPUs would make them
                      more usable for scientific computations.

                      For the new 280X, the ratio of the elapsed time of the two-way data transfer to
                      that of the FFT kernel call is 38.

                       

                      //============================================================

                       

                      #include <stdlib.h>

                      #include <stdio.h>

                      #include <time.h>

                      #include <sys/time.h>

                      #include <math.h>

                       

                      //  ======================   R2C Version DOUBLE PRECISION =========================

                       

                      #include <clFFT.h>

                       

                      int main( void )

                      {

                          int i, j;

                       

                          struct timeval start, end, t1, t2, t3, t4, t5;

                          double tm1, tm2, tm3, tm4, tm5;

                       

                          cl_int err;

                          cl_platform_id platform = 0;

                          cl_device_id device = 0;

                          cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, 0, 0 };

                          cl_context ctx = 0;

                          cl_command_queue queue = 0;

                          cl_mem bufX;

                          double *X, *SAVE;

                          cl_event event = NULL;

                          int ret = 0;

                          size_t N = 1536;

                          size_t NBATCH = 100000;

                       

                          double pi = 4.0*atan2(1.0,1.0);

                       

                          printf("\n -- Start \n");

                          printf("    Transform Size = %zu \n", N );

                          printf("    Transform Numb = %zu \n", NBATCH );

                       

                          clfftPlanHandle planHandle;

                          clfftDim dim = CLFFT_1D;

                          size_t clLengths[1] = {N};

                       

                          err = clGetPlatformIDs( 1, &platform, NULL );

                          err = clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL );

                       

                          props[1] = (cl_context_properties)platform;

                          ctx = clCreateContext( props, 1, &device, NULL, NULL, &err );

                          queue = clCreateCommandQueue( ctx, device, 0, &err );

                       

                          clfftSetupData fftSetup;

                          err = clfftInitSetupData(&fftSetup);

                          err = clfftSetup(&fftSetup);

                       

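                          X    = (double *)malloc( NBATCH*(N+2) * sizeof(*X));
                          SAVE = (double *)malloc( NBATCH*(N+2) * sizeof(*SAVE));
                          /* each array is about 1.2 GB for the full case; stop if either allocation failed */
                          if( X == NULL || SAVE == NULL ){ printf("\n *** ERROR malloc \n");exit(EXIT_FAILURE); }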

                       

                          for(j=0; j<NBATCH; j++ ){

                          for(i=0; i<N+2; i++ ){

                                 X[j*(N+2) + i] = SAVE[j*(N+2) + i] = sin(2.0*pi*(double)(i)/(double)(N)) ;

                          }

                          }

                       

                       

                       

                          printf("\n -- clCreateBuffer ");

                          bufX = clCreateBuffer( ctx, CL_MEM_READ_WRITE, NBATCH*(N+2)*sizeof(*X), NULL, &err );
                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clCreateBuffer (%d) \n", err);exit(EXIT_FAILURE); }

                       

                       

                          printf("\n -- clfftCreateDefaultPlan ");

                          err = clfftCreateDefaultPlan(&planHandle, ctx, dim, clLengths);

                       

                       

                          printf("\n -- clfftSetPlanPrecision ");

                          err = clfftSetPlanPrecision(planHandle, CLFFT_DOUBLE);

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clfftSetPlanPrecision \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                       

                          printf("\n -- clfftSetLayout ");

                          err = clfftSetLayout(planHandle, CLFFT_REAL, CLFFT_HERMITIAN_INTERLEAVED);

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clfftSetLayout \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                       

                          printf("\n -- clfftSetResultLocation ");

                          err = clfftSetResultLocation(planHandle, CLFFT_INPLACE);

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clfftSetResultLocation \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                       

                          printf("\n -- clfftSetPlanBatchSize ");

                          err = clfftSetPlanBatchSize(planHandle, NBATCH );

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR Setting Batch \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                      //==============================================================================

                      //==============================================================================

                       

                         printf("\n -- clfftSetPlanDistance ");  // ************   LAST STATEMENT TO PRINT

                          err = clfftSetPlanDistance(planHandle, N+2, (N+2)/2  );

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR Setting Batch Distance\n");exit(EXIT_FAILURE); }

                           err = clFinish(queue);

                       

                      //==============================================================================

                      //==============================================================================

                       

                       

                          printf("\n -- clfftBakePlan ");

                          err = clfftBakePlan(planHandle, 1, &queue, NULL, NULL);

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clfftBakePlan \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                       

                       

                         gettimeofday(&start, NULL);

                       

                          printf("\n -- clEnqueueWriteBuffer ");

                         err = clEnqueueWriteBuffer( queue, bufX, CL_TRUE, 0, NBATCH*(N+2)*sizeof( *X ), X, 0, NULL, NULL );

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clEnqueueWriteBuffer \n");exit(EXIT_FAILURE); }

                          err = clFinish(queue);

                       

                         gettimeofday(&t1, NULL);

                       

                         printf("\n -- clfftEnqueueTransform ");

                         err = clfftEnqueueTransform(planHandle, CLFFT_FORWARD, 1, &queue, 0, NULL, NULL, &bufX, NULL, NULL);

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clfftEnqueueTransform \n");exit(EXIT_FAILURE); }

                         err = clFinish(queue);

                       

                         gettimeofday(&t2, NULL);

                       

                       

                         printf("\n -- clEnqueueReadBuffer ");

                         err = clEnqueueReadBuffer( queue, bufX, CL_TRUE, 0, NBATCH*(N+2)*sizeof( *X ), X, 0, NULL, NULL );

                          if( err != CL_SUCCESS ){ printf("\n *** ERROR clEnqueueReadBuffer \n");exit(EXIT_FAILURE); }

                         err = clFinish(queue);

                       

                       

                         gettimeofday(&end, NULL);

                       

                         clReleaseMemObject( bufX );

                       

                      /*

                          for(j=0; j<NBATCH; j++ ){

                          printf("\n ");

                          for(i=0; i<N+2; i++ ){

                               printf(" i, OUT =  %d  %e  %e \n", i, SAVE[j*(N+2) + i], X[j*(N+2) + i]);

                          }

                          }

                      //*/

                       

                       

                          free(X);
                          free(SAVE);

                          err = clfftDestroyPlan( &planHandle );

                          clfftTeardown( );

                          clReleaseCommandQueue( queue );

                          clReleaseContext( ctx );

                       

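                           /* tm1: whole-second part of the total; tm2 (total) and tm3 (FFT only)
                              combine tv_sec and tv_usec, since tv_usec alone wraps every second */
                           tm1 = (double)( end.tv_sec  - start.tv_sec );
                           tm2 = (double)( end.tv_sec - start.tv_sec ) + (double)( end.tv_usec - start.tv_usec ) / 1000000.0;
                           tm3 = (double)( t2.tv_sec - t1.tv_sec ) + (double)( t2.tv_usec - t1.tv_usec ) / 1000000.0;
                           /* tm4: ratio of two-way transfer time to FFT time */
                           tm4 = (tm2 - tm3 ) /tm3;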

                       

                           printf("\n -- Done.  Times = %f  %f  %f  %f  \n\n",  tm1, tm2, tm3, tm4 );

                       

                          return ret;

                      }
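The timing arithmetic at the end of the program can be sketched on its own. The question raised at the start of the thread was whether gettimeofday() can be relied upon; one thing to watch is that an interval has to combine both tv_sec and tv_usec, because the microsecond field wraps every second. The function names below are hypothetical, and this is only a sketch of the wall-clock approach the thread uses, not a substitute for profiler timings.

```c
#include <assert.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two gettimeofday() samples.
 * Both fields are combined; the tv_usec difference may be negative. */
double elapsed_seconds(struct timeval from, struct timeval to)
{
    return (double)(to.tv_sec - from.tv_sec)
         + (double)(to.tv_usec - from.tv_usec) / 1.0e6;
}

/* Ratio of time spent outside the kernel (here, transfers) to kernel time. */
double transfer_to_kernel_ratio(double total_s, double kernel_s)
{
    assert(kernel_s > 0.0);
    return (total_s - kernel_s) / kernel_s;
}
```

In the program above this would be used as elapsed_seconds(start, end) for the total and elapsed_seconds(t1, t2) for the transform. Because gettimeofday() reports wall-clock time, which can be adjusted while the program runs, clock_gettime() with CLOCK_MONOTONIC may be a safer choice for interval measurements.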