0 Replies Latest reply on Aug 18, 2010 5:57 AM by Fuxianjun

    OpenCL in neural networks


      I am programming a neural network algorithm with OpenCL and have run into some problems.

      Part of my algorithm is like this:
      For example, if the neural network is x-y-z, there is an input vector of length x, a middle vector of length y, and an output vector of length z. There are also two matrices whose element values are given.
      The first matrix has dimensions y*x and the second z*y. The algorithm is then: step 1, middle-vector = first-matrix * input-vector; step 2, output-vector = second-matrix * middle-vector. A neural network also has biases and activation functions, of course, but for the sake of simplicity they can be ignored here.
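
      The two steps can be sketched on the CPU as a plain reference, assuming matrices stored as lists of rows and no bias or activation. The names `matvec` and `forward` are only illustrative and do not come from OpenCL or OpenCL.NET.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def forward(first_matrix, second_matrix, input_vector):
    """Run the two steps of an x-y-z network without bias or activation."""
    middle_vector = matvec(first_matrix, input_vector)     # step 1: length y
    output_vector = matvec(second_matrix, middle_vector)   # step 2: length z
    return middle_vector, output_vector


# Hypothetical 3-2-1 network, so x=3, y=2, z=1.
first_matrix = [[1, 0, 2],
                [0, 1, 1]]      # y*x = 2*3
second_matrix = [[1, 1]]        # z*y = 1*2
input_vector = [1, 2, 3]

middle, output = forward(first_matrix, second_matrix, input_vector)
print(middle, output)
```

      A result like this could serve as a check for whatever the GPU kernel produces.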
      In OpenCL I can separate these two steps into two kernels, with global_work_size set to y and z respectively. However, I am using OpenCL.NET, and when the algorithm is split into two kernels the total run time gets longer.
      So I can only implement the algorithm in one kernel.
      My questions are:
      1. How should I specify global_work_size? If global_work_size = max(y, z), then |y-z| work-items are wasted in one of the steps. Will this work well?
      2. Before step 2 is calculated, all elements of the middle vector must have been computed, so a synchronization point is needed here. However, you told me there is no global synchronization so far in GPGPU programming, so I can only use one work-group. But CL_DEVICE_MAX_WORK_GROUP_SIZE of my GPU is 256, so the number of work-items my algorithm can use is limited. Is my analysis correct?
      3. For multiplying a matrix by a vector, I found that the GPU is only several times faster than the CPU, no matter how big the matrix is. Am I right?
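
      To make questions 1 and 2 concrete, here is a minimal CPU simulation of the single-kernel, single-work-group idea. It is a sketch under assumptions, not a real OpenCL kernel: `run_single_group` and `group_size` are hypothetical names, each simulated work-item strides over the rows so that y and z may exceed the work-group size, and the comment marks where a work-group barrier (OpenCL's `barrier(...)`) would be needed on a real device.

```python
def dot(row, vector):
    """Dot product of one matrix row with a vector."""
    return sum(a * b for a, b in zip(row, vector))


def run_single_group(first_matrix, second_matrix, input_vector, group_size=256):
    """Simulate one work-group of `group_size` work-items doing both steps."""
    y, z = len(first_matrix), len(second_matrix)
    middle = [0.0] * y
    output = [0.0] * z

    # Phase 1: work-item `lid` computes rows lid, lid + group_size, ...
    # Work-items with lid >= y simply have nothing to do in this phase.
    for lid in range(group_size):
        for row in range(lid, y, group_size):
            middle[row] = dot(first_matrix[row], input_vector)

    # On a real device a work-group barrier would sit here, so that every
    # element of `middle` is written before any work-item starts phase 2.
    # The sequential simulation gets this ordering for free.

    # Phase 2: same striding over the z output rows.
    for lid in range(group_size):
        for row in range(lid, z, group_size):
            output[row] = dot(second_matrix[row], middle)

    return middle, output


# Hypothetical 2-3-2 network run with only 2 simulated work-items.
mid_vec, out_vec = run_single_group(
    [[1, 2], [3, 4], [5, 6]],       # y*x = 3*2
    [[1, 0, 1], [0, 1, 0]],         # z*y = 2*3
    [1, 1],
    group_size=2,
)
print(mid_vec, out_vec)
```

      The simulation only shows the ordering constraint and the idle work-items in each phase; it says nothing about how fast such a kernel would run on an actual GPU.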