mohit2710
Journeyman III

Basic OpenCL queries

Hi,

I am trying to understand the matrix multiplication example given in the SDK samples, and I have a couple of questions.

1. My laptop has an Intel Core 2 Duo processor, which has two cores (compute units). Is there a way to control how many cores are used? For example, what if I want to use only one of the cores and not the second one?

2. The example measures several computation and setup times, but they are not shown in the output. How can I get those times printed?

Any kind of help is appreciated.


0 Likes
21 Replies
nou
Exemplar

You can set the environment variable CPU_MAX_COMPUTE_UNITS=n, where n is the number of cores to use.

0 Likes
genaganna
Journeyman III

Originally posted by: mohit2710

1. My laptop has an Intel Core 2 Duo processor, which has two cores (compute units). Is there a way to control how many cores are used? For example, what if I want to use only one of the cores and not the second one?

Set the environment variable CPU_MAX_COMPUTE_UNITS to the number of cores you want to use.

2. The example measures several computation and setup times, but they are not shown in the output. How can I get those times printed?

There is a document in the doc folder for each sample that describes all the options available for that particular sample.

0 Likes

Hi,

I ran the matrix multiplication code on two 1024x1024 matrices on my Intel Core 2 Duo T6400 @ 2.00 GHz processor and tried two cases.

In the first case I set the number of compute units to 2, and the time came out to be 35.6 sec.

In the second case I set the number of compute units to 1, and the time came out to be 38 sec.

What do these results indicate?

Shouldn't the time be roughly double in the second case?

0 Likes

Mohit2710,

Please run with bigger matrices.

I get the following on my Phenom quad-core for 2048 x 2048:

1. CPU_MAX_COMPUTE_UNITS=1: 202.607 sec

2. CPU_MAX_COMPUTE_UNITS=2: 109.014 sec

The reported kernel time also includes ReadBuffer. To measure it exactly, time only the kernel execution (clEnqueueNDRangeKernel).

 

Please close other applications before running this.
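
A host-side sketch of timing the compute step separately from the read-back may make the point above more concrete. It makes no OpenCL calls: `job_t`, `multiply_phase`, `readback_phase` and `time_phase` are hypothetical stand-ins, and the timer is plain C11 `timespec_get`. In the real sample the compute phase would be the `clEnqueueNDRangeKernel` call followed by a blocking wait such as `clFinish`, since the enqueue call returns before the kernel has finished.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define DIM 64 /* small stand-in size so the sketch runs quickly */

/* Hypothetical job: inputs, the computed result, and the host copy of it. */
typedef struct {
    float a[DIM][DIM];
    float b[DIM][DIM];
    float c[DIM][DIM];
    float host_copy[DIM][DIM];
} job_t;

/* Wall-clock time in seconds via C11 timespec_get; -1.0 if the call fails. */
double wall_seconds(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Fill A with 1.0 and B with 2.0, so every element of C should be 2 * DIM. */
void fill_inputs(job_t *job)
{
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            job->a[i][j] = 1.0f;
            job->b[i][j] = 2.0f;
        }
    }
}

/* Stand-in for the kernel: naive C = A * B on the host. */
void multiply_phase(void *arg)
{
    job_t *job = arg;
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < DIM; k++)
                sum += job->a[i][k] * job->b[k][j];
            job->c[i][j] = sum;
        }
    }
}

/* Stand-in for the read-back: copy the result into the host buffer. */
void readback_phase(void *arg)
{
    job_t *job = arg;
    memcpy(job->host_copy, job->c, sizeof job->c);
}

/* Run one phase on its own and return its elapsed wall-clock seconds. */
double time_phase(void (*phase)(void *), void *arg)
{
    assert(phase != NULL);
    double start = wall_seconds();
    phase(arg); /* must not return until the work is really finished */
    return wall_seconds() - start;
}
```

Calling `time_phase(multiply_phase, &job)` and `time_phase(readback_phase, &job)` one after the other gives two separate numbers, so the copy no longer inflates the compute figure.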

0 Likes

Hi,

I am using Ubuntu 9.04 in VMware. My host operating system is Windows XP.

I have tried changing the CPU_MAX_COMPUTE_UNITS variable, but the result does not change.

If I type 'env' in the terminal, it doesn't show any such variable.

Anyway, I typed 'export CPU_MAX_COMPUTE_UNITS=2' (or 1) to set the variable, but the timing still does not change.

Am I doing something wrong?

Can you tell me exactly how to set this environment variable?

 

0 Likes

Setting the environment variable that way is right, but the variable should show up when you run the env command. I am not sure why it is not showing in the list.

 

Write a simple C program that reads the environment variable and prints its value.
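
A minimal sketch of such a check, using only standard C (`getenv`, `strtol`). The helper names `parse_units`, `env_compute_units` and `report_env` are made up for this example, and treating an unset or non-numeric value as 0 is an assumption of the sketch, not something the OpenCL runtime is known to do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a positive decimal integer; return 0 for NULL, empty, or invalid text. */
long parse_units(const char *text)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value <= 0)
        return 0;
    return value;
}

/* Read the named environment variable as a positive integer (0 if unusable). */
long env_compute_units(const char *name)
{
    assert(name != NULL);
    return parse_units(getenv(name));
}

/* Print the raw value as this process sees it, or say that it is not set. */
void report_env(const char *name, FILE *out)
{
    const char *text;

    assert(name != NULL && out != NULL);
    text = getenv(name);
    if (text == NULL)
        fprintf(out, "%s is not set in this process\n", name);
    else
        fprintf(out, "%s=%s\n", name, text);
}
```

Calling `report_env("CPU_MAX_COMPUTE_UNITS", stdout)` from `main`, in the same terminal session that launches the sample, shows whether the variable was actually exported to child processes.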

0 Likes

If I type 'export CPU_MAX_COMPUTE_UNITS=2' followed by 'env', then it does show in the list.

But my problem is that no matter how many compute units I select through the environment variable, the timing remains the same.

The computer I am using has an Intel Xeon CPU E5405 @ 2.00 GHz, a quad-core processor.

 

0 Likes

mohit2710,

Could you please install the latest OpenCL SDK and run the CLInfo sample?

The CLInfo sample displays information about the devices available on your system. Its output contains a "Max compute units" field. Please let me know the value of that field.

You will find the latest OpenCL SDK at http://developer.amd.com/gpu/ATIStreamSDK/Pages/default.aspx.

I am not sure what the problem is on your system.


0 Likes

When compiling CLInfo I get an error that gl.h is not found.

This file is to be provided by ATI in its SDK which is not

0 Likes

Please set ATISTREAMSDKSAMPLESROOT to your installed directory.

Please read ATI_Stream_SDK_Getting_Started_Guide_v2.0.pdf available at http://developer.amd.com/gpu/ATIStreamSDK/pages/Documentation.aspx

 

 

0 Likes

I was able to run CLInfo. It shows Max compute units equal to 1, but my processor is a quad-core processor.

0 Likes

Mohit2710,

I am not sure why OpenCL is seeing only one compute unit on your system.

Could you please post more details about your system and your VMware setup?

 

0 Likes

 

This is the complete output of CLInfo.

I am using VMware Player 3.0.

What do you suggest after seeing this output?

Number of platforms:                 1
  Plaform Profile:                 FULL_PROFILE
  Plaform Version:                 OpenCL 1.0 ATI-Stream-v2.0-beta4
  Plaform Name:                     ATI Stream
  Plaform Vendor:                 Advanced Micro Devices, Inc.


  Plaform Name:                     ATI Stream
Number of devices:                 1
  Device Type:                     CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Max compute units:                 1
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 1024
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Max clock frequency:                 1995Mhz
  Address bits:                     32
  Max memeory allocation:             536870912
  Image support:                 No
  Max size of kernel argument:             4096
  Alignment (bits) of base address:         1024
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 No
    Round to +ve and infinity:             No
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     65536
  Global memory size:                 1073741824
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 32768
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                 
    Execute OpenCL kernels:             Yes
    Execute native function:             No
  Queue properties:                 
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0
  Name:                         Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
  Vendor:                     GenuineIntel
  Driver version:                 1.0
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.0 ATI-Stream-v2.0-beta4
  Extensions:                     cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store

 

VMWare info:

cat /proc/cpuinfo
processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 23
model name    : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 10
cpu MHz        : 1995.040
cache size    : 6144 KB
fdiv_bug    : no
hlt_bug        : no
f00f_bug    : no
coma_bug    : no
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss nx constant_tsc up arch_perfmon pebs bts xtopology tsc_reliable pni ssse3 sse4_1 hypervisor
bogomips    : 3990.08
clflush size    : 64
power management:

0 Likes

Well, the problem is VMware. I do not use VMware, but in VirtualBox I can set the number of cores that a virtual machine uses, so I think VMware created a single-core virtual system.

0 Likes

Mohit2710,

It seems you have created a virtual system with one core.

0 Likes

Thanks for your help. It was indeed a VMware problem, and that has been solved now, but the time is still not proportional to the number of cores used.

The results obtained for two 2048x2048 input matrices with a block size of 32 are:

Compute units = 1: time = 267 s

Compute units = 2: time = 160 s

 

Is this difference considerable enough?

Another doubt I have is how the threads are deployed to the hardware. Is it the same concept as pthreads, or is there a difference here?

0 Likes

Mohit2710,

Various factors affect performance. Increasing the block size increases performance.

If you have only two cores, you won't see much difference.

 

In my case I have 4 cores, which is why I see almost twice the performance between one core and two cores.
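
To put numbers on "almost twice", here is a small sketch that turns the timings posted in this thread into speedup and per-core efficiency figures. The `run_t` table and the function names are invented for the example; only the timings themselves come from the posts above.

```c
#include <assert.h>
#include <stdio.h>

/* One pair of measurements: same problem, 1 compute unit vs. 2 compute units. */
typedef struct {
    const char *label;
    double one_unit_s;
    double two_units_s;
} run_t;

/* Timings as reported in the thread. */
static const run_t thread_runs[] = {
    { "Core 2 Duo T6400, 1024x1024",       38.0,    35.6    },
    { "Phenom quad-core, 2048x2048",       202.607, 109.014 },
    { "Xeon E5405, 2048x2048, blocksize 32", 267.0,   160.0   },
};

/* Speedup = time on one unit divided by time on several units. */
double speedup(double t_one, double t_many)
{
    assert(t_one > 0.0 && t_many > 0.0);
    return t_one / t_many;
}

/* Efficiency = speedup divided by the number of units used (1.0 is ideal). */
double efficiency(double t_one, double t_many, int units)
{
    assert(units > 0);
    return speedup(t_one, t_many) / (double)units;
}

/* Print one line per measurement pair. */
void print_report(FILE *out)
{
    size_t count = sizeof thread_runs / sizeof thread_runs[0];

    for (size_t i = 0; i < count; i++) {
        const run_t *r = &thread_runs[i];
        fprintf(out, "%-38s speedup %.2fx, efficiency %.0f%%\n",
                r->label,
                speedup(r->one_unit_s, r->two_units_s),
                100.0 * efficiency(r->one_unit_s, r->two_units_s, 2));
    }
}
```

With the posted numbers this works out to roughly 1.07x on the Core 2 Duo, 1.86x on the Phenom, and 1.67x on the Xeon when going from one compute unit to two.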

0 Likes

Why is that so? If you select 2 cores on a 4-core system, isn't that equivalent to working on a 2-core machine?

I have another doubt. I used the 64-bit SDK on a 4-core system. At run time I get the error "clGetPlatformIDs failed". I have not been able to figure out the reason for this error.

0 Likes

Originally posted by: mohit2710 Why is that so? If you select 2 cores on a 4-core system, isn't that equivalent to working on a 2-core machine?

In my case, the OS can use the other two cores to run any other applications. In your case, the OS has to use the same cores.

 

I have another doubt. I used the 64-bit SDK on a 4-core system. At run time I get the error "clGetPlatformIDs failed". I have not been able to figure out the reason for this error.

 

Your code needs to be changed as per the ICD model. See more information on the ICD code changes at http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=71

0 Likes

Hi,

I would like to know if there are any examples in the ATI kit that employ task parallelism.

I have studied matrix multiplication, which is essentially data parallelism.

 

0 Likes

Task parallelism means that you run a kernel with a work-group size of one, so you must run different kernels. But in this implementation, concurrent kernel execution isn't supported.

0 Likes