
dongzaixx
Journeyman III

4850 and 5670 in OpenCL

They have the same price. But I want to know which one shows better performance in OpenCL programs.

0 Likes
5 Replies
genaganna
Journeyman III

Originally posted by: dongzaixx They have the same price. But I want to know which one shows better performance in OpenCL programs.

 

Dongzaixx,

The HD4850 has 10 SIMDs, which means 800 shader processors.

The HD5670 has 5 SIMDs, which means 400 shader processors.

In most cases the HD4850 performs better than the HD5670, because it has twice as many shader processors.

However, the HD5670 was designed to support OpenCL completely.
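To make the arithmetic behind those figures explicit, here is a rough sketch. The 80 shader processors per SIMD follows from the numbers above (800 / 10 and 400 / 5). The core clocks (625 MHz and 775 MHz) and the "2 flops per shader processor per cycle" factor are assumptions that are not stated in this thread, so check them against the vendor specifications before relying on the peak figures.

```python
# Shader processors per SIMD, derived from the figures quoted in this thread:
# 800 / 10 SIMDs and 400 / 5 SIMDs both give 80.
SP_PER_SIMD = 80


def shader_processors(simds):
    """Total shader processors for a card with the given number of SIMDs."""
    return simds * SP_PER_SIMD


def peak_gflops(simds, clock_mhz):
    """Theoretical single-precision peak in GFLOPS.

    Assumes each shader processor retires one multiply-add (2 flops)
    per clock cycle. This is an upper bound, not a measured result.
    """
    return shader_processors(simds) * 2 * clock_mhz / 1000.0


# (SIMD count, core clock in MHz). The clocks are assumed values,
# not taken from this thread.
cards = {"HD4850": (10, 625), "HD5670": (5, 775)}

for name, (simds, clock_mhz) in cards.items():
    print(name, shader_processors(simds), "shader processors,",
          peak_gflops(simds, clock_mhz), "GFLOPS theoretical peak")
```

Under these assumptions the HD5670's higher clock narrows the gap somewhat, but the HD4850 still has the larger theoretical peak. Real OpenCL kernels will land well below either number.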

 

0 Likes

The HD4xxx series was not completely designed for OpenCL, which is why I am asking this question. Do you have any benchmarks?

0 Likes

Unless your OpenCL program uses local memory, the 4850 will always give you better performance.

You can test the MatrixMultiplication sample in SDK 2.01; it runs without local memory on the 7xx series and uses local memory on the 8xx series. Use the command-line options -x 256 -y 256 -z 256

0 Likes

Originally posted by: n0thing Unless your OpenCL program uses local memory, 4850 will always give you better performance. You can test the MatrixMultiplication sample in SDK 2.01, it runs without local memory on 7xx series and uses local memory on 8xx series. Use command line options = -x 256 -y 256 -z 256


Just out of curiosity, I tried this test on an HD5870 and on a CPU (dual-core Intel) to see how much I would benefit from using the GPU instead of the CPU, and I see no gain at all. Here are the commands and their output:

% MatrixMultiplication -device cpu -x 256 -y 256 -z 256 -i 16
Executing kernel for 16 iterations
-------------------------------------------
KernelTime (ms) : 0.535729
GFlops achieved : 62.6332
KernelTime (ms) : 0.532304
GFlops achieved : 63.0362
KernelTime (ms) : 0.816905
GFlops achieved : 41.0751
KernelTime (ms) : 0.831021
GFlops achieved : 40.3774
..

% MatrixMultiplication -device gpu -x 256 -y 256 -z 256 -i 16
Executing kernel for 16 iterations
-------------------------------------------
KernelTime (ms) : 0.53732
GFlops achieved : 62.4478
KernelTime (ms) : 0.529792
GFlops achieved : 63.3351
KernelTime (ms) : 1.85189
GFlops achieved : 18.119
KernelTime (ms) : 0.831176
GFlops achieved : 40.3698
..
Is this what I should expect?
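
For anyone reading these logs: the GFlops figures above are consistent with counting 2 * x * y * z floating-point operations (one multiply and one add per inner-product term) and dividing by the reported kernel time. That formula is inferred from the numbers in this post, not taken from the sample's source, so treat it as an assumption. A minimal sketch:

```python
def matmul_gflops(x, y, z, kernel_time_ms):
    """GFLOPS for a matrix multiply with dimensions x, y, z.

    Assumes 2 * x * y * z floating-point operations in total,
    which is inferred from the logged numbers, not from the sample code.
    """
    flops = 2.0 * x * y * z
    seconds = kernel_time_ms / 1000.0
    return flops / seconds / 1e9


# Kernel times copied from the output above.
for t_ms in (0.535729, 0.816905, 1.85189):
    print(t_ms, "ms ->", round(matmul_gflops(256, 256, 256, t_ms), 4), "GFlops")
```

With 256 x 256 x 256 the whole multiply is only about 33.5 million operations, so each run lasts well under two milliseconds and small timing differences swing the reported GFlops a lot.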

 

 

 

0 Likes

Gapon,

Please run with -x 2048 -y 2048 -z 2048

If you run with smaller dimensions, transfer time dominates the kernel time, which is why you see such poor performance.
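
One way to see why larger dimensions help: the arithmetic grows with n^3 while the data that has to be moved grows only with n^2. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes square n x n matrices of 4-byte floats, with the two inputs and the one output each transferred exactly once.

```python
BYTES_PER_FLOAT = 4  # assumption: single-precision elements


def flops(n):
    """Floating-point operations for an n x n by n x n multiply."""
    return 2 * n ** 3


def bytes_transferred(n):
    """Bytes moved for two input matrices and one output matrix."""
    return 3 * n * n * BYTES_PER_FLOAT


def flops_per_byte(n):
    """Arithmetic work available per byte transferred (equals n / 6 here)."""
    return flops(n) / bytes_transferred(n)


for n in (256, 2048):
    print(n, flops(n), bytes_transferred(n), round(flops_per_byte(n), 2))
```

Going from 256 to 2048 multiplies the work per transferred byte by 8, so fixed costs such as transfers and launch overhead take up a much smaller share of the total time.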

0 Likes