lpw
Journeyman III

Questions about Radeon HD3870 X2

I have several questions regarding running CAL on a Radeon HD3870 X2 with 1GB RAM.  I'm using CAL SDK 1.0.2beta under XP 32.

The first thing I did was write a simple program that queries the available hardware.  Turns out that each X2 core is detected as a separate CAL device.  As far as I understand it, the two cores are automatically linked via Crossfire, but I am unclear on how Crossfire affects kernel execution in this configuration.

(1) Do I need to call calDeviceOpen() on both devices to take advantage of both cores or will interacting with device 0 be enough?

(2) On a related note, is it possible to run different kernels on each core?  (I'm guessing "no" on this one due to Crossfire)


I queried the devices using calDeviceGetAttribs(), calDeviceGetStatus(), and calDeviceGetInfo().  The following is the output of my program (source code available upon request):

Cal version 1.0.0
2 devices detected
Device 0:
  Target:              670 (Radeon HD 3870)
  Local RAM:           192 MB (175 MB available)
  Uncached Remote RAM: 256 MB (242 MB available)
  Cached Remote RAM:    28 MB  (27 MB available)
  Engine Clock:        169 MHz
  Memory Clock:        245 MHz
Device 1:
  Target:              670 (Radeon HD 3870)
  Local RAM:           192 MB (175 MB available)
  Uncached Remote RAM: 256 MB (242 MB available)
  Cached Remote RAM:    28 MB  (27 MB available)
  Engine Clock:        169 MHz
  Memory Clock:        245 MHz

I would appreciate some clarification on these values.

(3) The CALdeviceattribs and CALdevicestatus structures specify three types of RAM: Local, Uncached Remote, and Cached Remote.  Could someone elaborate on this distinction?  According to the CAL Programming Manual, the term "Remote Memory" refers to system memory, i.e., not the memory on the graphics board.  Surely there is a different meaning here?

(4) There are 1024 MB of RAM on my board.  72 MB seem to be missing in action (36 MB per core).  Where did they go?  Note: I'm referring to the numbers obtained from CALdeviceattribs, not CALdevicestatus.  1024 - 2 * (192 + 256 + 28) = 72.

(5)  Finally, the clock speeds seem a little low.  I'm guessing that this is due to the ATI PowerPlay technology, but again I would appreciate someone clearing this up.  Will the clock speeds increase automatically under load?  Is there a way to manually control the clock speeds?  Catalyst Control Center lists the clock speeds as 825 MHz (engine) and 900 MHz (memory), which sounds right.

Ah, that is all, thank you for reading.

Confused but cautiously optimistic,
Lukasz

2 Replies
cuorematto
Journeyman III

Hi lpw.
I am not an AMD guy, I am a PC user like you, but for question number (1)
I think you need to open each device you want to use: you need to call calDeviceOpen() on both devices to take advantage of both DPPs (Data Parallel Processors).
CALdevice device0 = 0;   /* handle for device number 0 */
CALdevice device1 = 0;   /* handle for device number 1 */
/* Open device number 0 */
if (calDeviceOpen(&device0, 0) != CAL_RESULT_OK) ERROR_OCCURRED();
/* Open device number 1 */
if (calDeviceOpen(&device1, 1) != CAL_RESULT_OK) ERROR_OCCURRED();
/* Both handles are now open; launch the GPU kernel jobs on each one */


For question number (2), I think yes: you can run a multiply on DPP number 0
and an add on DPP number 1, or use them together for the same calculation.

For question (3): a CAL application initializes its input data in system memory. In many cases the CPU has to process the data before it is sent to the GPU for further processing, which means the CPU
reads from and writes to system memory. In these cases it might be more efficient to ask CAL to allocate this remote memory (that is, system memory) from cached system memory, so the CPU can process the data faster.

For question number (5):
I am sure the clock speeds will increase under load.
If you want the GPU clock speed a little higher, you can flash the BIOS with a tool (free to download) and set the clock you want in the GPU BIOS, so you don't need to re-enter the clock speed each time you boot the PC.

Hi Lukasz,

cuorematto got it right on #1 and #2. You need to interact with each GPU individually. Also, since you are talking to each GPU individually, then you can run a separate kernel on each.

Local memory is memory on the GPU. Uncached remote is the amount of memory you can create on the CPU side that is uncached (CPU access to it is uncached, but GPU->CPU transfers will be faster). Cached remote is the amount of memory you can create on the CPU side that is cached.

As far as the memory size reported, can you try v1.1-beta and see if it reports things better?

And, for #5, the clock speeds are sampled at the beginning when the GPU is unloaded. The clock speeds are automatically set higher as needed.

Michael.