jfkong
Journeyman III

exportspeed sample program from CAL failed verification

Hi,

I am using Linux x86-64, CAL 1.2.1beta-1, and the latest driver on a single 4870.

/amdcal/samples/runtime/exportspeed -r 1 -w 256 -h 256 -e  passed.

However, exportspeed -r 1 -w 16 -h 16 -e failed.

Could someone please verify this?

I suspect there is something wrong with the global buffer, as I have run into another problem as well.

The simple kernel below just copies one element from an input buffer (the global buffer) to an output buffer. However, I fail to get correct results with the options "-w 4 -h 16 -e". Say the third element (float4) of the input is 1.0, 0.0, 1.0, 1.0. The result I get may be 0.0, 0.0, 0.0, 0.0 or other strange numbers such as 0.089. Sometimes I also get the correct result, but the result changes every time I execute the program.

The input buffer is 2D float4 and the output is 2D float4 (w/4 by h). The execution domain is (0, 0, w/4, h). However, "-w 16 -h 16 -e" always gives me the correct result.

const CALchar ILKernel[] =
"il_ps_2_0\n"
"dcl_cb cb0[1]\n"
"dcl_output_generic o0\n"
"dcl_literal l0, 0x00000002, 0x40490FDB, 0x40490FDB, 0x40490FDB\n"
"mov     r0, g[l0.x]\n"
"mov o0, r0\n"
"end\n";

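For reference, the result this kernel should produce can be checked on the CPU. Below is a minimal sketch in C, assuming the global buffer can be viewed as a flat array of float4 values and that every output element should equal input element 2 (the index held in l0.x). The `float4` struct and `count_mismatches` are hypothetical names for illustration, not part of CAL.

```c
#include <stddef.h>

/* Hypothetical host-side view of one float4 element. */
typedef struct { float x, y, z, w; } float4;

/* Index read by the kernel: l0.x = 0x00000002. */
#define SRC_INDEX 2u

/* Count output elements that differ from input[SRC_INDEX].
 * Assumes the input array holds at least SRC_INDEX + 1 elements. */
static size_t count_mismatches(const float4 *input,
                               const float4 *output,
                               size_t n_out)
{
    const float4 e = input[SRC_INDEX];
    size_t bad = 0;
    for (size_t i = 0; i < n_out; ++i) {
        const float4 o = output[i];
        if (o.x != e.x || o.y != e.y || o.z != e.z || o.w != e.w)
            ++bad;
    }
    return bad;
}
```

A nonzero return value on a run such as "-w 4 -h 16 -e" would correspond to the failed verification described above.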
 

Another example is memimport_matmul:

If you try -r 1 -w 16 -h 16 -e, the verification passes.

However, if you try -r 1 -w 16 -h 8 -e, the verification fails.


JFKong, please see the CAL release notes. They state that a global buffer has a minimum size of 64x64 and that its width must be a multiple of 64.
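For small test cases, one workaround is to round the requested width up before allocating. Here is a minimal sketch in C, assuming only the multiple-of-64 width rule stated above; `pad_width` and `width_is_valid` are hypothetical helpers, not CAL functions.

```c
/* Assumption from the release notes cited above:
 * global buffer widths must be a multiple of 64 elements. */
#define GLOBAL_BUFFER_ALIGN 64u

/* Hypothetical helper: round a requested width up to the next
 * multiple of 64 (a width of 0 is treated as one full block). */
static unsigned int pad_width(unsigned int width)
{
    if (width == 0u)
        return GLOBAL_BUFFER_ALIGN;
    return (width + GLOBAL_BUFFER_ALIGN - 1u)
           / GLOBAL_BUFFER_ALIGN * GLOBAL_BUFFER_ALIGN;
}

/* Hypothetical helper: nonzero if the width already meets the rule. */
static int width_is_valid(unsigned int width)
{
    return width != 0u && width % GLOBAL_BUFFER_ALIGN == 0u;
}
```

With this, a request such as -w 16 would be allocated 64 elements wide, and the kernel would simply leave the extra columns unused.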
jfkong
Journeyman III

Thanks a lot for the info. But you might want to remove this constraint, because for debugging and testing we sometimes like to start with just a few elements. It took me quite a while to figure out the weird behavior.

Also, I still can't find the exact statement about the 64x64 minimum size in the CAL release notes. I think this constraint should be stated precisely in the IL document, alongside the global buffer description.

In amd-cal-readme-win

"Local Memory Global Buffers
---------------------------
Local memory global buffers must be a multiple of 64 elements wide. If they are not a multiple of 64 wide, the allocation will fail, and the code
returns an error."

In amd-cal-readme-linux32

"NOTE ON LOCAL MEMORY GLOBAL BUFFERS:   
   
    Local memory global buffers with sizes < 64 produces incorrect
    results when the memory is directly mapped using calResMap. If
    the data is copied via calMemCopy to a remote buffer before
    calling calResMap on the remote buffer, the data is correct.
    This does not affect directly writing to remote memory."

In amd-cal-readme-linux64

"NOTE ON LOCAL MEMORY GLOBAL BUFFERS:  
   
    Local memory global buffers with sizes < 64 produces incorrect
    results when the memory is directly mapped using calResMap. If
    the data is copied via calMemCopy to a remote buffer before
    calling calResMap on the remote buffer, the data is correct.
    This does not affect directly writing to remote memory."


Hi Micah Villmow,

I just got SDK 1.3 and the new 8.12 driver. According to the documentation (page 3-36):

"Note: Global (Linear) buffers are always padded to a 64-element boundary; however, the memexport instruction is not constrained by this,
and the program can write into the pad area. During mapping, when
copying from local to remote storage, data written to the pad area
is not copied (it is lost).


The hardware output paths are different when a buffer is attached
as an export buffer rather than an output buffer.


Ensure that the global buffer has a width that is a multiple of 64
elements.


When entering a width that is not multiple of 64 and using the global
buffer, calResAllocLocal2D returns a warning. Users also can
query the error message for this warning"

 

OK, then I tried exportspeed again.

./exportspeed -r 1 -w 16 -h 1 -e: error message (saying the width should be a multiple of 64).

./exportspeed -r 1 -w 64 -h 1 -e: passed

So far so good. However, if I do two different runs in a row:

../inputspeed/inputspeed  (another sample program in CAL)

./exportspeed -r 1 -w 64 -h 1 -e: FAILED !!!

This is no good, and I am not sure whether it is a bug or whether 64x64 is strictly required (probably not, since 64x2 worked).

Thanks
