This function doesn't seem to be working correctly.
I have all the CAL headers included, and the compiler has no problem recognizing CALcounter, which is in the same header as calCtxCreateCounter.
It seems to be there but I can't get it to compile. Any ideas?
Found this thread:
It seems that in order to use this function you need to rewrite the perfcounters example. That seems redundant; you should be able to just call the function if you include the header.
Also, the documentation makes it seem as though you can just call this function if you include the right header (and the docs don't say which header that would be).
The docs seem bloated, yet they don't include enough information. IMO, a developer shouldn't have to comb the samples for hours on end just to find out how to call a function, which is a really basic thing.
Am I off on this one?
So is there some way to just call the function, as you would any other function without having to rewrite the perfcounters example?
So was anyone else able to use the calCtxCreateCounter function without re-writing the perfcounter example?
Call the following counter_func_init() to get pointers to those functions.
========================================
#include <stdio.h>
#include "cal_ext.h"
#include "cal_ext_counter.h"

static PFNCALCTXCREATECOUNTER calCtxCreateCounter;
static PFNCALCTXDESTROYCOUNTER calCtxDestroyCounter;
static PFNCALCTXBEGINCOUNTER calCtxBeginCounter;
static PFNCALCTXENDCOUNTER calCtxEndCounter;
static PFNCALCTXGETCOUNTER calCtxGetCounter;

int counter_func_init()
{
    // Get extension functions
    if (calExtSupported((CALextid)CAL_EXT_COUNTERS) != CAL_RESULT_OK)
    {
        fprintf(stderr, "No extension support!\n");
        return 1;
    }
    calExtGetProc((CALextproc*)&calCtxCreateCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxCreateCounter");
    calExtGetProc((CALextproc*)&calCtxDestroyCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxDestroyCounter");
    calExtGetProc((CALextproc*)&calCtxBeginCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxBeginCounter");
    calExtGetProc((CALextproc*)&calCtxEndCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxEndCounter");
    calExtGetProc((CALextproc*)&calCtxGetCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxGetCounter");
    return 0;
}
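One thing the snippet above does not do is check whether each lookup actually succeeded. Here is a minimal sketch of that idea. It does not use the CAL SDK at all: fakeGetProc, ExtProc, Result and ProcEntry are hypothetical stand-ins I made up so the example compiles on its own, and only the checking pattern is the point.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical stand-ins for the SDK types and for the lookup call.
// They are NOT the CAL API; they only mimic "fill a pointer, return a status".
typedef void (*ExtProc)();
enum Result { RESULT_OK = 0, RESULT_ERROR = 1 };

void fakeCreateCounter() {}
void fakeDestroyCounter() {}

// Stand-in lookup: knows two names and fails for anything else.
Result fakeGetProc(ExtProc* proc, const char* name)
{
    if (std::strcmp(name, "calCtxCreateCounter") == 0) {
        *proc = fakeCreateCounter;
        return RESULT_OK;
    }
    if (std::strcmp(name, "calCtxDestroyCounter") == 0) {
        *proc = fakeDestroyCounter;
        return RESULT_OK;
    }
    *proc = nullptr;
    return RESULT_ERROR;
}

// One row per entry point: where to store the pointer and which name to ask for.
struct ProcEntry {
    ExtProc* slot;
    const char* name;
};

// Runs every lookup, reports each failure, and returns how many failed.
int loadProcs(const ProcEntry* entries, int count)
{
    int failures = 0;
    for (int i = 0; i < count; ++i) {
        Result r = fakeGetProc(entries[i].slot, entries[i].name);
        if (r != RESULT_OK || *entries[i].slot == nullptr) {
            std::fprintf(stderr, "Could not load %s\n", entries[i].name);
            ++failures;
        }
    }
    return failures;
}
```

With something like this, a failed lookup shows up as a message naming the entry point instead of as a crash later on, when the pointer is finally called.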
Yes, like I said I can re-type the perfcounter example, that's not a problem. I just wanted to know if there was a better (easier) way to do it without having to rewrite code that has already been written.
I guess my answer is no!?
Also, the program compiles just fine with the added code from perfcounters (and above).
BUT it crashes now on the calCtxCreateCounter call with an Access Violation error.
Here is my call:
CALcounter cacheCounter;
if (calCtxCreateCounter(&cacheCounter, ctx, CAL_COUNTER_INPUT_CACHE_HIT_RATE) != CAL_RESULT_OK)
    fprintf(stdout, "error creating counter\n");
Any ideas?
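An access violation at the call itself would be consistent with the function pointer still being null when it is called, although nothing posted so far confirms that. A minimal sketch of a guard that turns that case into an error message is below. CreateCounterFn, safeCreateCounter and fakeCreate are hypothetical names with a simplified signature, not the real CAL types.

```cpp
#include <cstdio>

// Hypothetical, simplified signature standing in for the real entry point type.
typedef int (*CreateCounterFn)(unsigned* counter, unsigned ctx, int type);

// Calls through the pointer only if it was filled in; returns -1 otherwise.
int safeCreateCounter(CreateCounterFn fn, unsigned* counter, unsigned ctx, int type)
{
    if (fn == nullptr) {
        std::fprintf(stderr,
                     "create-counter entry point is null; "
                     "did the extension lookup run, and did it succeed?\n");
        return -1;
    }
    return fn(counter, ctx, type);
}

// Stand-in implementation, used only to exercise the wrapper.
int fakeCreate(unsigned* counter, unsigned /*ctx*/, int /*type*/)
{
    *counter = 42;
    return 0;
}
```

Calling safeCreateCounter(nullptr, ...) returns -1 and leaves the counter untouched, which is easier to diagnose than a crash.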
BTW, I'm running XP 32-bit, SDK 1.4, Catalyst 9.3.
So, I take it no one else has had this problem?
Whole code here:
static PFNCALCTXCREATECOUNTER calCtxCreateCounterExt;
static PFNCALCTXDESTROYCOUNTER calCtxDestroyCounterExt;
static PFNCALCTXBEGINCOUNTER calCtxBeginCounterExt;
static PFNCALCTXENDCOUNTER calCtxEndCounterExt;
static PFNCALCTXGETCOUNTER calCtxGetCounterExt;

int counter_func_init()
{
    // Get extension functions
    if (calExtSupported((CALextid)CAL_EXT_COUNTERS) != CAL_RESULT_OK)
    {
        fprintf(stderr, "No extension support!\n");
        return 1;
    }
    calExtGetProc((CALextproc*)&calCtxCreateCounterExt, (CALextid)CAL_EXT_COUNTERS, "calCtxCreateCounter");
    calExtGetProc((CALextproc*)&calCtxDestroyCounterExt, (CALextid)CAL_EXT_COUNTERS, "calCtxDestroyCounter");
    calExtGetProc((CALextproc*)&calCtxBeginCounterExt, (CALextid)CAL_EXT_COUNTERS, "calCtxBeginCounter");
    calExtGetProc((CALextproc*)&calCtxEndCounterExt, (CALextid)CAL_EXT_COUNTERS, "calCtxEndCounter");
    calExtGetProc((CALextproc*)&calCtxGetCounterExt, (CALextid)CAL_EXT_COUNTERS, "calCtxGetCounter");
    return 0;
}
And then I call it:
CALcounter cacheCounter;
if (calCtxCreateCounterExt(&cacheCounter, ctx, CAL_COUNTER_INPUT_CACHE_HIT_RATE) != CAL_RESULT_OK)
    exit(1);
I would seriously love to get this working.
Ryta, are the correct header files included?
Yes. I have all the same header files included as the perfcounters example, which runs fine, no problems.
I'm pretty sure it wouldn't compile otherwise. I am getting a runtime crash, an access violation error.
Originally posted by: ryta1203 Yes. I have all the same header files included as the perfcounters example, which runs fine, no problems.
"runs fine, no problems"? Do you mean you compiled it fine?
I'm pretty sure it wouldn't compile otherwise. I am getting a runtime crash, an access violation error.
Can you post your error information here? If you can provide the whole code, that would be very helpful!
1. No, I mean it runs fine. The perfcounter example runs fine: meaning it compiles and runs, no crash, no errors.
2. I have posted the error information and code above.
The only other code is the #include.. all the same header files are included. I am using namespace std.
Is there any other information that you can think of that you need?
I have been unable to solve this problem. The runtime error is still occurring, so it seems to me that AMD has a problem with these function calls. Is that accurate?
We really can't debug this without the whole application or a whole test case that shows this problem. Also what card is this being tested on?
We have a sample that uses these function calls and I know of other apps that use them, so I'm pretty sure they do work.
I have tried this on both the 4870 and the 4850. Like I said the sample WORKS, but I can't get it to go on this code, I guess I'm probably missing something silly. Here is my code:
static PFNCALCTXCREATECOUNTER calCtxCreateCounter;
static PFNCALCTXDESTROYCOUNTER calCtxDestroyCounter;
static PFNCALCTXBEGINCOUNTER calCtxBeginCounter;
static PFNCALCTXENDCOUNTER calCtxEndCounter;
static PFNCALCTXGETCOUNTER calCtxGetCounter;

int counter_func_init()
{
    if (calExtSupported((CALextid)CAL_EXT_COUNTERS) != CAL_RESULT_OK)
    {
        fprintf(stderr, "No extension support!\n");
        return 1;
    }
    calExtGetProc((CALextproc*)&calCtxCreateCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxCreateCounter");
    calExtGetProc((CALextproc*)&calCtxDestroyCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxDestroyCounter");
    calExtGetProc((CALextproc*)&calCtxBeginCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxBeginCounter");
    calExtGetProc((CALextproc*)&calCtxEndCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxEndCounter");
    calExtGetProc((CALextproc*)&calCtxGetCounter, (CALextid)CAL_EXT_COUNTERS, "calCtxGetCounter");
    return 0;
}
void calCallIL(int cnum_inputs, int cnum_outputs, double alu_fetch)
{
    string ILKernel;
    clock_t start, stop;
    double duration = 0.0f;
    int kernel_loop = 100;
    CALfloat counter_result = 0.0f;
    ILKernel = codeGenIL(cnum_inputs, cnum_outputs, alu_fetch);
    CALuint cal_size = c_size;
    unsigned int size = c_size;
    int num_inputs = cnum_inputs;
    int num_outputs = cnum_outputs;
    int i = 0;
    if (calInit() != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    CALuint version[3];
    calGetVersion(&version[0], &version[1], &version[2]);
    calclGetVersion(&version[0], &version[1], &version[2]);
    CALuint numDevices = 0;
    if (calDeviceGetCount(&numDevices) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    CALdeviceinfo info;
    if (calDeviceGetInfo(&info, 0) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    switch (info.target)
    {
    case CAL_TARGET_600:
        fprintf(stdout, "Device Type = GPU R600\n");
        break;
    case CAL_TARGET_670:
        fprintf(stdout, "Device Type = GPU RV670\n");
        break;
    case CAL_TARGET_770:
        fprintf(stdout, "Device Type = GPU RV770\n");
        break;
    default:
        fprintf(stdout, "Unknown Device\n");
    }
    CALdevice device = 0;
    if (calDeviceOpen(&device, 0) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    CALcontext ctx = 0;
    if (calCtxCreate(&ctx, device) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    CALcounter cacheCounter = 0;
    calCtxCreateCounter(&cacheCounter, ctx, CAL_COUNTER_INPUT_CACHE_HIT_RATE);
    CALresource inLocal[MAX_INPUTS], outLocal[MAX_OUTPUTS];
    for (i = 0; i < num_inputs; i++)
    {
        inLocal[i] = 0;
        if (calResAllocLocal2D(&inLocal[i], device, cal_size, cal_size, CAL_FORMAT_FLOAT_1, 0) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred allocating resource inLocal %d", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calResAllocLocal2D(&outLocal[i], device, cal_size, cal_size, CAL_FORMAT_FLOAT_1, 0) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred allocating resource outLocal %d", i);
    }
    CALfloat *inPtr[MAX_INPUTS];
    CALfloat *outPtr[MAX_OUTPUTS];
    CALuint pitch = 0;
    for (i = 0; i < num_inputs; i++)
    {
        inPtr[i] = NULL;
        if (calResMap((CALvoid**)&inPtr[i], &pitch, inLocal[i], 0) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred mapping resource inPtr %d", i);
    }
    CALfloat *tmp[MAX_INPUTS];
    for (i = 0; i < num_inputs; i++)
    {
        for (unsigned int k = 0; k < size; k++)
        {
            tmp[i] = &inPtr[i][k*pitch];
            for (unsigned int j = 0; j < size; j++)
            {
                tmp[i][j] = (((float)(j+1))*((float)(k+1)))/10000.0f;
            }
        }
    }
    //unmap the resource for input
    for (i = 0; i < num_inputs; i++)
    {
        if (calResUnmap(inLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred unmapping resource inLocal %d\n", i);
    }
    CALmem inmem[MAX_INPUTS], outmem[MAX_OUTPUTS];
    for (i = 0; i < num_inputs; i++)
    {
        inmem[i] = 0;
        if (calCtxGetMem(&inmem[i], ctx, inLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error binding resource %d to context\n", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        outmem[i] = 0;
        if (calCtxGetMem(&outmem[i], ctx, outLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error binding out resource %d to context\n", i);
    }
    CALobject obj = NULL;
    CALimage img = NULL;
    if (calclCompile(&obj, CAL_LANGUAGE_IL, ILKernel.c_str(), info.target) != CAL_RESULT_OK)
    {
        fprintf(stderr, "Error compiling, string is %s\n", calclGetErrorString());
        exit(1);
    }
    if (calclLink(&img, &obj, 1) != CAL_RESULT_OK)
        fprintf(stderr, "error linking object\n");
    CALmodule module = 0;
    if (calModuleLoad(&module, ctx, img) != CAL_RESULT_OK)
        fprintf(stdout, "error loading module\n");
    CALfunc entry = 0;
    if (calModuleGetEntry(&entry, ctx, module, "main") != CAL_RESULT_OK)
        fprintf(stdout, "error getting module entry point\n");
    CALname inName[MAX_INPUTS], outName[MAX_OUTPUTS];
    CALchar paramName[10];
    for (i = 0; i < num_inputs; i++)
    {
        sprintf_s(paramName, "i%d", i);
        inName[i] = 0;
        if (calModuleGetName(&inName[i], ctx, module, paramName) != CAL_RESULT_OK)
            fprintf(stdout, "error getting module name %s\n", paramName);
    }
    for (i = 0; i < num_outputs; i++)
    {
        sprintf_s(paramName, "o%d", i);
        outName[i] = 0;
        if (calModuleGetName(&outName[i], ctx, module, paramName) != CAL_RESULT_OK)
            fprintf(stdout, "error getting module name %s\n", paramName);
    }
    for (i = 0; i < num_inputs; i++)
    {
        if (calCtxSetMem(ctx, inName[i], inmem[i]) != CAL_RESULT_OK)
            fprintf(stdout, "error setting context memory %s\n", paramName);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calCtxSetMem(ctx, outName[i], outmem[i]) != CAL_RESULT_OK)
            fprintf(stdout, "error setting context memory %s\n", paramName);
    }
    CALdomain domain = {0, 0, size, size};
    CALevent event1 = 0;
    CALresult calCtxError;
    int timing_loop;
    double correction = 0.0f;
    double first_timing = 0.0f;
    for (timing_loop = 0; timing_loop < 2; timing_loop++)
    {
        duration = 0.0f;
        for (i = 0; i < kernel_loop; i++)
        {
            calCtxFlush(ctx);
            start = clock();
            calCtxError = calCtxRunProgram(&event1, ctx, entry, &domain);
            if (calCtxError == CAL_RESULT_BAD_HANDLE)
                fprintf(stdout, "bad handle error running program\n");
            if (calCtxError == CAL_RESULT_ERROR)
                fprintf(stdout, "symbol error running context program\n");
            while (calCtxIsEventDone(ctx, event1) == CAL_RESULT_PENDING);
            stop = clock();
            duration += (stop - start);
        }
        duration = duration / (double)CLOCKS_PER_SEC;
        printf("%.10lf\n", duration);
        fdata << kernel_loop << "\t" << setiosflags(ios::fixed) << setprecision(10) << duration << "\t\t";
        printf("%f\n", 1.0f / duration);
        kernel_loop = (int)((1.0f / duration) * (double)kernel_loop);
        printf("New Kernel Loop: %d\n", kernel_loop);
        if (timing_loop == 0)
        {
            first_timing = duration;
        }
        else
        {
            correction = first_timing * duration;
            fdata << correction;
        }
    }
    fdata << endl;
    //remap the resource for output
    for (i = 0; i < num_outputs; i++)
    {
        outPtr[i] = NULL;
        if (calResMap((CALvoid**)&outPtr[i], &pitch, outLocal[i], 0) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred mapping resource outLocal %d", i);
    }
    CALfloat *out1[MAX_OUTPUTS];
    for (i = 0; i < num_outputs; i++)
    {
        for (unsigned int k = 0; k < size; k++)
        {
            out1[i] = &outPtr[i][k*pitch];
            for (unsigned int j = 0; j < size; j++)
            {
            }
        }
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calResUnmap(outLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred unmapping outLocal %d", i);
    }
    calModuleUnload(ctx, module);
    calclFreeImage(img);
    calclFreeObject(obj);
    for (i = 0; i < num_inputs; i++)
    {
        if (calCtxReleaseMem(ctx, inmem[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred releasing resource inmem %d from context", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calCtxReleaseMem(ctx, outmem[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred releasing resource from context");
    }
    for (i = 0; i < num_inputs; i++)
    {
        if (calResFree(inLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred freeing inLocal %d", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calResFree(outLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred freeing outLocal\n");
    }
    if (calCtxDestroy(ctx) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    calDeviceClose(device);
    if (calShutdown() != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
}
    // Query the entry point in the module for the function "main"
    CALfunc entry = 0;
    if (calModuleGetEntry(&entry, ctx, module, "main") != CAL_RESULT_OK)
        fprintf(stdout, "error getting module entry point\n");
    // Query the variable names for inName 0 and outName 0
    CALname inName[MAX_INPUTS], outName[MAX_OUTPUTS];
    CALchar paramName[10];
    for (i = 0; i < num_inputs; i++)
    {
        sprintf_s(paramName, "i%d", i);
        inName[i] = 0;
        if (calModuleGetName(&inName[i], ctx, module, paramName) != CAL_RESULT_OK)
            fprintf(stdout, "error getting module name %s\n", paramName);
    }
    //if(calModuleGetName(&in2Name, ctx, module, "i0") != CAL_RESULT_OK)
    // fprintf(stdout,"error getting module name i1\n");
    for (i = 0; i < num_outputs; i++)
    {
        sprintf_s(paramName, "o%d", i);
        outName[i] = 0;
        if (calModuleGetName(&outName[i], ctx, module, paramName) != CAL_RESULT_OK)
            fprintf(stdout, "error getting module name %s\n", paramName);
    }
    // Bind resources to memory handles for this context
    for (i = 0; i < num_inputs; i++)
    {
        if (calCtxSetMem(ctx, inName[i], inmem[i]) != CAL_RESULT_OK)
            fprintf(stdout, "error setting context memory %s\n", paramName);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calCtxSetMem(ctx, outName[i], outmem[i]) != CAL_RESULT_OK)
            fprintf(stdout, "error setting context memory %s\n", paramName);
    }
    // Setup the domain for execution
    CALdomain domain = {0, 0, size, size};
    // Event ID corresponding to the kernel invocation
    CALevent event1 = 0;
    CALresult calCtxError;
    // Create Counter for CACHE HIT RATE
    // Launch the CAL kernel on the given domain
    int timing_loop;
    double correction = 0.0f;
    double first_timing = 0.0f;
    for (timing_loop = 0; timing_loop < 2; timing_loop++)
    {
        duration = 0.0f;
        for (i = 0; i < kernel_loop; i++)
        {
            calCtxFlush(ctx);
            start = clock();
            //calCtxBeginCounterExt(ctx, cacheCounter);
            calCtxError = calCtxRunProgram(&event1, ctx, entry, &domain);
            //fprintf(stdout, "%s\n", calGetErrorString());
            if (calCtxError == CAL_RESULT_BAD_HANDLE)
                fprintf(stdout, "bad handle error running program\n");
            if (calCtxError == CAL_RESULT_ERROR)
                fprintf(stdout, "symbol error running context program\n");
            // Wait on the event for kernel completion
            while (calCtxIsEventDone(ctx, event1) == CAL_RESULT_PENDING);
            //cin.get();
            //calCtxEndCounterExt(ctx, cacheCounter);
            stop = clock();
            duration += (stop - start);
            //calCtxGetCounterExt(&counter_result, ctx, cacheCounter);
            //cout<<"Counter Result: "<<counter_result<<endl;
        }
        duration = duration / (double)CLOCKS_PER_SEC;
        printf("%.10lf\n", duration);
        fdata << kernel_loop << "\t" << setiosflags(ios::fixed) << setprecision(10) << duration << "\t\t";
        printf("%f\n", 1.0f / duration);
        kernel_loop = (int)((1.0f / duration) * (double)kernel_loop);
        printf("New Kernel Loop: %d\n", kernel_loop);
        if (timing_loop == 0)
        {
            first_timing = duration;
        }
        else
        {
            correction = first_timing * duration;
            fdata << correction;
        }
    }
    fdata << endl;
    //remap the resource for output
    for (i = 0; i < num_outputs; i++)
    {
        outPtr[i] = NULL;
        if (calResMap((CALvoid**)&outPtr[i], &pitch, outLocal[i], 0) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred mapping resource outLocal %d", i);
    }
    //print the memory
    CALfloat *out1[MAX_OUTPUTS];
    for (i = 0; i < num_outputs; i++)
    {
        for (unsigned int k = 0; k < size; k++)
        {
            out1[i] = &outPtr[i][k*pitch];
            for (unsigned int j = 0; j < size; j++)
            {
                //printf("[%d][%d]: %f\n", k, j, out1
            }
        }
    }
    //unmap the resource for output
    for (i = 0; i < num_outputs; i++)
    {
        if (calResUnmap(outLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred unmapping outLocal %d", i);
    }
    //unload module
    calModuleUnload(ctx, module);
    //free the image
    calclFreeImage(img);
    //free the object
    calclFreeObject(obj);
    //release the resource from the context
    for (i = 0; i < num_inputs; i++)
    {
        if (calCtxReleaseMem(ctx, inmem[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred releasing resource inmem %d from context", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calCtxReleaseMem(ctx, outmem[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred releasing resource from context");
    }
    // deallocate local resource
    for (i = 0; i < num_inputs; i++)
    {
        if (calResFree(inLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred freeing inLocal %d", i);
    }
    for (i = 0; i < num_outputs; i++)
    {
        if (calResFree(outLocal[i]) != CAL_RESULT_OK)
            fprintf(stderr, "error occurred freeing outLocal\n");
    }
    // Destroy the context
    if (calCtxDestroy(ctx) != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    // Closing the device
    calDeviceClose(device);
    // Shutting down CAL
    if (calShutdown() != CAL_RESULT_OK)
        fprintf(stderr, "error occurred");
    //calCtxDestroyCounterExt(ctx, cacheCounter);
    //cin.get();
}
I posted my code above.
I have tried this in other projects with NO SUCCESS.
Are there many others who are able to use these counters with success? If so, how? Help would be great!
I have my code for one project posted above.
I get the same runtime error in every project. Do I need to include something from the Samples or Timer files? That wouldn't make sense to me, but then it also doesn't make sense to me that AMD requires you to replicate code just to be able to use the counters.
Sorry, I forgot to mention that my problem is that I am unable to get the extensions; this will probably help someone figure out what the problem might be.
They are supported, though, since counter_func_init() doesn't return at the calExtSupported check.
Well, I know it's not the code, since I copied and pasted it into the simple_matmult example and it worked fine, so it has to be something else. Any ideas?
Interestingly, I tried the exact same code in compute_matmult and am getting the same runtime error (it isn't getting the CAL extensions, though they are supported).
So now I am just curious what differs between the shipped samples simple_matmult and compute_matmult that causes the perf counters to work in simple but not in compute.
This will at least help me narrow down my problem. I have looked through all the project and solution settings but have not found anything different.
any ideas AMD??
OK, first I'm sorry if you are subscribed and feel I'm spamming you unnecessarily... I probably am.
BUT in case anyone else has this problem.... I have found the solution to my own problem.
When setting up the extensions, it seems you have to do it RIGHT BEFORE you call the create functions; this is the only thing that worked for me. I'm sure someone could provide a more technical explanation for this, but I'm not going to attempt one.
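For anyone who lands here later, one possible explanation (an assumption on my part, not confirmed anywhere in this thread) is that the lookups only succeed once the runtime has been initialized, so doing them too early leaves the pointers null. The sketch below illustrates that ordering with a made-up stand-in runtime. fakeInit, fakeGetProc, Session and openSession are all hypothetical names and none of this is the CAL API.

```cpp
#include <cstdio>

typedef int (*CounterFn)();

// Hypothetical stand-in runtime: lookups fail until fakeInit() has run.
static bool g_initialized = false;

int fakeInit()
{
    g_initialized = true;
    return 0;
}

int fakeCounterImpl() { return 7; }

// Returns 0 and fills *out on success; returns 1 and nulls *out otherwise.
int fakeGetProc(CounterFn* out)
{
    if (!g_initialized) {
        *out = nullptr;
        return 1;
    }
    *out = fakeCounterImpl;
    return 0;
}

struct Session {
    CounterFn createCounter = nullptr;
};

// Initialize first, then resolve the entry point, right before it is used.
bool openSession(Session& s)
{
    if (fakeInit() != 0)
        return false;
    if (fakeGetProc(&s.createCounter) != 0 || s.createCounter == nullptr) {
        std::fprintf(stderr, "entry point lookup failed\n");
        return false;
    }
    return true;
}
```

Keeping initialization and lookup together in one function like openSession makes it hard to get the order wrong, which matches the "do it right before the create call" workaround described above.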