entity279
Adept II

Brook Code Leaks memory?

OK, I have built my rather big app, an LVQ classifier. For now it uses one small Brook+ function, which calls two kernels. The problem is that the whole project crashes if I run it outside the IDE (in release OR debug mode); it only works in debug mode within the IDE (VS 2005).

The Brook code itself is tested and works perfectly in debug and release mode on its own, and of course the big project also works when I comment out the Brook call.

The crash is related to a corrupted heap (invalid heap pointer). I've been stepping through the code in the debugger to double-check all memory allocations and free()s, and I've come to this loop:

===============================================


 for (i=0;i


     for (j=0;j    {


    // get each codebook vector in turn
     tmp_vector=this->data->getVect(i,j);


     if (tmp_vector==NULL)
     {
          printf("\nEroare neasteptata!!");  // "Unexpected error!!"
          skipped++;
          continue;
     }


     dist_wrapper(image,tmp_vector,height,width,&cur_dist);
     //cur_dist=0;


     if (cur_dist < min_dist)
     {
            min_dist=cur_dist;
           *nrC=i;
           *nrV=j;
     }

}
==================================================

     Well, nothing is allocated here, and this is still in a .cpp file. The Brook+ call is of course dist_wrapper(). image and tmp_vector are both float *, each pointing to 784 elements. Both are already allocated:

- tmp_vector's address is received through the getVect() call

- image is a parameter of the enclosing function, coming from an already allocated, big float ** structure.

      The other dist_wrapper parameters are integers (height and width), plus the address of the float result.

 

     Before I get to this loop, the total memory usage of the app is around 192,320 KB. Of course, while looping, the usage shouldn't increase, right? The thing is, once every 4-6 calls of dist_wrapper (it should be more than 6 calls, actually) memory usage increases by 4 KB. The loop runs 50 times per call of the enclosing function, and the enclosing function runs 60,000 times. For example, after 23,488 runs of the enclosing function I have 690,044 KB of memory used. OK, so let's look at dist_wrapper:

======================================================

void dist_wrapper(float *img1, float *img2, const int height, const int width, float *distance)
{
  uint img_lenght = height * width, k = 0, j = 0, i = 0;
  float result;

  {
  ::brook::stream res(::brook::getStreamType(( float *)0), img_lenght,-1), sample2(::brook::getStreamType(( float *)0), img_lenght,-1), sample1(::brook::getStreamType(( float *)0), img_lenght,-1);
  ::brook::stream dist(::brook::getStreamType(( float *)0), 1,-1);

  streamRead(sample1, img1); //mem usage increases sometimes by 4Kb here
  streamRead(sample2, img2);
  construct(sample1, sample2, res);
  distance1(res, dist);
  streamWrite(dist, &result);
  }

  *distance = result;
}
===================================================
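As a side check on the figures quoted before the listing, the growth rate can be estimated from the numbers in the post alone. This is only a back-of-the-envelope sketch: it assumes every growth step is exactly 4 KB and that nothing is freed in between, and the function name is hypothetical.

```cpp
#include <cassert>

// Average number of dist_wrapper() calls per observed growth step.
// Assumption: memory only grows, in steps of step_kb, between the two readings.
// The function name is hypothetical; the inputs are the figures from the post.
double calls_per_growth_step(long runs, long calls_per_run,
                             long start_kb, long now_kb, long step_kb)
{
    assert(now_kb > start_kb && step_kb > 0);
    const double total_calls = static_cast<double>(runs) * calls_per_run;
    const double steps = static_cast<double>(now_kb - start_kb) / step_kb;
    return total_calls / steps;
}
```

Called as calls_per_growth_step(23488, 50, 192320, 690044, 4), this gives roughly one 4 KB step every 9 to 10 calls, which fits the remark that the real interval is more than 6 calls.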

 -img_lenght is, as I said, 784

 -construct does a simple computation: (input1-input2)*(input1-input2)

 -distance1 is a reduce kernel that sums all 784 floats in its input

 -the memory usage increases after the first streamRead
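For comparison, the two kernels described above amount to a squared Euclidean distance. A plain C++ version, with no Brook+ involved, could stand in for dist_wrapper() to check whether the memory growth disappears when the streams are not used. This is a minimal sketch; the name dist_reference and its signature are assumptions, not part of the original project.

```cpp
#include <cassert>
#include <cstddef>

// CPU-only reference for the construct + distance1 pair:
// the sum over k of (a[k] - b[k]) * (a[k] - b[k]).
// Name and signature are hypothetical, chosen for this illustration.
float dist_reference(const float *a, const float *b, std::size_t n)
{
    assert(a != nullptr && b != nullptr);
    double sum = 0.0;  // accumulate in double to limit rounding error
    for (std::size_t k = 0; k < n; ++k) {
        const double d = static_cast<double>(a[k]) - b[k];
        sum += d * d;
    }
    return static_cast<float>(sum);
}
```

In the loop from the first listing, cur_dist = dist_reference(image, tmp_vector, height * width); would replace the dist_wrapper() call for such a test.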

 

So:

1. I would appreciate any suggestion as to why this is happening, or a solution to it (of course).

2. I assumed the problem is Brook related, but it might be somewhere in my own code (although I am writing this post convinced that it isn't). I am also not at all familiar with the way Visual Studio handles debugging, so it might be a VS issue/"feature". I apologize if my problem turns out not to be Brook related.

3. Thank you for the time you took reading this.

 

 

 

 

0 Likes
7 Replies
Ceq
Journeyman III

How many iterations does it perform before aborting (outside and within the IDE)?
0 Likes

Within the IDE, it performs the full cycle of 60,000*50 iterations. Of course, because it's a classifier algorithm, all the data is reprocessed many times during the learning process. Since debug mode inside the IDE is hopelessly slow (it took 11,000+ seconds to complete the full cycle I was talking about earlier; note that I'm using the CPU mode, as I only have an X800 in this computer), I've only run the program for about 1.5 cycles. But yes, after that time it did crash, in a way: something made my browser crash, and after that my program, which was running in the background at the time, no longer used any processor time. I had to close the IDE because it became unresponsive. But I'm betting this was just an exception, and I believe the program would normally run to the end (doing cycle after cycle) or, more probably, until it runs out of memory.

I haven't checked the number of iterations outside the IDE, but it crashes almost instantly. I'll count them and report back here.

 

0 Likes

I'm really sorry that I accidentally misled you. It appears that the debug version does indeed also work outside the IDE (as one would expect). Of course, it still fills up memory. I'm running it now, and if it stops I'll tell you how many iterations it got through (I'll edit this post).

 

The release version crashes without completing a single iteration, probably as soon as it reaches the Brook code (as I mentioned, with the Brook call disabled it runs perfectly).

0 Likes
Ceq
Journeyman III

Well, I'm more curious about the release version. In VS2005 you can run release builds under the debugger. Could you check the exact point where it crashes?
0 Likes

Unhandled exception at 0x00000001400044e5 in lvq_gpu.exe: 0xC0000005: Access violation reading location 0xffffffffffffffff.

 

Crashes here:

::brook::StreamInterface *arg_v2 = (::brook::StreamInterface *) args[1]

 

in this code (I marked it with a (1)):

 

void __construct_cpu(::brook::Kernel *__k, const std::vector&args, int __brt_idxstart, int __brt_idxend, bool __brt_isreduce)
{
  ::brook::StreamInterface *arg_v1 = (::brook::StreamInterface *) args[0];
  (1) ::brook::StreamInterface *arg_v2 = (::brook::StreamInterface *) args[1];
  ::brook::StreamInterface *arg_output = (::brook::StreamInterface *) args[2];
 
  for(int __brt_idx=__brt_idxstart; __brt_idx<__brt_idxend; __brt_idx++) {
  Addressable <__BrtFloat1 > __out_arg_output((__BrtFloat1 *) __k->FetchElem(arg_output, __brt_idx));
  __construct_cpu_inner (
  Addressable <__BrtFloat1 >((__BrtFloat1 *) __k->FetchElem(arg_v1, __brt_idx)),
  Addressable <__BrtFloat1 >((__BrtFloat1 *) __k->FetchElem(arg_v2, __brt_idx)),
  __out_arg_output);
  *reinterpret_cast<__BrtFloat1 *>(__out_arg_output.address) = __out_arg_output.castToArg(*reinterpret_cast<__BrtFloat1 *>(__out_arg_output.address));
  }
}

 

Later on I will try making the debug version as "release as possible". Maybe one of the compiler options is what makes the whole thing crash.

0 Likes

entity279,

Can you check again with the SDK 1.3 that is coming out in a few weeks, and if it is still an issue, let us know at streamdeveloper@amd.com?

thanks.

0 Likes

I am anxious to do so. However, as much as the memory leak is an issue, it isn't as severe as I first thought: memory is consumed for no apparent reason once every few Brook calls, but it somehow also gets freed after some time. As a result, usage stays at 300-450 MB while the program is running, though I've sometimes seen 700 MB, and it doesn't go beyond that. (As I mentioned earlier, the program uses about 190 MB of memory before calling Brook.)

For the release version, however, I am definitely clueless as to why it won't run at all, and I can't do anything about it: even the smallest Brook call (except for streamRead and streamWrite) crashes the program. This includes a kernel with no inputs and a single output that only returns zeroes.

 

Update: fixed my release problem. I had previously compiled and tested all Brook kernels starting from the hello brook solution. Oddly enough, once I added all my .cpp files to the ex-hello brook solution, the compiled exe no longer crashed in release mode either.

Sadly, I don't see any difference between the two .br files or between the two project settings, so I really have no idea what was wrong in the first place.

Also note that the release version experiences the same increasing memory usage while running.

 

 

0 Likes