fiery
Journeyman III

Stream (CAL) lockup after enumerating OpenCL devices

In our software (Lavalys EVEREST) we display device properties for both Stream (CAL) devices and OpenCL devices.  If we query them separately, both work just fine.  If we enumerate CAL devices and then enumerate OpenCL devices, both enumerations work fine.  But if we enumerate CAL devices, then enumerate OpenCL devices, and then open a CAL device in order to call calDeviceGetStatus, calDeviceOpen locks up the calling thread.

Is there a known conflict between opening the same device using CAL and OpenCL?  After opening the device with either interface, we get its status/properties, and then close the device and do a cleanup properly.  We dynamically load both DLLs, when they're needed.

Are we doing something wrong?  Or is there a known limitation of these interfaces that we should take into account?

Thanks,
Fiery

0 Likes
7 Replies
fiery
Journeyman III

Some more findings...

When we dynamically load the CAL DLL, call calInit, do some work, then finally call calShutdown and unload the CAL DLL, it all works well.  When we repeat that process a couple of times (in a loop), it works as expected every time.

However, if we do the following:

1) Load CAL DLL
2) calInit
3) *some work*
4) calShutdown
5) Unload CAL DLL
6) Load OpenCL DLL
7) clGetPlatformIDs
8) Unload OpenCL DLL
9) Load CAL DLL
10) calInit returns CAL_RESULT_ALREADY (instead of the usual CAL_RESULT_OK)

Maybe the OpenCL DLL doesn't free all of its resources and doesn't release the GPU device (which we don't even open, query, or enumerate via OpenCL), so that after unloading OpenCL, CAL can no longer use the GPU device?
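
The ten steps above can be condensed into a small stand-alone console program, which may make the behavior easier to reproduce outside the full application. This is only a sketch under stated assumptions: the type names, `CalCycle`, and `OpenCLCycle` are hypothetical helpers, the exports are assumed to use the stdcall convention on 32-bit Windows, only `aticalrt.dll` is tried (no fallback to `amdcalrt.dll`), and the OpenCL library is assumed to be named `OpenCL.dll`. The program prints the raw calInit return codes so they can be compared against the CAL_RESULT_* constants in cal.h.

```pascal
program CalOpenCLRepro;
{$APPTYPE CONSOLE}

uses
  Windows;

type
  // Assumed signatures (stdcall on 32-bit Windows); names are illustrative.
  TcalInit          = function: Integer; stdcall;
  TcalShutdown      = function: Integer; stdcall;
  TclGetPlatformIDs = function(NumEntries: Cardinal; Platforms: Pointer;
                               var NumPlatforms: Cardinal): Integer; stdcall;

// Steps 1-5 (and 9-10): load CAL, calInit, calShutdown, unload.
// Returns the calInit result, or -1 (our own marker) if the DLL or an
// export could not be found.
function CalCycle: Integer;
var
  h: HMODULE;
  Init: TcalInit;
  Shutdown: TcalShutdown;
begin
  Result := -1;
  h := LoadLibrary('aticalrt.dll');
  if h = 0 then Exit;
  try
    @Init     := GetProcAddress(h, 'calInit');
    @Shutdown := GetProcAddress(h, 'calShutdown');
    if Assigned(Init) and Assigned(Shutdown) then
    begin
      Result := Init();
      Shutdown();
    end;
  finally
    FreeLibrary(h);
  end;
end;

// Steps 6-8: load OpenCL, ask only for the platform count, unload.
procedure OpenCLCycle;
var
  h: HMODULE;
  GetPlatformIDs: TclGetPlatformIDs;
  Count: Cardinal;
begin
  h := LoadLibrary('OpenCL.dll');
  if h = 0 then Exit;
  try
    @GetPlatformIDs := GetProcAddress(h, 'clGetPlatformIDs');
    if Assigned(GetPlatformIDs) then
    begin
      Count := 0;
      GetPlatformIDs(0, nil, Count);  // count query only, no IDs returned
    end;
  finally
    FreeLibrary(h);
  end;
end;

begin
  WriteLn('calInit before OpenCL: ', CalCycle);
  OpenCLCycle;
  WriteLn('calInit after OpenCL:  ', CalCycle);
end.
```

If the two printed codes differ, the change is attributable to the OpenCL load/unload in between, since nothing else happens in the process.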

0 Likes

Originally posted by: fiery Some more findings...

 


I believe you can still use the device even if calInit returns CAL_RESULT_ALREADY.

What do you mean by unloading OpenCL.dll? I would expect that releasing the context releases all devices associated with that context.

Could you please paste the code that shows this issue here?

0 Likes

If we enumerate CAL devices, then enumerate OpenCL devices, and then open a CAL device to acquire calDeviceGetStatus, calDeviceOpen locks up the calling thread.


Looks like a bug to me.

What do you mean by unloading OpenCL.dll? I feel releasing context should release all devices associated with that context.


According to the workflow, he didn't even create an OpenCL context. By "unloading" he meant that OpenCL.dll is no longer loaded in the process.

0 Likes

Okay, here's the code.  It's a bit long, so I'll split it across multiple posts.

Function StreamDevices(Var tsl:TStringList):Boolean;

Var
  calrtdll          : THandle;
  d,d_count         : DWord;
  calrtdll_filename : String;
  tcdi              : TCALdeviceinfo;
  calinit           : TcalInit;
  calshutdown       : TcalShutdown;
  caldevicegetcount : TcalDeviceGetCount;
  caldevicegetinfo  : TcalDeviceGetInfo;

Begin
  Result:=False;

  calrtdll_filename:='aticalrt.dll';
  calrtdll:=LoadLibrary(PChar(calrtdll_filename));
  If calrtdll=0 Then
  Begin
    calrtdll_filename:='amdcalrt.dll';
    calrtdll:=LoadLibrary(PChar(calrtdll_filename));
    If calrtdll=0 Then Exit;
  End;

  @calinit          :=GetProcAddress(calrtdll,'calInit');
  @calshutdown      :=GetProcAddress(calrtdll,'calShutdown');
  @caldevicegetcount:=GetProcAddress(calrtdll,'calDeviceGetCount');
  @caldevicegetinfo :=GetProcAddress(calrtdll,'calDeviceGetInfo');

  If (@calinit=Nil) Or
     (@calshutdown=Nil) Or
     (@caldevicegetcount=Nil) Or
     (@caldevicegetinfo=Nil) Then
  Begin
    FreeLibrary(calrtdll);
    Exit;
  End;

  Case calInit Of
    CAL_RESULT_OK,
    CAL_RESULT_ALREADY : Begin
      If calDeviceGetCount(d_count)=CAL_RESULT_OK Then
      If d_count>0 Then
      Begin
        Result:=True;

        For d:=0 To d_count-1 Do
        Begin
          FillChar(tcdi,SizeOf(tcdi),#0);

          If calDeviceGetInfo(tcdi,d)=CAL_RESULT_OK Then
            tsl.Add(StreamTargetString(tcdi.target,d));
        End;
      End;

      calShutdown;
    End;
  End;

  FreeLibrary(calrtdll);
End;


Function OpenCLDevices(Var tsl:TStringList):Boolean;

Var
  opencldll                          : THandle;
  d1,d2,d_devcount,d_platcount,d_ret : DWord;
  bu               : Array [0..2047] Of Char;
  bu_plat          : Array [0..255] Of DWord;
  bu_dev           : Array [0..255] Of DWord;
  clgetplatformids : TclGetPlatformIDs;
  clgetdeviceids   : TclGetDeviceIDs;
  clgetdeviceinfo  : TclGetDeviceInfo;

Begin
  Result:=False;

  opencldll:=LoadLibrary(opencldll_filename);
  If opencldll=0 Then Exit;

  @clgetplatformids:=GetProcAddress(opencldll,'clGetPlatformIDs');
  @clgetdeviceids  :=GetProcAddress(opencldll,'clGetDeviceIDs');
  @clgetdeviceinfo :=GetProcAddress(opencldll,'clGetDeviceInfo');

  If (@clgetplatformids=Nil) Or
     (@clgetdeviceids=Nil) Or
     (@clgetdeviceinfo=Nil) Then
  Begin
    FreeLibrary(opencldll);
    Exit;
  End;

  FillChar(bu_plat,SizeOf(bu_plat),#0);

  If clGetPlatformIDs(High(bu_plat),@bu_plat,d_platcount)=CL_SUCCESS Then
  For d1:=0 To d_platcount-1 Do
  Begin
    FillChar(bu_dev,SizeOf(bu_dev),#0);

    If clGetDeviceIDs(bu_plat[d1],CL_DEVICE_TYPE_ALL,High(bu_dev),@bu_dev,d_devcount)=CL_SUCCESS Then
    For d2:=0 To d_devcount-1 Do
    Try
      FillChar(bu,SizeOf(bu),#0);

      If clGetDeviceInfo(bu_dev[d2],CL_DEVICE_NAME,SizeOf(bu),@bu,d_ret)=CL_SUCCESS Then
        tsl.Add(EncodeXMLTagDWord('PLAT',bu_plat[d1])+
                EncodeXMLTagDWord('DEV' ,bu_dev [d2])+
                EncodeXMLTagStr  ('DESC',StrPas(@bu)));
    Except
    End;
  End;

  Result:=tsl.Count>0;

  Try
    FreeLibrary(opencldll);
  Except
  End;
End;

0 Likes

Here's the 3rd function.  We have a 4th (OpenCLDeviceInfo), but it's not relevant to this issue. Even when we disable OpenCLDeviceInfo and call only OpenCLDevices to detect basic info about OpenCL devices, the thread lockup in StreamDeviceInfo still occurs.

Function StreamDeviceInfo(d_dev:DWord;Var sdir:StreamDeviceInfoRec):Boolean;

Var
  calrtdll                        : THandle;
  d_count,d_imp,d_major,d_minor   : DWord;
  i_verc                          : Integer;
  s_dll,s_ver,calrtdll_filename   : String;
  tcd                 : TCALdevice;
  tcda                : TCALdeviceattribs;
  tcdi                : TCALdeviceinfo;
  tcds                : TCALdevicestatus;
  calinit             : TcalInit;
  calshutdown         : TcalShutdown;
  caldeviceopen       : TcalDeviceOpen;
  caldeviceclose      : TcalDeviceClose;
  caldevicegetcount   : TcalDeviceGetCount;
  caldevicegetinfo    : TcalDeviceGetInfo;
  caldevicegetattribs : TcalDeviceGetAttribs;
  caldevicegetstatus  : TcalDeviceGetStatus;
  calgetversion       : TcalGetVersion;
  calextsupported     : TcalExtSupported;
  calextgetversion    : TcalExtGetVersion;

Begin
  Result:=False;
  FillChar(sdir,SizeOf(sdir),#0);

  calrtdll_filename:='aticalrt.dll';
  calrtdll:=LoadLibrary(PChar(calrtdll_filename));
  If calrtdll=0 Then
  Begin
    calrtdll_filename:='amdcalrt.dll';
    calrtdll:=LoadLibrary(PChar(calrtdll_filename));
    If calrtdll=0 Then Exit;
  End;

  @calinit            :=GetProcAddress(calrtdll,'calInit');
  @calshutdown        :=GetProcAddress(calrtdll,'calShutdown');
  @caldeviceopen      :=GetProcAddress(calrtdll,'calDeviceOpen');
  @caldeviceclose     :=GetProcAddress(calrtdll,'calDeviceClose');
  @caldevicegetcount  :=GetProcAddress(calrtdll,'calDeviceGetCount');
  @caldevicegetinfo   :=GetProcAddress(calrtdll,'calDeviceGetInfo');
  @caldevicegetattribs:=GetProcAddress(calrtdll,'calDeviceGetAttribs');
  @caldevicegetstatus :=GetProcAddress(calrtdll,'calDeviceGetStatus');
  @calgetversion      :=GetProcAddress(calrtdll,'calGetVersion');
  @calextsupported    :=GetProcAddress(calrtdll,'calExtSupported');
  @calextgetversion   :=GetProcAddress(calrtdll,'calExtGetVersion');

  If (@calinit=Nil) Or
     (@calshutdown=Nil) Or
     (@caldeviceopen=Nil) Or
     (@caldeviceclose=Nil) Or
     (@caldevicegetcount=Nil) Or
     (@caldevicegetinfo=Nil) Or
     (@caldevicegetattribs=Nil) Or
     (@caldevicegetstatus=Nil) Or
     (@calgetversion=Nil) Or
     (@calextsupported=Nil) Or
     (@calextgetversion=Nil) Then
  Begin
    FreeLibrary(calrtdll);
    Exit;
  End;

  FillChar(tcdi,SizeOf(tcdi),#0);
  FillChar(tcda,SizeOf(tcda),#0);
  FillChar(tcds,SizeOf(tcds),#0);

  tcds.struct_size:=SizeOf(tcds);

  Case calInit Of
    CAL_RESULT_OK,
    CAL_RESULT_ALREADY : Begin
      If calDeviceGetCount(d_count)=CAL_RESULT_OK Then
      If d_count>d_dev Then
      If calDeviceGetInfo(tcdi,d_dev)=CAL_RESULT_OK Then
      Begin
        i_verc:=0;

        tcda.struct_size:=SizeOf(TCALdeviceattribs200);
        If calDeviceGetAttribs(tcda,d_dev)=CAL_RESULT_OK Then i_verc:=200 Else
        Begin
          tcda.struct_size:=SizeOf(TCALdeviceattribs140);
          If calDeviceGetAttribs(tcda,d_dev)=CAL_RESULT_OK Then i_verc:=140 Else
          Begin
            tcda.struct_size:=SizeOf(TCALdeviceattribs120);
            If calDeviceGetAttribs(tcda,d_dev)=CAL_RESULT_OK Then i_verc:=120 Else
            Begin
              tcda.struct_size:=SizeOf(TCALdeviceattribs100);
              If calDeviceGetAttribs(tcda,d_dev)=CAL_RESULT_OK Then i_verc:=100
            End;
          End;
        End;

        If i_verc<>0 Then
        If calDeviceOpen(tcd,d_dev)=CAL_RESULT_OK Then
        If calDeviceGetStatus(tcds,tcd)=CAL_RESULT_OK Then
        If calDeviceClose(tcd)=CAL_RESULT_OK Then
        With sdir Do
        Begin
          Result:=True;

          s_dll:=calrtdll_filename;
          s_ver:=DetectFileVersion(IncludeTrailingPathDelimiter(OSSystemDir)+calrtdll_filename);
          If s_ver<>'' Then s_dll:=s_dll+' ('+s_ver+')';

          CAL_DLL:=s_dll;

          DeviceName:=StreamTargetString(tcdi.target,d_dev);

          MaxResource1DWidth :=tcdi.maxResource1DWidth;
          MaxResource2DWidth :=tcdi.maxResource2DWidth;
          MaxResource2DHeight:=tcdi.maxResource2DHeight;

          TotalLocalMemory         :=tcda.localRAM;
          TotalUncachedRemoteMemory:=tcda.uncachedRemoteRAM;
          TotalCachedRemoteMemory  :=tcda.cachedRemoteRAM;

          GPUClock   :=tcda.engineClock;
          MemoryClock:=tcda.memoryClock;

          If i_verc>=120 Then
          Begin
            WavefrontSize   :=tcda.wavefrontSize;
            SIMDs           :=tcda.numberOfSIMD;
            PitchAlignment  :=tcda.pitch_alignment;
            SurfaceAlignment:=tcda.surface_alignment;

            IsItDoublePrecisionSupported:=tcda.doublePrecision;
            IsItLocalDataShareSupported :=tcda.localDataShare;
            IsItGlobalDataShareSupported:=tcda.globalDataShare;
            IsItGlobalGPRSupported      :=tcda.globalGPR;
            IsItComputeShaderSupported  :=tcda.computeShader;
            IsItMemoryExportSupported   :=tcda.memExport;
          End;

          If i_verc>=140 Then UAVs:=tcda.numberOfUAVs;

          If i_verc>=200 Then
          Begin
            ShaderEngines:=tcda.numberOfShaderEngines;
            IsIt3DProgramGridSupported:=tcda.b3dProgramGrid;
          End;

          FreeLocalMemory         :=tcds.availLocalRAM;
          FreeUncachedRemoteMemory:=tcds.availUncachedRemoteRAM;
          FreeCachedRemoteMemory  :=tcds.availCachedRemoteRAM;

          d_major:=0;
          d_minor:=0;
          d_imp  :=0;

          If calGetVersion(d_major,d_minor,d_imp)=CAL_RESULT_OK Then
            CALVersion:=Format('%d.%d.%d',[d_major,d_minor,d_imp]);

          If calExtSupported(CAL_EXT_D3D9)=CAL_RESULT_OK Then
          Begin
            Ext_D3D9Interaction_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_D3D9)=CAL_RESULT_OK Then
              Ext_D3D9Interaction_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_OPENGL)=CAL_RESULT_OK Then
          Begin
            Ext_OpenGLInteraction_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_OPENGL)=CAL_RESULT_OK Then
              Ext_OpenGLInteraction_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_D3D10)=CAL_RESULT_OK Then
          Begin
            Ext_D3D10Interaction_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_D3D10)=CAL_RESULT_OK Then
              Ext_D3D10Interaction_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_COUNTERS)=CAL_RESULT_OK Then
          Begin
            Ext_Counters_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_COUNTERS)=CAL_RESULT_OK Then
              Ext_Counters_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_DOMAIN_PARAMS)=CAL_RESULT_OK Then
          Begin
            Ext_DomainParameters_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_DOMAIN_PARAMS)=CAL_RESULT_OK Then
              Ext_DomainParameters_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_RES_CREATE)=CAL_RESULT_OK Then
          Begin
            Ext_CreateResource_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_RES_CREATE)=CAL_RESULT_OK Then
              Ext_CreateResource_Version:=Format('%d.%d',[d_major,d_minor]);
          End;

          If calExtSupported(CAL_EXT_COMPUTE_SHADER)=CAL_RESULT_OK Then
          Begin
            Ext_ComputeShader_Supported:=True;
            d_major:=0;
            d_minor:=0;
            If calExtGetVersion(d_major,d_minor,CAL_EXT_COMPUTE_SHADER)=CAL_RESULT_OK Then
              Ext_ComputeShader_Version:=Format('%d.%d',[d_major,d_minor]);
          End;
        End;
      End;

      calShutdown;
    End;
  End;

  FreeLibrary(calrtdll);
End;

0 Likes

So basically what we do is:

1) Call StreamDevices to enumerate all CAL devices, collecting only basic info about them

2) Call OpenCLDevices to enumerate all OpenCL devices, again collecting only basic info

3) When the user clicks on one of the devices, call either StreamDeviceInfo or OpenCLDeviceInfo for it

 

The problem shows up in StreamDeviceInfo, where the thread locks up at the following line:

        If calDeviceOpen(tcd,d_dev)=CAL_RESULT_OK Then

 

IMHO calInit shouldn't return CAL_RESULT_ALREADY in StreamDeviceInfo, since we unloaded the CAL DLL beforehand.  calInit returns CAL_RESULT_ALREADY only when we call OpenCLDevices between StreamDevices and StreamDeviceInfo.  If we call StreamDeviceInfo right after StreamDevices, calInit returns CAL_RESULT_OK and the thread doesn't lock up.

Our hunch: calling OpenCLDevices (loading the OpenCL DLL) somehow creates a CAL context that is not properly freed when we unload the OpenCL DLL at the end of OpenCLDevices.

 

As you can see, at the beginning of each of those functions we load the necessary DLL, and at the end of each function we completely unload it.  We don't create any actual contexts: we only detect device features/properties and do not perform any GPU-based calculations.
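
One way to check whether a DLL really is gone after FreeLibrary is to ask Windows whether the module is still mapped into the process. The sketch below is an illustration rather than part of the original code: `Report` is a hypothetical helper, and the DLL names are the ones used or mentioned in this thread (`OpenCL.dll` is an assumption for the value of opencldll_filename). GetModuleHandle does not change the module's reference count, so the check itself does not affect the outcome.

```pascal
program CheckLeftoverModules;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: prints whether the named module is currently
// mapped into this process. GetModuleHandle returns 0 when it is not.
procedure Report(const Name: string);
begin
  if GetModuleHandle(PChar(Name)) <> 0 then
    WriteLn(Name, ': still loaded')
  else
    WriteLn(Name, ': not loaded');
end;

begin
  // In the real application these calls would go right after
  // OpenCLDevices returns, i.e. after its FreeLibrary call.
  Report('OpenCL.dll');
  Report('aticalrt.dll');
  Report('amdcalrt.dll');
end.
```

If a CAL runtime DLL is reported as still loaded at that point, some other module is presumably holding a reference to it, which would be consistent with calInit later returning CAL_RESULT_ALREADY; if it is reported as not loaded, the leftover state would have to live somewhere else (for example in the driver).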

 

Thank you for looking into this.

0 Likes
fiery
Journeyman III

Any comments, please?

0 Likes