Archives Discussions

hummar · ‎09-16-2013

I am working on an industrial application that must be deterministic to within a few microseconds and complete a loop within 200us every 250us. My 16-core E5-based system is not fast enough, so I am using a GPU to do the calculations. I would rather manage the GPU code from the real-time system, so my question is: is there a way to run OpenCL on an R7990 from Win32 code where it is not possible to install the standard Windows driver?

himanshu_gautam · ‎09-17-2013

Installing the Catalyst driver is essential for OpenCL, but as I understand it, you are asking for a real-time version of it.

Maybe the following link will be helpful: https://bbs.archlinux.org/viewtopic.php?id=52295

Thanks for the query. I will ask some relevant people and let you know.

scharupa · ‎09-17-2013

Hi,

What do you mean by "standard Windows driver"? Do you mean the AMD Catalyst driver?

To run an OpenCL application on the R7990, you have to install the AMD Catalyst driver.

hummar · ‎09-17-2013

The driver I need does not require video support, as video will be supplied via another video card. Video data transferring across the PCI bus would negatively affect the determinism of the application. Nvidia has a TCC driver for CUDA that does that, but I think the ATI cards are better for this application, and the TCC driver also requires Windows.

An open-source driver written in ANSI C may work, as I could use just the OpenCL-relevant part of the code to make a custom driver.

himanshu_gautam · ‎09-17-2013

hummar wrote:

The driver I need does not require video support, as video will be supplied via another video card. Video data transferring across the PCI bus would negatively affect the determinism of the application. Nvidia has a TCC driver for CUDA that does that, but I think the ATI cards are better for this application, and the TCC driver also requires Windows.

Can you explain your intentions a bit more? I did not understand how having video support is a limitation.

FYI, NVIDIA has the Tesla series, which are compute-only cards. As I understand it, AMD does not offer anything like that, so a TCC driver may not make much sense here.

hummar · ‎09-18-2013

Our current machine has up to 72 axes of motion in up to 20 coordinate systems, distributed around the machine on dedicated circuit board stacks. The operator interface, communications, ladder logic, etc. are handled on a Windows 7 box that does not need to be deterministic or real time. We are working on a new version of the control that will move the code from the circuit board stacks into the Windows box. We are using Interval Zero's RTX to move 12 cores to the real-time side, leaving 4 Windows cores to run the operator interface. The 12 cores can easily do 80+ axes in 250us or 500us (depending on the SERCOS 3 bus speed), so all is good so far. I have a test program that does about 200 million 64-bit additions per core per second in parallel and transfers the results via dual-ported RAM to the 4 Windows 7 cores.
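As a rough worked example of the figures above (a sketch, not a measurement): at about 200 million additions per core per second, a 250us cycle leaves each core roughly 50,000 additions, or about 600,000 across the 12 real-time cores. The function name and the use of whole microseconds are assumptions made for illustration.

```c
#include <stdint.h>

/* Rough per-cycle operation budget across all cores:
   (ops per second per core) * (cycle length in us) / 1,000,000 * cores.
   Integer arithmetic, so the per-core figure is truncated. */
uint64_t ops_per_cycle(uint64_t ops_per_sec_per_core,
                       uint64_t cycle_us,
                       unsigned cores)
{
    uint64_t per_core = ops_per_sec_per_core * cycle_us / 1000000u;
    return per_core * cores;
}
```

With the numbers quoted in the post, `ops_per_cycle(200000000, 250, 12)` evaluates to 600000 and the 500us cycle to 1200000. Real workloads would land below this once memory access and synchronization are counted.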

The problem is that we want to do some very intensive calculations in real time in order to further improve the accuracy of our machine, but the 12 cores are not nearly fast enough, so I have been looking at ways to get more calculations done within the allotted time. The catch is that the GPU/APU should be controlled by the real-time cores in order to keep the code deterministic, which requires data to pass across the PCI-E bus. So, if I understand the GPU architecture properly, the Win32 code running in RTX64 makes a call to an ATI-written OpenCL DLL, which calls the driver, which calls the GPU firmware. That path is the only part of the driver I need, and it is the only part of this system architecture I am missing.

I believe there are thousands of industrial applications that could be improved with this design, as you can build an inexpensive Windows box to do the operator interface, put the real-time code in a relatively inexpensive GPU, and get an amazing amount of work done in short order.

himanshu_gautam · ‎10-09-2013

I am seeking answers to your questions. Please wait.

himanshu_gautam · ‎10-09-2013

Hey,

1) Have you tried installing the Windows Catalyst driver first?

I have no idea if it will install on the RTX platform, but you can give it a try and let us know.

2) AMD OpenCL runtime conflicts with Intel's

In the URL above, users have had success driving the display with Intel's integrated HD 4000 while using an AMD graphics card for computation.

Reading up on these threads should really help you out.

Hope this helps,

Best Regards,

Bruhaspati

hummar · ‎10-09-2013

Hey Bruhaspati, thanks for taking an interest in my project.

RTX is a real-time extension to Windows and runs Win32 code, but outside of the Windows environment. It removes resources such as CPU cores and NICs from Windows during the boot process, after which Windows ignores them. It has its own HAL, and any drivers have to be written specifically for it. I don't need much of a driver, just enough to run OpenCL in order to do some of the calculations on a GPU from RTX.

Here is a screenshot of a test program using 12 of the 16 cores on this machine; each core is running a 64-bit addition in an infinite loop. The delta loops are the number of loops every 200ms, and the other figure is the total loops per core. You can get the total time by multiplying the HMI loops by 0.2 seconds. The variation in the reported delta loops is caused by the Windows form timer. I am guessing that the total number of loops is affected by accesses to main memory.
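The arithmetic described above can be written down as a short sketch. The function names are hypothetical, and the fixed 200ms reporting period is taken from the description rather than from the actual test program.

```c
#include <stdint.h>

#define REPORT_PERIOD_S 0.2  /* the form reports once every 200 ms */

/* Total elapsed time in seconds: number of HMI reports times 0.2 s. */
double elapsed_seconds(uint64_t hmi_loops)
{
    return (double)hmi_loops * REPORT_PERIOD_S;
}

/* Loops per second for one core, from its delta-loop count
   over a single reporting period. */
double loops_per_second(uint64_t delta_loops)
{
    return (double)delta_loops / REPORT_PERIOD_S;
}
```

Because the reporting period is driven by the Windows form timer, a rate derived from a single delta value inherits that timer's jitter; dividing the total loops by the elapsed time smooths it out.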

The form is running in Windows on CPU0 cores 0 through 3 and is communicating with RTX via a shared memory pointer.

CPU0:core4 is running the main task and the rest are threads spawned by core 4. Notice how the threads are slower on CPU0, and CPU1 is slower again. I don't know where the overhead is coming from, as each thread has a core all to itself and should be running in L1 cache except for one access to shared memory every 1,000,000 loops.
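The loop shape described above might look roughly like the sketch below. This is not the actual test program: the type and function names are hypothetical, the loop is bounded only so the sketch can be exercised, and the 64-byte alignment is an assumed cache-line size, shown as one common precaution against two workers writing to the same line rather than as a diagnosis of the slowdown.

```c
#include <stdint.h>

#define PUBLISH_EVERY 1000000u  /* one shared-memory write per 1,000,000 loops */

/* One slot per worker in the shared region. Aligning to 64 bytes
   (assumed cache-line size) keeps each worker's slot on its own line. */
typedef struct {
    _Alignas(64) volatile uint64_t total_loops;
} worker_slot;

/* Runs `iterations` loops of a 64-bit addition with a local counter and
   publishes the count to the shared slot once every PUBLISH_EVERY loops.
   Returns the final local count. */
uint64_t worker_run(worker_slot *slot, uint64_t iterations)
{
    uint64_t count = 0;
    for (uint64_t i = 0; i < iterations; ++i) {
        count += 1;                       /* the 64-bit addition */
        if (count % PUBLISH_EVERY == 0)
            slot->total_loops = count;    /* the only shared-memory access */
    }
    return count;
}
```

Running 2,500,000 iterations leaves 2,000,000 in the shared slot, since the last partial batch has not yet been published.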

If we can't get a GPU running from RTX, we are considering trying a different MB or a dual-processor, 32-core Opteron setup to see if it has the same issue. Do you think CPU1 will run at the same speed as CPU0 on an Opteron? We are also wondering how the faster CPU speed and higher core count of the Opteron will compare to Intel's CPU/motherboard architecture. What do you think?

Thanks again

Martin

himanshu_gautam · ‎10-10-2013

Hello Martin,

Getting separate OpenCL access to the GPU without a driver for RTX is a distant dream.

You can almost forget it...

My personal view is that there is no incentive for AMD to create such special code for you, unless you are willing to pay a few million for it 😉

The cores might be sharing L1 cache. You may want to read up on the cache architecture of the particular CPU you are using.

Also, if you are issuing instructions that lock the bus (LOCK prefix), or if you are using software locks, your speed will come down heavily. Always check a lock for availability before attempting to take it (try locks).
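A minimal sketch of the "check before locking" idea, using C11 atomics. The type and function names are hypothetical, and whether `<stdatomic.h>` is available in a given RTX toolchain is an assumption. The plain load is an ordinary read; the atomic exchange, which is a locked instruction on x86, is only issued when the lock looks free.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool held;   /* true while some thread owns the lock */
} try_lock;

/* Returns true if the lock was acquired, false if it was already held.
   Never blocks or spins. */
bool try_lock_acquire(try_lock *l)
{
    /* Cheap check first: a relaxed load does not need a locked instruction. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* Lock looked free: attempt to take it with a single atomic exchange. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

void try_lock_release(try_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A caller that gets `false` back can do other work and retry later instead of stalling, which is usually what a fixed-period loop wants.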

I don't know enough about Opteron and hence won't comment on it.

btw...

The RTX architecture reminds me of a project I worked on earlier, where we made the host OS (NetBSD in our case) see less RAM and skip a few PCIe slots, which were later used by our application code (or guest OS, if you want to call it that) running on all the other CPUs. We made the host OS context-switch into our OS whenever it had free time, and then go back whenever an interrupt came up.

And so we could get a proprietary box and a management console all bundled in a single machine. We even had a GDB server to debug our code.

- Bruhaspati

hummar · ‎10-30-2013

Hello Bruhaspati,

Thanks for your input. I was hoping to get ATI or Nvidia interested in the real-time community. It was a long shot, but worth a try. We are going to proceed with the conventional approach for now and revisit this at a later date.

- Martin

OpenCL Interval Zero RTX Driver