OpenCL

infovel · ‎10-01-2020

A while ago, I developed a kernel that runs fine on the 5850 card.

When I had an opportunity to test it on the newer 7770 and 7850 cards, it did not work. The result was wrong: the Mandelbrot set came out distorted, though still somewhat recognizable.

Now I can also test it on an APU with 6530 graphics and on a 6450, and it does not work there either, even after I adjusted the kernel for the lack of 64-bit support. In fact it did not work at all: the output buffer contents were zeroed. The modified kernel still works fine on the 5850.

I am using AMD OpenCL SDK 3.0.130.136-GA under both Linux and Windows.

First of all, which AMD cards support cl_khr_fp64 and cl_amd_fp64 and which cards do not?

And why would a kernel that runs fine on the 5850 fail on the 7000 series cards, which support the cl_khr_fp64 extension?

infovel · ‎10-01-2020

The site totally butchered the code snippets, so here they are in an at least readable format.

static const char *kernelSource =                                    "\n" \
"#pragma OPENCL EXTENSION cl_khr_fp64 : enable                        \n" \
"__kernel void mbCalc(                                                \n" \
"   __global uint *v,                                                 \n" \
"   const double _x0,                                                 \n" \
"   const double _y0,                                                 \n" \
"   const double _sizeCbyP,                                           \n" \
"   const uint _iterate)                                              \n" \
"{                                                                    \n" \
"    const uint id = get_global_id(0); //Get our global thread ID     \n" \
"                                                                     \n" \
"    double zreal = 0.0,                                              \n" \
"           zimag = 0.0;                                              \n" \
"    const double creal = _x0 + _sizeCbyP * (id / 1000),              \n" \
"                 cimag = _y0 + _sizeCbyP * (id % 1000);              \n" \
"                                                                     \n" \
"    uint val = 0;                                                    \n" \
"    double _zreal;                                                   \n" \
"                                                                     \n" \
"    while (true) {                                                   \n" \
"        if (val >= _iterate || zimag > 50.0 || zreal > 50.0)         \n" \
"            break;                                                   \n" \
"        _zreal = zreal * zreal + creal - zimag * zimag;              \n" \
"        zimag = 2.0 * zreal * zimag + cimag;                         \n" \
"        zreal = _zreal;                                              \n" \
"        val++;                                                       \n" \
"    }                                                                \n" \
"    v[id] = val;                                                     \n" \
"}                                                                    \n" ;

static const char *kernelSource32 =                                  "\n" \
"__kernel void mbCalc(                                                \n" \
"   __global uint *v,                                                 \n" \
"   const float _x0,                                                  \n" \
"   const float _y0,                                                  \n" \
"   const float _sizeCbyP,                                            \n" \
"   const uint _iterate)                                              \n" \
"{                                                                    \n" \
"    const uint id = get_global_id(0); //Get our global thread ID     \n" \
"                                                                     \n" \
"    float zreal = 0.0,                                               \n" \
"          zimag = 0.0;                                               \n" \
"    const float creal = _x0 + _sizeCbyP * (id / 1000),               \n" \
"                cimag = _y0 + _sizeCbyP * (id % 1000);               \n" \
"                                                                     \n" \
"    uint val = 0;                                                    \n" \
"    float _zreal;                                                    \n" \
"                                                                     \n" \
"    while (true) {                                                   \n" \
"        if (val >= _iterate || zimag > 50.0 || zreal > 50.0)         \n" \
"            break;                                                   \n" \
"        _zreal = zreal * zreal + creal - zimag * zimag;              \n" \
"        zimag = 2.0 * zreal * zimag + cimag;                         \n" \
"        zreal = _zreal;                                              \n" \
"        val++;                                                       \n" \
"    }                                                                \n" \
"    v[id] = val;                                                     \n" \
"}                                                                    \n" ;

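For cross-checking device output, the loop in mbCalc can be mirrored on the host and compared element by element with the buffer read back from each card. The following is only a minimal sketch: the names mb_iterate_ref and mb_ref_for_id are hypothetical, the width of 1000 is taken from the kernel's id / 1000 and id % 1000, and iteration counts close to the escape threshold may legitimately differ by a step or two between devices because of rounding or fused multiply-add differences.

```c
#include <stdint.h>

/* Host-side mirror of the loop in mbCalc (double-precision variant).
 * Hypothetical helper for comparison only; not part of the original code. */
static uint32_t mb_iterate_ref(double creal, double cimag, uint32_t max_iter)
{
    double zreal = 0.0, zimag = 0.0;
    uint32_t val = 0;

    for (;;) {
        /* Same exit test as the kernel, in the same order. */
        if (val >= max_iter || zimag > 50.0 || zreal > 50.0)
            break;
        const double next_zreal = zreal * zreal + creal - zimag * zimag;
        zimag = 2.0 * zreal * zimag + cimag;
        zreal = next_zreal;
        val++;
    }
    return val;
}

/* Maps a global work-item id to its point exactly as the kernel does
 * (assumes the kernel's hard-coded row length of 1000). */
static uint32_t mb_ref_for_id(uint32_t id, double x0, double y0,
                              double size_c_by_p, uint32_t max_iter)
{
    const double creal = x0 + size_c_by_p * (id / 1000);
    const double cimag = y0 + size_c_by_p * (id % 1000);
    return mb_iterate_ref(creal, cimag, max_iter);
}
```

Comparing v[id] from the device against mb_ref_for_id(id, ...) for every id would show whether the wrong values are scattered or follow a pattern (for example, whole rows or only the tail of the buffer), which narrows down where to look.
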
infovel · ‎10-02-2020

Anyone? I thought that someone at AMD (who should be visiting these forums, since they are your own) would be able to point me at the manuals. Considering the game of leapfrog that the model numbers engage in, it is extremely tough for a developer to know which cards they should or should not target. Are you not interested in increased adoption? If you are, you should lend a helping hand. Come on!

dipak · ‎10-05-2020

Thank you for reporting this.

Without looking into the details, it's difficult to say why you are getting a different result. Architecture-wise, the HD 5850 and the HD 7770/7850 are very different, so this could be a reason. Anyway, please provide a complete reproducible test case (with host code) and attach the clinfo output. [To attach a file or for more editor options, please click the "Use advanced editor" option]

Please note that HD 5xxx or HD 6xxx cards (i.e. pre-GCN) are legacy products and no further support is expected for these products [https://www.amd.com/en/support/kb/faq/gpu-630 ].

Thanks.

infovel · ‎10-05-2020

Support for legacy products is a moot issue here. It may not be expected from you, but it is expected from me and my kernels. A6 APUs are still around in abundance. This is why I asked for documentation that covers which GPUs support 64-bit floating point. Can we have this documentation? Right now I am having to piece it together from the document called AMD_OpenCL_Programming_User_Guide2.pdf, which is missing quite a few model numbers, especially in the 6000 series. I surmise that APUs with 6000 series graphics do not support FP64, but that does not necessarily mean that the entire 6000 series does not.

dipak · ‎10-12-2020

As I discussed with the OpenCL team, it seems that almost every currently supported device has double-precision support. The AMD OpenCL optimization guide also says that "All GCN GPUs have double-precision support".

If the HW doesn't have FP64 support, it's most likely a legacy pre-GCN device. Many of the pre-GCN devices don't have double-precision support. From the optimization guide, it looks like, within the HD 6xxx series, only the HD 69xx devices have FP64 support ["Chapter 3: OpenCL Performance and Optimization for Evergreen and Northern Islands -> section 3.8.1 Instruction Bandwidths"].

Also, as suggested by the OpenCL team, the relevant ISA manual can be consulted to check whether the HW has double-precision support.

Thanks.
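
Besides the manuals, a device can be asked directly at run time: the space-separated string returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS (also printed by clinfo) lists cl_khr_fp64 or cl_amd_fp64 when the driver exposes double precision. A minimal sketch of the string check follows; the helper names has_extension and device_reports_fp64 are hypothetical, and the extension string is assumed to have been fetched already by the host code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` occurs as a whole token in the space-separated
 * list `ext_list` (the format used by CL_DEVICE_EXTENSIONS).
 * Hypothetical helper, not part of the OpenCL API. */
static bool has_extension(const char *ext_list, const char *name)
{
    const size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited on both sides, so that
         * a longer name containing `name` is not mistaken for it. */
        const bool starts_token = (p == ext_list) || (p[-1] == ' ');
        const bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return true;
        p++;
    }
    return false;
}

/* True if either of the two FP64 extensions named in this thread is listed. */
static bool device_reports_fp64(const char *ext_list)
{
    return has_extension(ext_list, "cl_khr_fp64") ||
           has_extension(ext_list, "cl_amd_fp64");
}
```

The host code could use the result to choose between kernelSource and kernelSource32 before building the program, instead of keying the choice on model numbers.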

bridgman · ‎10-27-2020

Of the GPUs you mentioned, the 5850, 7770 and 7850 all support FP64 in hardware. The 6950 also supported FP64 in hardware, but IIRC the rest of the 6xxx series did not.

I'm pretty sure that the 6530 integrated graphics did not support FP64. I don't remember us making an APU with 6450 integrated graphics. Was the 6450 perhaps a discrete GPU paired with the APU?

Anyway, I'm not seeing an obvious pattern between chips with FP64 support in hardware and the success/failure of your kernel, so it probably makes most sense to focus on figuring out why your kernel isn't running on newer hardware.

The OpenCL devs are going to have a tough time supporting you on Llano hardware, since we probably don't have access to that hardware any more. Do you still have access to 7770/7850 or newer HW?

Radeon HD5850 vs 7770, 7850, 6530(APU), or 6450