cancel
Showing results for 
Search instead for 
Did you mean: 

Discussions

ohshima
Journeyman III

OpenMP offloading + hipblas + XNACK+

I have oppotunity to use MI210. To ease programming, I want to use OpenMP offloading with hipblas and managed memory(XNACK+).

I succeeded to use hipblas and XNACK+, but I cannot right result with OpenMP offloading.

Is this usecase not suported?

I'm using rocm 6.1.0 on EPYC + MI210.

 

1: normal version (right result is obtained)

 

 

double *a, *b, *c;
double *da, *db, *dc;
a = (double*)malloc(sizeof(double)*n*n);
b = (double*)malloc(sizeof(double)*n*n);
c = (double*)malloc(sizeof(double)*n*n);
err = hipMalloc((void**)&da, sizeof(double)*n*n);
err = hipMalloc((void**)&db, sizeof(double)*n*n);
err = hipMalloc((void**)&dc, sizeof(double)*n*n);
/* set a, b, c (omit) */
err = hipMemcpy(da, a, sizeof(double)*n*n, hipMemcpyHostToDevice);
err = hipMemcpy(db, b, sizeof(double)*n*n, hipMemcpyHostToDevice);
err = hipMemcpy(dc, c, sizeof(double)*n*n, hipMemcpyHostToDevice);
hipblasDgemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N, n ,n, n, &alpha, da, n, db, n, &beta, dc, n);
err = hipMemcpy(c, dc, sizeof(double)*n*n, hipMemcpyDeviceToHost);
$ hipcc -I${ROCM_INCLUDE} -I${HIP_INCLUDE} -I${HIPBLAS_INCLUDE} -L${HIPBLAS_LIB} -fopenmp -o hipblas_c hipblas.cpp -lhipblas
$ ./hipblas_c

 

 

 

2: Managed version (right result is obtained)

 

 

double *a, *b, *c;
err = hipMallocManaged((void**)&a, sizeof(double)*n*n);
err = hipMallocManaged((void**)&b, sizeof(double)*n*n);
err = hipMallocManaged((void**)&c, sizeof(double)*n*n);
/* set a, b, c (omit) */
hipblasDgemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N, n ,n, n, &alpha, a, n, b, n, &beta, c, n);
$ hipcc -I${ROCM_INCLUDE} -I${HIP_INCLUDE} -I${HIPBLAS_INCLUDE} -L${HIPBLAS_LIB} -fopenmp -o hipblas_c hipblas.cpp -lhipblas
$ ./hipblas_c

 

 

 

3: XNACK+ version (right result is obtained)

 

 

double *a, *b, *c;
a = (double*)malloc(sizeof(double)*n*n);
b = (double*)malloc(sizeof(double)*n*n);
c = (double*)malloc(sizeof(double)*n*n);
/* set a, b, c (omit) */
hipblasDgemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N, n ,n, n, &alpha, a, n, b, n, &beta, c, n);
err = hipStreamSynchronize(hipStreamDefault);
$ hipcc -I${ROCM_INCLUDE} -I${HIP_INCLUDE} -I${HIPBLAS_INCLUDE} -L${HIPBLAS_LIB} -fopenmp -o hipblas_c hipblas.cpp -lhipblas
$ HSA_XNACK=1 ./hipblas_c

 

 

 

4: OpenMP offloading + hipblas + XNACK+ (0.0 is obtained)

 

 

double *a, *b, *c;
a = (double*)malloc(sizeof(double)*n*n);
b = (double*)malloc(sizeof(double)*n*n);
c = (double*)malloc(sizeof(double)*n*n);
/* set a, b, c (omit) */
#pragma omp target data use_device_ptr(a,b,c)
hipblasDgemm(handle, HIPBLAS_OP_N, HIPBLAS_OP_N, n ,n, n, &alpha, a, n, b, n, &beta, c, n);
#pragma omp taskwait

 

 

 

 

 

$ amdclang++ -I${ROCM_INCLUDE} -I${HIP_INCLUDE} -I${HIPBLAS_INCLUDE} -L${HIPBLAS_LIB} -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a:xnack+ -o hipblas_omp_c hipblas_omp.cpp -lhipblas
$ HSA_XNACK=1 ./hipblas_omp_c

 

 

 

0 Likes
0 Replies