  • Which AES decryption algorithm is used by AMD64?

    Question 1: In the AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions, the AESDEC instruction is defined to perform a single round of AES decryption which is explained in detai...
    last modified by ytpaul
  • Using clsparseDdense2csr

    I'm trying to use the pre-compiled clSPARSE. I tried creating a simple dense matrix and then converting it to sparse using clsparseDdense2csr. Code is more or less: cldenseMatrix ADenseCl; cl::Buffer ADenseBuffer...
    last modified by et@et3d
  • trouble getting ACML fftw wrapper working

    Hello, I'm having trouble getting the ACML fftw wrapper working. I've linked ACML into a Windows 64-bit project today in Visual Studio 2012 with the intent of trying the FFTW wrappers on GPU. The returned pl...
    last modified by gpgpucoder
  • ACML/BLIS/libflame with K10 microarchitecture

    Hello everyone, 1) I am NOT a dev (of ACML etc), but this is where the ACML website sends me to ask questions. (The GitHub pages are not appropriate for my following questions, as I need to ask a common questi...
    last modified by pekirt
  • Small Matrix Multiplication

    Hello, Most/all off-the-shelf routines for doing matrix-matrix multiplication are suitable for large matrices. The problems I am trying to run on a GPU (280X) involve a large number - typically 200-300K - of r...
    last modified by dns.on.gpu
  • OpenCL Error in Octave using ACML 6.1 (under ArchLinux on Kaveri)

    Hello, I am trying to use iGPU offloading under Octave 3.8.2 using ACML 6.1 under Arch Linux (kernel 4.0) on a Kaveri (no discrete GPU) machine. I have installed the Catalyst driver, acml and amdapp-sdk. Y...
    last modified by atcl
  • How to abort the clAmdBlas Tune module

    Hi everybody, I just needed to run a BLAS function (clAmdBlasiSamax) to get the index of the maximum value in a float array. Tested, worked fine, very happy, but... performance is the aim. Running CodeXL I saw from th...
    last modified by binghy
  • Using clAmdBlas in Java

    Hello, I'm new to OpenCL programming. I need parallel BLAS routines for my Java project. I've set up the AMD APP SDK and JOCL OpenCL bindings. Code samples work well, but then I've tried to connect clAmdBlas and I don't u...
    last modified by oriojke
  • ACML 6.1 SGEMM and DGEMM bug in example code

    Please move this to AMD Compute Library forum after it's been approved. I'm running ACML compiled by gfortran for 64-bit Linux (available here: acml- ) Running the time_...
    last modified by emartin
  • Multiple devices

    Hello, I'd like to ask whether Bolt can use multiple GPU devices for computation automatically? I have two GPUs and I'd like to distribute my algorithm on both of them. And a similar question: Is it possible to for...
    last modified by petr.machacek
  • error in BLAS - ACML

    it seems old (bug), using ACML 5.3.1   > ## PR#4582 %*% with NAs > stopifnot(is.na(NA %*% 0), is.na(0 %*% NA)) > ## depended on the BLAS in use. > > > ## found from fallback test in slam...
    last modified by lejeczek
  • clFFT multi-GPU with batch mode

    Hi! What is the canonical way of getting multi-GPU processing of batched FFTs working? I tried using clFFT 2.4 from GitHub, and tried giving it multiple queues and buffers, but I got error code -4097 (CL_FFT_F...
    last modified by Meteorhead
  • clMath slows on CPU?

    Hi everyone, I was wondering if it's normal that clMath is much slower than MKL when running on an Intel CPU (about 10 times slower for a 5000*5000 complex double matrix)? I was told that the Intel OpenCL uses the SSE...
    last modified by craucy
  • BenchMarking CLFFT

    Hi, I am trying to benchmark clAmdFft on an AMD Radeon 7470M. Kindly comment on the method of benchmarking. I call the FFT kernel like this, with event as cl_event. err = clAmdFftEnqueueTransform(...
    last modified by karthik_hegde
  • Problem with adding Bolt Sort in a simple OpenCL code

    Hi, In my OpenCL code, I have added the sort function from Bolt. I am getting the following linker errors by just adding the sort line 'bolt::cl::sort( abc, abc+elements);' where abc is an array having 'elements' ...
    last modified by ifrah
  • sys freeze for large(?) array

    Hello. I am trying various tests using FFTs on a 3 GB HD 7950. The speeds I seem to get are impressive - if the use of gettimeofday() can be relied upon to measure elapsed times. When I try to make...
    last modified by dns.on.gpu
  • clAmdFft.runtime linking errors in Windows 8.1

    Howdy, I'm writing a Java JNI wrapper for the AMD FFT library and have run into a problem under Windows 8.1 that doesn't show up in Windows 7. The code works perfectly under Windows 7, but when I try t...
    last modified by amcelroy
  • Getting Started

    Working on getting started using Bolt for parallel processing. Want to use Linux - i.e. Ubuntu. First question is - can Bolt achieve any parallelism on a system with a standard AMD CPU + GPU (e.g. F...
    last modified by dtison
  • Trouble running OpenCL on Ubuntu 13.10

    Hi, I'm trying to run OpenCL on my setup without success: $ /opt/clAmdBlas-1.10.321/samplebin/example_sgemv X server found. dri2 connection failed! Trying to open directly...Device open failed ...
    last modified by arto
  • clAmdFft 1.8.269 on multiple GPUs

    Hi, I'm trying to use clAmdFft (v1.8.269) in a multi-GPU environment as part of a bigger project. That causes two problems: 1. It seems the FFT calculates wrong values in single precision on Nvidia...
    last modified by reuter