• Which AES decryption algorithm is used by AMD64?

    Question 1: In the AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions, the AESDEC instruction is defined to perform a single round of AES decryption, which is explained in detai...
    ytpaul
    last modified by ytpaul
  • Using clsparseDdense2csr

    I'm trying to use the pre-compiled clSPARSE. I tried creating a simple dense matrix and then converting it to sparse using clsparseDdense2csr. The code is more or less: cldenseMatrix ADenseCl; cl::Buffer ADenseBuffer...
    et@et3d
    last modified by et@et3d
  • Trouble getting ACML FFTW wrapper working

    Hello, I'm having trouble getting the ACML FFTW wrapper working. I've linked ACML into a Windows 64-bit project today in Visual Studio 2012 with the intent of trying the FFTW wrappers on GPU. The returned pl...
    gpgpucoder
    last modified by gpgpucoder
  • ACML/BLIS/libflame with K10 microarchitecture

    Hello everyone,   1) I am NOT a dev (of ACML etc), but this is where the ACML website sends me to ask questions. (The GitHub pages are not appropriate for my following questions, as I need to ask a common questi...
    pekirt
    last modified by pekirt
  • Small Matrix Multiplication

    Hello, Most/all off-the-shelf routines for doing matrix-matrix multiplication are suited to large matrices. The problems I am trying to run on a GPU (280X) involve a large number - typically 200-300K - of r...
    dns.on.gpu
    last modified by dns.on.gpu
  • OpenCL Error in Octave using ACML 6.1 (under ArchLinux on Kaveri)

    Hello, I am trying to use iGPU offloading under Octave 3.8.2 using ACML 6.1 under Arch Linux (Kernel 4.0) on a Kaveri (no discrete GPU) machine. I have installed the Catalyst driver, acml and amdapp-sdk. Y...
    atcl
    last modified by atcl
  • How to abort the clAmdBlas Tune module

    Hi everybody, I just needed to run a Blas function (clAmdBlasiSamax) to get the index of the maximum value in a float array. Tested, worked fine, very happy, but...performance is the aim. Running CodeXL I saw from th...
    binghy
    last modified by binghy
  • Using clAmdBlas in Java

    Hello, I'm new to OpenCL programming. I need parallel BLAS routines for my Java project. I've set up the AMD APP SDK and JOCL OpenCL bindings. Code samples work well, but then I've tried to connect clAmdBlas and I don't u...
    oriojke
    last modified by oriojke
  • ACML 6.1 SGEMM and DGEMM bug in example code

    Please move this to the AMD Compute Library forum after it's been approved. I'm running ACML 6.1.0.31 compiled by gfortran for 64-bit Linux (available here: acml-6.1.0.31-gfortran64.tgz). Running the time_...
    emartin
    last modified by emartin
  • Multiple devices

    Hello, I'd like to ask whether Bolt can use multiple GPU devices for computation automatically. I have two GPUs and I'd like to distribute my algorithm across both of them. And a similar question: Is it possible to for...
    petr.machacek
    last modified by petr.machacek
  • error in BLAS - ACML

    it seems old (bug), using ACML 5.3.1   > ## PR#4582 %*% with NAs > stopifnot(is.na(NA %*% 0), is.na(0 %*% NA)) > ## depended on the BLAS in use. > > > ## found from fallback test in slam...
    lejeczek
    last modified by lejeczek
  • clFFT multi-GPU with batch mode

    Hi!   What is the canonical way of getting multi-GPU processing of batched FFTs working? I tried using clFFT 2.4 from Github, and tried giving it multiple queues and buffers, but I got error code -4097 (CL_FFT_F...
    Meteorhead
    last modified by Meteorhead
  • clMath slow on CPU?

    Hi everyone, I was wondering if it's normal that clMath is much slower than MKL when running on an Intel CPU (about 10 times slower for a 5000*5000 complex double matrix)? I was told that the Intel OpenCL uses the SSE...
    craucy
    last modified by craucy
  • Benchmarking clFFT

    Hi, I am trying to benchmark clAmdFft on an AMD Radeon 7470M. Kindly comment on the method of benchmarking. I call the FFT kernel like this, with event as cl_event. err = clAmdFftEnqueueTransform(...
    karthik_hegde
    last modified by karthik_hegde
  • Problem with adding Bolt sort to simple OpenCL code

    Hi, In my OpenCL code, I have added the sort function from Bolt. I am getting the following linker errors by just adding the sort line 'bolt::cl::sort( abc, abc+elements);' where abc is an array having 'elements' ...
    ifrah
    last modified by ifrah
  • sys freeze for large(?) array

    Hello. I am trying various tests using FFTs on a 3 GB HD 7950. The speeds I seem to get are impressive - if the use of gettimeofday() can be relied upon to measure elapsed times. When I try to make...
    dns.on.gpu
    last modified by dns.on.gpu
  • clAmdFft.runtime linking errors in Windows 8.1

    Howdy, I'm writing a Java JNI wrapper for the AMD FFT library and have run into a problem under Windows 8.1 that doesn't show up in Windows 7. The code works perfectly under Windows 7, but when I try t...
    amcelroy
    last modified by amcelroy
  • Getting Started

    Working on getting started using Bolt for parallel processing. Want to use Linux - i.e. Ubuntu. First question is - can Bolt achieve any parallelism on a system with a standard AMD CPU + GPU (e.g. F...
    dtison
    last modified by dtison
  • Trouble running OpenCL on Ubuntu 13.10

    Hi, I'm trying to run OpenCL on my setup without success: $ /opt/clAmdBlas-1.10.321/samplebin/example_sgemv X server found. dri2 connection failed! Trying to open directly...Device open failed ...
    arto
    last modified by arto
  • clAmdFft 1.8.269 on multiple GPUs

    Hi, I'm trying to use clAmdFft (v1.8.269) in a multi-GPU environment as part of a bigger project. That causes two problems: 1. It seems the FFT calculates wrong values in single precision on Nvidia...
    reuter
    last modified by reuter