9 Replies Latest reply on Jun 23, 2016 11:49 AM by optimiz3

    Are AMD GPUs getting an equivalent for NVidia's LOP3.LUT?


      With its Maxwell architecture, NVidia introduced two instructions that are hugely important for cryptographic and integer compute:


      - LOP3.LUT, which lets applications execute FPGA-style lookup tables on 32-bit operands: an 8-bit truth table is applied bitwise to three inputs.  This lets you execute any 3-input logic operation, such as SHA-256's Ch (choose/bitselect), SHA-256's Maj (majority), SHA3's Chi, bitslice S-Boxes, etc., in a single op.  While AMD does have the bitselect operation (which can be used to compose the general LUT operation), building an arbitrary LUT out of it takes several instructions, so it is a vastly inferior option.


      - IADD3, which lets applications add three 32-bit integers in one op.  Simple, but widely applicable because functions like SHA-256 have long chains of 32-bit additions.
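
      To make the comparison concrete, here is a rough C model of what the two instructions compute, plus the bitselect-based versions of Ch and Maj. This is only a sketch: the function names (lop3_lut, bitselect, add3, and so on) are made up for illustration and are not real intrinsics, and the truth-table bit ordering (a as the high index bit, c as the low bit) is my understanding of the LOP3 immediate convention. The bitselect semantics are assumed to match OpenCL's bitselect(a, b, c).

```c
#include <stdint.h>

/* Software model of a 3-input LUT op: for every bit position, the bits of
 * a, b, c form a 3-bit index (a = bit 2, b = bit 1, c = bit 0) into the
 * 8-bit truth table `lut`. Hypothetical helper, not a real intrinsic. */
static uint32_t lop3_lut(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((lut >> i) & 1u) {
            /* Minterm i: select the bit positions whose inputs match index i. */
            uint32_t ta = (i & 4) ? a : ~a;
            uint32_t tb = (i & 2) ? b : ~b;
            uint32_t tc = (i & 1) ? c : ~c;
            r |= ta & tb & tc;
        }
    }
    return r;
}

/* Truth tables, derived by evaluating each function on the constants
 * 0xF0 (a), 0xCC (b), 0xAA (c). */
#define LUT_CH  0xCA /* (a & b) ^ (~a & c)          */
#define LUT_MAJ 0xE8 /* (a & b) ^ (a & c) ^ (b & c) */
#define LUT_CHI 0xD2 /* a ^ (~b & c)                */

static uint32_t ch_lut(uint32_t x, uint32_t y, uint32_t z)  { return lop3_lut(x, y, z, LUT_CH); }
static uint32_t maj_lut(uint32_t x, uint32_t y, uint32_t z) { return lop3_lut(x, y, z, LUT_MAJ); }
static uint32_t chi_lut(uint32_t a, uint32_t b, uint32_t c) { return lop3_lut(a, b, c, LUT_CHI); }

/* Model of bitselect, assuming OpenCL semantics: take the bit from b
 * where c is 1, otherwise from a. */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Ch maps onto a single bitselect. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect(z, y, x);
}

/* Maj needs an extra XOR: where x == z that value is the majority,
 * otherwise y breaks the tie. */
static uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect(x, y, x ^ z);
}

/* Model of a 3-input add: wraps modulo 2^32 like ordinary uint32_t addition. */
static uint32_t add3(uint32_t a, uint32_t b, uint32_t c)
{
    return a + b + c;
}
```

      In this model, Ch costs one op either way, but Maj is already two ops with bitselect (XOR plus select) versus one LUT op, and functions that don't happen to look like a select get worse from there.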


      Are we getting anything like this with Arctic Islands?  We'd like to get a head start on optimizing assembly if possible.