NVIDIA introduced two instructions with its Maxwell architecture that are hugely important for cryptographic and integer compute:

- LOP3.LUT, which applies an FPGA-style 3-input lookup table (an 8-bit truth table) bitwise across 32-bit operands. This lets you execute any 3-input logic operation, such as SHA-256's Ch (choose/bitselect), SHA-256's Maj (majority), SHA-3's Chi, bitslice S-boxes, etc., in a single op. While AMD does have the bitselect operation (which can be used to compose the general LUT operation), composing it that way takes several instructions, so it is a vastly inferior option.

- IADD3, which adds three 32-bit integers in a single instruction. Simple and effective, and widely applicable because functions like SHA-256 have long chains of 32-bit additions.
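
For anyone who hasn't played with the LUT immediate, here is a rough C model of what a LOP3-style op computes. This is only a sketch: `lut3`, `lut_of`, and `lut3_self_check` are hypothetical helper names of mine, not anything from a vendor toolchain. I'm assuming the convention described in the PTX documentation for `lop3.b32`, where the 8-bit table is obtained by evaluating the desired function on the constants 0xF0, 0xCC and 0xAA.

```c
#include <assert.h>
#include <stdint.h>

/* The three-input functions mentioned above, written out longhand. */
static uint32_t ch(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
static uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }
static uint32_t chi(uint32_t x, uint32_t y, uint32_t z) { return x ^ (~y & z); }

typedef uint32_t (*fn3)(uint32_t, uint32_t, uint32_t);

/* Hypothetical helper: derive the 8-bit truth table for f by evaluating it
 * on 0xF0, 0xCC, 0xAA (assumed to match the lop3.b32 convention). */
static uint8_t lut_of(fn3 f)
{
    return (uint8_t)f(0xF0u, 0xCCu, 0xAAu);
}

/* Hypothetical software model of a 3-input LUT applied to all 32 bit
 * positions at once. Table index i = (a_bit << 2) | (b_bit << 1) | c_bit. */
static uint32_t lut3(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (lut & (1u << i)) {
            /* Mask of bit positions whose (a, b, c) bits equal index i. */
            uint32_t ma = (i & 4) ? a : ~a;
            uint32_t mb = (i & 2) ? b : ~b;
            uint32_t mc = (i & 1) ? c : ~c;
            r |= ma & mb & mc;
        }
    }
    return r;
}

/* Checks that the table-driven form agrees with the longhand functions. */
static void lut3_self_check(void)
{
    const uint32_t a = 0xDEADBEEFu, b = 0x01234567u, c = 0x89ABCDEFu;
    assert(lut3(a, b, c, lut_of(ch))  == ch(a, b, c));
    assert(lut3(a, b, c, lut_of(maj)) == maj(a, b, c));
    assert(lut3(a, b, c, lut_of(chi)) == chi(a, b, c));
}
```

Under that convention Ch, Maj and Chi come out as the tables 0xCA, 0xE8 and 0xD2. The point is that hardware with this kind of op does in one instruction what the longhand versions spell out in three or more.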

Are we getting anything like this with Arctic Islands? We'd like to get a head start on optimizing assembly if possible.

I second OP. LOP3.LUT would be a great addition for tripcode generators, or any applications that make extensive use of bitwise logical operations.