Archives Discussions

iLoop · ‎03-26-2010

Trying to speed up RSA using the GPU

Hi

Has someone tried to do modular multiplication using OpenCL?

Im trying to make a program that is able to compute C=m^e mod n (RSA encryption), but i need some help.

I hope someone has done this before and are willing to share there experiences with me.

So fare i have tried to use OpenSSL to generate the n value and this works fine, but i like to send this to the GPU to compute the C value in parallel

AndreasStahl · ‎03-26-2010

Hi iLoop, I fear we need more information to help you out. What is the exact problem you are facing? Could you share some of your source code?

And how do you want to parallelize the computation? What do you want to accomplish?

iLoop · ‎03-26-2010

Hi

Im trying to use OpenSSL to generate a RSA key pair like this:

RSA *rsa=NULL;

int RSAKeySize = 1024; ( like to be able to use 2048 and 4096 as well )

rsa = RSA_generate_key(RSAKeySize, 65537, NULL, NULL);

Then i need to send the rsa->n and rsa->e values to the GPU and use the values in encryption a message m ( m^e mod n).

I have found some samples as CUDA code put i hope someone is able to help me put it all together.

What i hope for is to be able to run this RSA encryption is parallel to speed up async encryption.

typedef unsigned char u8; typedef unsigned short u16; typedef unsigned int u32; typedef unsigned long long u64; // Compute the 2adic inverse of x // Simple Newton iteration, cf. exercise 9.12 of C&P 2nd ed. __kernel u32 inv2adic(u32 x) { u32 a; a = x; x = (((x+2)&4)<<1)+x; // correct to 4 bits x *= 2 - a*x; // 8 bits x *= 2 - a*x; // 16 bits x *= 2 - a*x; // 32 bits return -x; // -x^-1 mod 2^32 } __kernel u32 addmul_1(u32 *z, u32 *x, u32 y,int N) { int i; u32 cy; u64 prod; cy = 0; //#pragma unroll for(i=0; i < (N+1); ++i) { prod = (u64)x*(u64)y; prod += cy; prod += z; z = (u32)prod; cy = (u32)(prod>>32); } return cy; } __kernel void redc(u32 *z, u32 *x, u32 *n, const u32 d, u32 *t,int N) { int i,j; u32 m; for(i=0; i < (N+1); ++i) t = x; t[(N+1)] = 0; for(i=0; i < (N+1); ++i) { m = t[0]*d; t[(N+1)] = addmul_1(t, n, m,N); for(j=0; j < (N+1); ++j) // shift t = t[j+1]; t[(N+1)] = 0; } for(i=0; i < (N+1); ++i) z = t; } __kernel void mulredc(u32 *z, u32 *x, u32 *y, u32 *n, const u32 d, u32 *t,int N) { int i,j; u32 m, cy; u64 aux; #pragma unroll for(i=0; i < (N+3); ++i) t = 0; #pragma unroll for(i=0; i < (N+1); ++i) { cy = addmul_1(t, x, y,N); // t += x*y aux = (u64)t[(N+1)] + cy; t[(N+1)] = aux; t[(N+2)] = aux>>32; m = t[0]*d; cy = addmul_1(t, n, m,N); aux = (u64)t[(N+1)] + cy; t[(N+1)] = aux; t[(N+2)] = aux>>32; for(j=0; j <= (N+1); ++j) // shift t = t[j+1]; } for(i=0; i < (N+1); ++i) z = t; } __kernel void sqrredc(u32 *z, u32 *x, u32 *n, const u32 d, u32 *t,int N) { int i,j; u32 m, cy; u64 aux, cc; for(i=0; i < (N+3); ++i) t = 0; //#pragma unroll for(i=0; i < (N+1); ++i) { u64 pr; cc = 0; cy = 0; pr = (u64)x*(u64)x; pr += t; pr += cy; t = (u32)pr; cy = (pr>>32); cc = cy; for(j=i+1; j < (N+1); ++j) { u64 ec; pr = (u64)x*(u64)x; ec = pr>>63; pr <<= 1; aux = pr + t; ec += aux<pr; pr = aux; aux = pr + cc; ec += aux<pr; pr = aux; t = (u32)pr; cc = (pr>>32)|(ec<<32); } aux = (u64)t[(N+1)] + cc; t[(N+1)] = aux; t[(N+2)] = aux>>32; m = t[0]*d; cy = addmul_1(t, n, m,N); aux = (u64)t[(N+1)] + cy; t[(N+1)] = aux; t[(N+2)] = aux>>32; for(j=0; j <= (N+1); ++j) // shift t = t[j+1]; } for(i=0; i < (N+1); ++i) z = t; }

kang13 · ‎05-14-2010

May I ask where you find these code? they looks like just some alrithmetic functions, not complete RSA implementation.

Originally posted by: iLoop Hi

Im trying to use OpenSSL to generate a RSA key pair like this:

RSA *rsa=NULL;

int RSAKeySize = 1024; ( like to be able to use 2048 and 4096 as well )

rsa = RSA_generate_key(RSAKeySize, 65537, NULL, NULL);

Then i need to send the rsa->n and rsa->e values to the GPU and use the values in encryption a message m ( m^e mod n).

I have found some samples as CUDA code put i hope someone is able to help me put it all together.

What i hope for is to be able to run this RSA encryption is parallel to speed up async encryption.

Archives Discussions

Modular multiplication using OpenCL