I am an experienced C/C++/assembly programmer, and I have been writing a compiler for a parallel programming language that I have been developing for 23 years. Now that the parser is well under way, I am looking at the possibility of supporting not just AMD64/SSE but also the GPGPU method of massive parallelism using stream processors. Although the OpenCL tools are interesting, I would prefer to generate intermediate code directly. From what I gather, this would be IL, which looks something like x86 assembler. What documents explain the language and architecture, and which SDK will I need? Any helpful information will be appreciated. My current development host is Windows XP (32-bit), but my target will be a Phenom II X4 running Windows 7 (64-bit).