Hi, I am not getting any additional performance boost from OpenMP in my code!
When I first wrote it, I noticed a 2x performance boost on a Phenom 9950 under Windows Vista x64. Since then I have modified the code a lot (added SSE3 support in the critical sections, the FFT for example, and more parallelization), but when I test it I still see the same 2x boost as before! I develop the software on my laptop with an AMD Athlon X2 QL-62, and there I do see a performance boost with each version. So why does this happen? Probably it is poor memory/cache organization. I already use all the special techniques like data alignment and compact packing. So why not implement a SERIAL CONNECTION TO MEMORY in your CPUs? You could collaborate with some memory manufacturers to develop, for example, a memory module with 4 or more serial channels (it could be faster than today's parallel interface), so that each core can access memory at the same time!
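One way to check the memory-bandwidth guess is to time a loop that mostly moves data against a loop that mostly does arithmetic, each with 1, 2 and 4 threads. The following is only a rough sketch, not my real code: `now_seconds`, `triad_seconds` and `compute_seconds` are hypothetical helper names, and the array sizes and the inner loop count of 256 are arbitrary assumptions.

```c
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Timer: wall-clock seconds with OpenMP, CPU time from clock() without it. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Memory-bound kernel (hypothetical helper): one multiply-add per
   three floats moved, so the loop mostly waits on memory traffic. */
static double triad_seconds(float *a, const float *b, const float *c,
                            long n, int threads) {
    double t0 = now_seconds();
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < n; ++i)
        a[i] = b[i] + 2.0f * c[i];
    return now_seconds() - t0;
}

/* Compute-bound kernel (hypothetical helper): 256 multiply-adds per
   element loaded, so memory traffic should matter much less. */
static double compute_seconds(float *a, long n, int threads) {
    double t0 = now_seconds();
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < n; ++i) {
        float x = a[i];
        for (int k = 0; k < 256; ++k)
            x = x * 0.999f + 0.001f;
        a[i] = x;
    }
    return now_seconds() - t0;
}
```

Build it with OpenMP enabled (for example `-fopenmp` with GCC), call both functions on arrays much larger than the cache, and compare the times for 1, 2 and 4 threads. If `compute_seconds` keeps scaling with the thread count while `triad_seconds` stops improving, that would point to memory bandwidth rather than to the OpenMP code itself.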