Is Radeon a SIMT device? Does it execute a single instruction across multiple threads like NVIDIA? Or does it use some other technology?
Yes, the Radeon 4xxx and 5xxx (and probably more) cards execute the same instructions on different threads in lock-step, just like the NVIDIA cards do.
Furthermore, when you use the float4 data type in your code, each thread can keep its own 5-wide VLIW processor busy with 4 floating-point ops at the same time, all within that one thread.
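To make the float4 point concrete, here is a rough sketch in plain C. The `float4` struct is only a stand-in for the vector type of the same name in OpenCL, and `mad4` is a hypothetical helper, not something from any SDK. The four component operations do not depend on one another, which is what would let a VLIW compiler pack them into one instruction bundle for a single thread.

```c
/* Stand-in for a 4-component vector type such as OpenCL's float4 (assumption). */
typedef struct { float x, y, z, w; } float4;

/* Hypothetical helper: r = a * b + c, component by component.
 * The four statements are independent of each other, so a VLIW
 * compiler is free to schedule them into the same bundle. */
float4 mad4(float4 a, float4 b, float4 c)
{
    float4 r;
    r.x = a.x * b.x + c.x;
    r.y = a.y * b.y + c.y;
    r.z = a.z * b.z + c.z;
    r.w = a.w * b.w + c.w;
    return r;
}
```

Scalar code with a long dependency chain gives the compiler much less to pack, which is why vector types tend to help on this kind of hardware.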
I don't think SIMT (Single Instruction, Multiple Threads) is an appropriate name for what is usually called "SIMT". In any case it is not a hardware feature; it's the programming model of CUDA or OpenCL. Both AMD and NVIDIA use multiple SIMD cores in their GPUs.
Regardless of what NVIDIA's marketing department decides to call their new chip this week, the programmable parts of GPUs have been known as SIMD architectures for years, and this hasn't changed with the most recent GPUs.
If SIMT is only a software model, could it be implemented on Larrabee, which has a SIMD vector engine just like a GPU?
I've read somewhere that the Larrabee successor (Intel MIC) will have 128 pseudo-threads (hyperthreading with 4 threads + 16-wide ops). So that suggests a SIMT compiler will be available.
The SIMT model can be implemented on any SIMD hardware. Even for x86 (with SSE) it's possible to write a SIMT compiler.
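A minimal sketch of what such a compiler would have to do, written in plain C with loops standing in for vector instructions. `LANES`, `work_item` and `simd_group` are made-up names for illustration, and the 4-lane width is just an assumption matching one SSE register of floats. The programmer writes scalar code for one thread; the SIMD version runs both sides of the branch for every lane in lock-step and uses a per-lane mask to pick the result.

```c
#include <stddef.h>

#define LANES 4  /* assumed vector width, e.g. 4 floats in one SSE register */

/* What the programmer writes in a SIMT model: scalar code for one thread. */
float work_item(float x)
{
    if (x < 0.0f)
        return -x;       /* "then" path */
    return x * 2.0f;     /* "else" path */
}

/* What a SIMT compiler could emit for SIMD hardware. Each loop below
 * stands in for one vector instruction applied to all lanes at once. */
void simd_group(const float in[LANES], float out[LANES])
{
    int   mask[LANES];
    float then_val[LANES];
    float else_val[LANES];
    size_t l;

    for (l = 0; l < LANES; ++l) mask[l] = in[l] < 0.0f;       /* vector compare  */
    for (l = 0; l < LANES; ++l) then_val[l] = -in[l];          /* vector negate   */
    for (l = 0; l < LANES; ++l) else_val[l] = in[l] * 2.0f;    /* vector multiply */
    for (l = 0; l < LANES; ++l)                                /* vector blend    */
        out[l] = mask[l] ? then_val[l] : else_val[l];
}
```

Every lane pays for both paths whenever the lanes disagree on the branch, which is the usual cost of divergence in this model.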
OpenCL could be implemented on Larrabee with each work-item mapped to one lane of its vector unit. OpenCL for CPUs could also be implemented this way using SSE; the main difficulty in that case is the lack of an efficient way of doing gather/scatter.
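To show why gather/scatter is the sore spot, here is a rough plain-C sketch. The function names and the 4-lane width are assumptions for illustration only. When lane l touches `base[l]`, the whole group maps to one vector load. When each lane uses its own index, hardware without a gather instruction has to fall back to one scalar access per lane.

```c
#include <stddef.h>

#define LANES 4  /* assumed vector width, e.g. 4 floats in one SSE register */

/* Contiguous access: lane l reads base[l]. Maps to a single vector load. */
void load_contiguous(const float *base, float out[LANES])
{
    for (size_t l = 0; l < LANES; ++l)
        out[l] = base[l];
}

/* Gather: lane l reads table[idx[l]]. Without a hardware gather this
 * becomes LANES separate scalar loads plus inserts into the vector. */
void load_gather(const float *table, const int idx[LANES], float out[LANES])
{
    for (size_t l = 0; l < LANES; ++l)
        out[l] = table[idx[l]];
}

/* Scatter: lane l writes table[idx[l]]. Same problem in the other
 * direction; if two lanes share an index, the higher lane wins here. */
void store_scatter(float *table, const int idx[LANES], const float in[LANES])
{
    for (size_t l = 0; l < LANES; ++l)
        table[idx[l]] = in[l];
}
```

In a work-item-per-lane mapping, any indexed array access in the kernel turns into one of these, so the cost shows up in a lot of ordinary code.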
What is this mythical Larrabee you speak of?
Does that mean Larrabee and CPUs are also SIMT devices, if a GPU is SIMT? At least NVIDIA's are.
I really don't like the term SIMT. It means very little in hardware terms (at a push storing a program counter per lane and recombining could go some way, but even then it's unclear), and doesn't really add a lot even at the programming model level.
NVIDIA's GPUs are SIMD devices. This SIMT thing is all about the programming model. Larrabee/Knights Corner is a SIMT device if you write OpenCL code for it, pack the work-items into vectors, and decide that this SIMD-via-scalar-instructions programming approach is called SIMT. I would imagine that's how Intel has implemented, or would implement, OpenCL for it.
Over multiple waves/warps it's not even a single instruction. So do we really have a MIMT device?