There is no operating system or equivalent running on the GPU.
There is a DMA controller / Command Processor which reads command packets from a ring buffer in system memory (mapped through the GART) and passes them to the hardware, similar to the way an intelligent disk controller works. In general, though, the GPU is only working on one command at a time. That command might be "draw five thousand triangles from this list of vertices", but it's still one command.
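A minimal sketch of the producer/consumer arrangement described above. It assumes a ring sized in 32-bit dwords with a power-of-two length, so wraparound is a simple bit mask. The class and method names (CommandRing, submit, fetch) are hypothetical. On real hardware the driver publishes its write pointer through a register and the Command Processor advances the read pointer as it consumes packets; fetch only imitates that consumer side.

```python
class CommandRing:
    """Toy model of a command ring shared by a driver (producer) and a
    command processor (consumer). Hypothetical names; sizes in dwords."""

    def __init__(self, size_dwords: int) -> None:
        # Assumption of this sketch: power-of-two size so we can mask.
        if size_dwords <= 0 or size_dwords & (size_dwords - 1):
            raise ValueError("ring size must be a power of two")
        self.buf = [0] * size_dwords
        self.mask = size_dwords - 1
        self.wptr = 0  # next slot the driver will write
        self.rptr = 0  # next slot the command processor will read

    def free_dwords(self) -> int:
        # One slot is kept empty so that wptr == rptr always means "empty".
        return (self.rptr - self.wptr - 1) & self.mask

    def submit(self, dwords: list[int]) -> None:
        """Driver side: copy a packet into the ring, then publish wptr."""
        if len(dwords) > self.free_dwords():
            raise BufferError("ring full; wait for the consumer to catch up")
        w = self.wptr
        for dw in dwords:
            self.buf[w] = dw & 0xFFFFFFFF
            w = (w + 1) & self.mask
        self.wptr = w  # only now can the consumer see the new dwords

    def fetch(self) -> list[int]:
        """Consumer side: read everything up to wptr, strictly in order."""
        out = []
        while self.rptr != self.wptr:
            out.append(self.buf[self.rptr])
            self.rptr = (self.rptr + 1) & self.mask
        return out
```

Keeping one slot empty is one common way to tell a full ring from an empty one without a separate counter; the hardware may use a different scheme.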
It's actually a bit more complicated than that. The pipeline in a GPU is much longer than in a CPU and makes heavy use of delayed-write caches to improve throughput, so you may have more than one command "in the pipe" at the same time. Still, if you think about it as "one operation at a time", that will make the most sense.
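To make the "more than one command in the pipe" point concrete, here is a toy in-order pipeline model. It assumes every command spends the same fixed number of cycles (stage_count) in the pipe, that one new command can enter per cycle, and that commands leave in the order they entered. The delayed-write caches are not modeled at all, and the function name is hypothetical.

```python
from collections import deque


def simulate_in_order_pipeline(commands, stage_count):
    """Toy model of an in-order pipeline (hypothetical, not real hardware).
    Returns (retire_order, max_in_flight)."""
    if stage_count < 1:
        raise ValueError("stage_count must be at least 1")
    pending = deque(commands)
    pipe = deque()  # entries are [command, cycles_remaining], oldest first
    retired = []
    max_in_flight = 0
    while pending or pipe:
        # At most one new command enters the pipe per cycle.
        if pending:
            pipe.append([pending.popleft(), stage_count])
        max_in_flight = max(max_in_flight, len(pipe))
        # Every command in flight advances one stage.
        for entry in pipe:
            entry[1] -= 1
        # Finished commands leave from the front, preserving order.
        while pipe and pipe[0][1] == 0:
            retired.append(pipe.popleft()[0])
    return retired, max_in_flight
```

In this model several commands overlap in flight (up to stage_count of them), yet they still retire in submission order, which is why the "one operation at a time" view is a reasonable first approximation.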
We post documents directly to the X.org site at http://www.x.org/docs/AMD; new docs typically hit that site first and then get mirrored back to amd.com periodically. If you look at section 4 of the 5xx acceleration guide and at the r6xx-r7xx-3d guide (starting at page 8), that should give you a pretty good understanding.
Both documents include lists of "PM4 packets" - those are the command packets which get read by the Command Processor and executed by the GPU.
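As a rough illustration of what a PM4 packet looks like to the Command Processor, the sketch below builds and walks type-3 packets. The header layout is assumed from the PACKET3 macro in the Linux radeon kernel driver (bits 31:30 packet type, bits 29:16 body length minus one, bits 15:8 opcode) and should be checked against the PM4 sections of the guides. Only type-3 packets are handled, the function names are hypothetical, and the opcode values in the usage lines are placeholders rather than values taken from the guides.

```python
PACKET_TYPE3 = 3


def pm4_type3_header(opcode: int, body_dwords: int) -> int:
    """Build a type-3 header dword (layout assumed, see lead-in)."""
    if body_dwords < 1:
        raise ValueError("a type-3 packet carries at least one body dword")
    count = body_dwords - 1  # the header stores length minus one
    if count > 0x3FFF or not 0 <= opcode <= 0xFF:
        raise ValueError("field out of range")
    return (PACKET_TYPE3 << 30) | (count << 16) | (opcode << 8)


def pm4_decode(header: int) -> dict:
    """Split a header dword into its fields (type-3 fields only)."""
    ptype = (header >> 30) & 0x3
    fields = {"type": ptype}
    if ptype == PACKET_TYPE3:
        fields["opcode"] = (header >> 8) & 0xFF
        fields["body_dwords"] = ((header >> 16) & 0x3FFF) + 1
    return fields


def parse_type3_stream(dwords: list[int]) -> list[tuple[int, list[int]]]:
    """Walk a dword stream the way a command processor would: read a
    header, take the body it announces, move on to the next header."""
    packets = []
    i = 0
    while i < len(dwords):
        info = pm4_decode(dwords[i])
        if info["type"] != PACKET_TYPE3:
            raise NotImplementedError("only type-3 packets are handled here")
        n = info["body_dwords"]
        body = dwords[i + 1 : i + 1 + n]
        if len(body) != n:
            raise ValueError("truncated packet")
        packets.append((info["opcode"], body))
        i += 1 + n
    return packets


if __name__ == "__main__":
    # Placeholder opcodes 0x10 and 0x2A, used only for illustration.
    stream = [pm4_type3_header(0x10, 1), 0, pm4_type3_header(0x2A, 2), 7, 8]
    print(parse_type3_stream(stream))  # prints [(16, [0]), (42, [7, 8])]
```

Because each header announces its own body length, the Command Processor can walk the ring packet by packet without any separate framing.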