I guess what I'm trying to ask is what are the limitations of GPU programming with Stream/CAL/Brook+ compared to CPU programming with C or Java?
Can the GPU only do floating point arithmetic or can it do boolean logic / conditional branches too?
Where should I start when programming with GPUs? Should I start with a higher level language like Brook+ (only for ATI cards?) or a lower-level language like OpenCL? Is CAL the interface between programming APIs and the hardware itself like CUDA?
I've only taken an object-oriented programming class (Java), a procedural programming class (C), some calculus, and Computer Science I. Is this enough to delve into GPU programming?
The GPU can execute conditional branches and supports several data types, including integers and (on newer hardware) double-precision floating point. However, GPU programming has several limitations compared to traditional CPU programs. One key difference is the level of control you get over memory management: GPU kernels do not allow heap memory allocation, and they do not support general-purpose recursion. The list is long, but these two come to mind right away.
Given your background, I would recommend starting with a high-level tool like Brook+. Once you get comfortable with GPU programming and want more low-level control, you can switch to lower-level APIs.
Thank you for your reply!
I found some more limitations of GPUs; would you say this is accurate:
-Compared, for example, to traditional floating-point accelerators such as ClearSpeed's 64-bit (FP64) CSX600 math processor used in today's supercomputers, current and older GPUs from ATI (and NVIDIA) run on 32-bit processors with only single-precision capability.
-Unlike the 64-bit double-precision hardware of supercomputers, the second generation of stream processors (the AMD FireStream 9170) can handle double-precision data, a by-product of the FP32 filtering support required by the DirectX 10.1 API. However, double-precision operations (frequently used in supercomputer benchmarks) achieve at best half the theoretical performance of single-precision operations, and actual figures may be lower, since these GPUs do not implement full double-precision units.
-Recursive functions are not supported.
-Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported at this time.
-Various deviations from the IEEE 754 standard: denormal numbers and signaling NaNs are not supported, the rounding mode cannot be changed, and division and square root are slightly less precise than correctly rounded single precision.
-Functions cannot take a variable number of arguments; recursive functions fail for the same underlying reason (kernels have no general call stack).
-Conversion of floating-point numbers to integers on GPUs is done differently than on x86 CPUs; it is not fully IEEE-754 compliant.
-"Global synchronization" on the GPU is inefficient; it forces the kernel to be divided into multiple launches, with synchronization performed on the CPU between them. Given the variable number of multiprocessors and other factors, there may not be a perfect solution to this problem.
-The bus bandwidth and latency between the CPU and the GPU can become a bottleneck; future interconnects with higher bandwidth may alleviate this.
Also, I've heard that CPUs are better for serial/sequential task-based workloads. What exactly is a "task," and why can't the GPU handle this well?
I'm aware the CPU has many instructions that help reduce execution time for various applications. Do GPUs have their own instruction sets, or do all the instructions, such as those in SSEx, have to be "fed" to the GPU?