Books on parallel programming theory often talk about weird beasts such
as the PRAM model, a hypothetical machine that would provide the
programmer with a number of processors that is proportional to the input
size of the problem at hand. Modern general purpose computers afford
only a few processing units; four is currently a reasonable number. This
limitation makes the development of highly parallel applications quite
difficult for the average computer user. However, the low cost and the
increasing programmability of graphics processing units, popularly known
as GPUs, are helping to overcome this difficulty. Presently, for a few
hundred dollars, the application developer can get access to hardware
boasting hundreds of processing elements. This brave new world
that is now open to many programmers brings difficulties and challenges
alongside its incredible possibilities. Perhaps, for the first time since
the popularization of computers, it makes sense to open compiler books to
their final chapters, which talk about very unusual concepts such as
polyhedral loops, iteration spaces, and Fourier-Motzkin elimination, to
name only a few of these chimerical creatures.
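To give one concrete taste of these, consider Fourier-Motzkin
elimination: it projects a variable out of a system of linear
inequalities, which is how a compiler can derive loop bounds from a
description of an iteration space. A minimal worked instance, with
illustrative bounds n and m of our own choosing:

    % Iteration space of a doubly nested loop:
    %   1 <= i <= n,   i <= j <= m
    % To eliminate j, pair each of its lower bounds with each of its
    % upper bounds: i <= j and j <= m together give i <= m.
    \[
      1 \le i \le n,\quad i \le j \le m
      \;\xrightarrow{\text{eliminate } j}\;
      1 \le i \le \min(n, m)
    \]

This kind of projection is what lets a code generator compute concrete
bounds for each loop of a transformed nest.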
This material covers, in a very condensed way, some of the code
generation and optimization techniques that a compiler would use to
produce efficient code for graphics processing units. Through these
techniques, the
compiler writer tries to free the application developer from the
intricacies and subtleties of GPU programming, giving them more freedom
to focus on algorithms instead of micro-optimizations. We will discuss a
little bit of what GPUs are, which applications should target them, how
the compiler sees a GPU program, and how the compiler can transform this
program so that it gets more out of this very powerful hardware.
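To make this concrete from the start, here is a minimal sketch of the
kind of program we will be discussing: a CUDA kernel for vector
addition, with names and sizes of our own choosing, in which each of the
hardware's many processing elements handles one array position.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread computes one element; the hardware runs
    // thousands of such threads concurrently.
    __global__ void vecAdd(const float *a, const float *b,
                           float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)              // guard: the grid may overshoot n
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // Unified memory keeps the sketch short; production code
        // often manages host and device buffers explicitly.
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
        int threads = 256;
        vecAdd<<<(n + threads - 1) / threads, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();
        printf("c[0] = %.1f\n", c[0]);   // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Even this tiny program hints at the subtleties ahead: the programmer
must choose a thread-block size, guard against out-of-bounds accesses,
and decide how data moves between host and device. These are precisely
the chores that the techniques in this material aim to automate.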