For several decades now (ever since Gordon Moore conjectured his Moore's "Law"), the computing industry has benefited from ever-increasing processor speed. Application developers therefore had the privilege of planning ahead for more powerful application software, assured that the computing power they required would be available. Processor manufacturers, in turn, could plan to keep increasing processor speeds with the assurance that application developers were ready to use them. In terms of programmer productivity, the most important implication was that most programmers could write sequential code, the only exceptions being those who wrote operating systems or application software for "supercomputers".
A few years ago, this convenient symbiotic relationship began to unravel. The main "culprit" was a limitation of technology: it was no longer feasible to make processors faster, because the heat generated by the faster circuitry could not be dissipated.
Fortunately, Moore's Law continues to hold, so it remains possible to pack an ever-increasing number of transistors onto a chip. Given the continued opportunity provided by Moore's Law, and faced with the heat-dissipation problem, the industry stopped increasing the speed of processors and instead started designing chips with multiple processors. This meant that the industry could still bring out chips of increasing computing power, albeit with each processor "core" not getting any faster.
The downside of this development is that all programmers, and not just a select few, need to write parallel programs to be able to actually use the computing power of multi-processor or multi-core chips. Parallel programming has to go mainstream!
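
As a minimal illustration (not part of the original text), consider a simple reduction that a programmer would once have written as a sequential loop. To exploit a multi-core chip it might instead be spread across hardware threads, for example with standard C++ threads:

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 24;          // 16M elements, all set to 1
        std::vector<int> data(n, 1);

        // One worker per hardware thread; a sequential loop would use one core.
        unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
        std::vector<long long> partial(nthreads, 0);
        std::vector<std::thread> workers;

        for (unsigned t = 0; t < nthreads; ++t) {
            workers.emplace_back([&, t] {
                std::size_t begin = t * n / nthreads;
                std::size_t end   = (t + 1) * n / nthreads;
                // Each thread writes only its own slot, so no locking is needed.
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& w : workers) w.join();

        long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
        std::cout << total << '\n';             // prints 16777216
    }

The sequential version is a single call to std::accumulate; the parallel version must also worry about dividing the work and joining the threads. That extra burden is what now falls on every programmer.
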
There are two challenges a programmer faces when designing parallel programs: ensuring correctness and extracting the maximum possible performance.
Over the last few decades, the Computer Science community has addressed the challenge of correctness fairly well. Besides developing alternative programming paradigms, namely shared-memory and message-passing, it has designed programming languages with constructs that help programmers avoid (though not necessarily eliminate) typical parallel-programming bugbears such as deadlocks and race conditions.
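
As one illustrative example (in C++, chosen here for concreteness and not named in the original), scoped locking constructs such as std::scoped_lock acquire multiple locks using a deadlock-avoidance algorithm and release them automatically, making races and deadlocks easier to avoid, although they cannot by themselves guarantee a correct program:

    #include <iostream>
    #include <mutex>
    #include <thread>

    // Two accounts, each protected by its own mutex.
    struct Account {
        std::mutex m;
        long balance = 0;
    };

    // std::scoped_lock acquires BOTH mutexes atomically with a
    // deadlock-avoidance algorithm; locking them one at a time in
    // inconsistent order across threads could deadlock.
    void transfer(Account& from, Account& to, long amount) {
        std::scoped_lock lock(from.m, to.m);   // RAII: released on return
        from.balance -= amount;
        to.balance   += amount;
    }

    int main() {
        Account a, b;
        a.balance = 1000;

        // Two threads transferring in opposite directions concurrently.
        std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
        std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
        t1.join();
        t2.join();

        std::cout << a.balance + b.balance << '\n';   // total preserved: 1000
    }

The construct encapsulates a known-good locking discipline, yet nothing stops a programmer from touching balance elsewhere without taking the lock, which is why such constructs help with, rather than ensure, correctness.
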
What is still sorely lacking, however, is a systematic methodology for improving the performance of programs. Getting the best possible performance for a given parallel program out of the underlying hardware is still an art, bordering on "black magic". The industry is in dire need of such a systematic methodology. The primary obstacle to creating it is that we do not yet have a programming model of hardware at the right level of abstraction. We have the ISA (Instruction Set Architecture), which is excellent for specifying functionality but carries no information about hardware performance. At the other end of the spectrum, we have the RTL (Register-Transfer Level) model of hardware, which does provide information about performance but is too detailed (not abstract enough) to be useful to programmers, at least those who are not experts.
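
To make the gap concrete, here is a small illustrative experiment (not from the original text). The two functions below are equivalent at the ISA level, essentially the same loads and adds, yet they typically differ severalfold in running time because of cache behaviour that the ISA does not expose and that an RTL model buries in detail:

    #include <chrono>
    #include <iostream>
    #include <vector>

    constexpr std::size_t N = 4096;

    // Row-major traversal: consecutive addresses, cache-friendly.
    long long sum_row_major(const std::vector<int>& m) {
        long long s = 0;
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                s += m[i * N + j];
        return s;
    }

    // Column-major traversal: strided addresses, cache-hostile.
    long long sum_col_major(const std::vector<int>& m) {
        long long s = 0;
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t i = 0; i < N; ++i)
                s += m[i * N + j];
        return s;
    }

    int main() {
        std::vector<int> m(N * N, 1);
        auto time = [&](auto f) {
            auto t0 = std::chrono::steady_clock::now();
            volatile long long s = f(m);   // keep the call from being optimized away
            (void)s;
            return std::chrono::duration<double>(
                       std::chrono::steady_clock::now() - t0).count();
        };
        std::cout << "row-major:    " << time(sum_row_major) << " s\n";
        std::cout << "column-major: " << time(sum_col_major) << " s\n";
    }

On typical hardware the row-major version runs several times faster (the exact ratio depends on the machine), even though an ISA-level view of the two functions reveals no meaningful difference. A useful performance model would sit somewhere between these two extremes.
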
The industry today is witnessing an acceleration in the rate at which hardware costs are falling, one example being the one-teraflops double-precision performance of Intel's KNC chip. This has created an exciting opportunity to bring high-performance computing into the mainstream. Making this happen, however, requires a large number of engineers who can design correct and efficient parallel programs.
Today we have a fairly good systematic methodology for designing parallel programs that are functionally correct. Programmer productivity is being further enhanced by the development of Domain-Specific Languages (DSLs). This now needs to be urgently complemented with a systematic methodology for enhancing performance on a given target hardware platform. The development of such a methodology must form one of the core themes of research for the computing community.