Quietly but surely, we are heading into a new computing era that will bring one of the most dramatic changes the IT industry has seen. Acceleration technologies will inject lots of horsepower into the CPU, increasing the performance capability of the microprocessor not just by 10 or 20%, but in some cases by up to 100x. These new technologies, which are expected to become widely available through heterogeneous multi-core processors, create challenges for software developers – but Intel claims to have found a way to make the transition easy.
Accelerators, which most commonly provide additional floating point capability, have been discussed for some time. Most recently, ATI (now AMD) released its stream processor card and Nvidia is leveraging its GeForce 8 to make its graphics cards available for general-purpose processing. Both AMD and Intel are working on CPUs that will integrate graphics cores as well as other “accelerators” in the future. While this new type of integrated processor will open the door to much more demanding applications, including physics processing and simulation, it will also bring a whole new set of requirements to programmers. Knowledge of multi-threaded programming, the expertise to fine-tune threading and the know-how to exploit the hidden capability of the hardware appear to require a whole new approach to developing applications and, in some ways, a whole new generation of programmers.
Intel’s recent disclosure that it is working on a pain-free multi-thread programming model has prompted us to dig deeper. Is there a secret sauce to make the horsepower of future integrated heterogeneous processors available to every developer – without asking for specialty knowledge?
Let’s have a closer look.
What developers are dealing with today
The multi-thread waters of programming are very troubled; it’s as simple as that. Much of the problem stems not from the hardware, but from a software model of that hardware that is inefficient and very difficult to use. Without diving into very low-level forms of code, which are costly, time-consuming and prone to error, it is difficult to create an efficient multi-threaded environment. This reality is becoming ever more pronounced as we begin the migration from homogeneous to heterogeneous programming, where non-x86-based processors are being called upon to do work in parallel.
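To give a sense of the bookkeeping involved, here is a minimal Python sketch of hand-rolled threading. It is purely illustrative and not taken from any vendor's toolset; the function name parallel_sum and the chunking scheme are our own assumptions. Even for something as trivial as summing a list, the developer has to split the work, create the threads, guard the shared total with a lock and wait for every thread to finish. In the standard CPython build, the global interpreter lock keeps this from actually running faster, so the point here is the structure, not the speed.

```python
import threading


def parallel_sum(values, num_threads=4):
    """Sum a list by hand-splitting it across threads (hypothetical helper)."""
    # Ceiling division so that every element lands in exactly one chunk.
    chunk = (len(values) + num_threads - 1) // num_threads
    total = 0
    lock = threading.Lock()

    def worker(start):
        nonlocal total
        # Each thread sums its own slice without touching shared state...
        partial = sum(values[start:start + chunk])
        # ...and only takes the lock for the brief shared update.
        with lock:
            total += partial

    threads = [
        threading.Thread(target=worker, args=(i * chunk,))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    # Forgetting to join would let the function return a partial total.
    for t in threads:
        t.join()
    return total
```

Leave out the lock or the join and the code may still appear to work most of the time, which is exactly what makes this style of programming so prone to error.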
The solutions using today's hardware are often extremely difficult to code or coordinate efficiently, requiring special drivers and tools. One scenario uses the GPU for parallel processing: the operating system (OS) needs special drivers, and runtime code packages must be linked to the application targeted for acceleration.
In such an environment, we have to pay attention to some serious roadblocks: First, there are many different OS versions of the toolset, which must be distributed. Each of them costs money and keeps the product from hitting all OS platforms. Second, that very reality limits the ability for accelerated parallel processing outside of those supported operating systems. This not only makes the lag time from idea to product often unjustifiable on alternative platforms, but it also makes it very clear that something more is needed to bring this high power to everyone.
We need something that serves not only the parallel abilities we already have today, but also the growth curve of the accelerator products we'll have tomorrow. A new software model is required to keep up with current and future hardware advancements, and it needs to be one that operates as globally as possible.
The dual-core lesson
Looking back at the dual-core and quad-core evolutionary steps, we have found two main problems that halted or hindered early adoption of multi-thread programming. These also limited performance potential and throughput to much less than the machine itself was capable of.
First, many applications use algorithms that do not lend themselves to multi-thread processing. This immediate stumbling block, where A must be computed before B can be processed, removes any advantage a multi-thread programming model might offer to those parts of an application. Even so, all applications can, in at least some way, take advantage of parallel processing.
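The difference is easy to see in a small Python sketch. The two examples are hypothetical and chosen only for illustration: running_balance is a calculation in which every step needs the result of the step before it, while brighten_all performs work on each element independently, so it can be handed to a pool of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


def running_balance(deposits, rate):
    """Serial by nature: each balance depends on the previous one."""
    balance = 0.0
    history = []
    for deposit in deposits:
        # "B" (the new balance) cannot be computed before "A" (the old one).
        balance = balance * (1 + rate) + deposit
        history.append(balance)
    return history


def brighten(pixel):
    """Independent work: the result depends only on this one pixel."""
    return min(pixel + 40, 255)


def brighten_all(pixels, workers=4):
    # No element needs another element's result, so the work can be
    # spread across a pool of threads; map() keeps the original order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))
```

No number of cores will speed up the first function as written, whereas the second scales with the number of independent elements, at least on a runtime and hardware that can execute the workers truly in parallel.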
The reality is that if multi-thread programming were easier to use and understand, it might already be incorporated even into those applications which won't see much benefit, simply because the resources are physically there and, if globally accessible, would be easy enough to use.
Second, and probably much more common, no one gave any thought to multi-threading when the software was originally developed. We were all using single-core processors until a few years ago. The goal of developing software might only have been to get it to work. In those cases, no thought was given to high efficiency, let alone multiple cores.
In addition, re-engineering existing software, especially programs that already work properly, just to gain some performance might not justify the expense. There are undoubtedly users who would benefit, but if there is no real financial incentive to port a functioning application to multiple cores via a multi-threaded model, then why do it?
But the realization that third-party processors are becoming more and more available is taking hold. The inertial mindsets of the past are slowly fading away as benchmark data for multi-thread apps is seen more often. For example, ATI's CTM/Stream Processing and Nvidia's CUDA both show a future wide open with potential thanks to their massively parallel floating point abilities. When they are added to existing software encoders, for example, performance increases of several hundred percent are a common sight. And thanks to recent software libraries, those abilities are now exposed to the general developer and can be harnessed by CPU-based software for non-graphics processing. Still, they are highly specialized.
Projects like AMD's Torrenza initiative demonstrate just how great the need for high-performance, efficient multi-thread programming for heterogeneous processors will be. The truth is that such models are not only becoming a reality; to stay ahead in the game, they will soon be considered a necessity.
Recognizing these limitations has placed a demand upon the software community. While we do have the hardware resources available to carry out very efficient multi-thread programming on multi-core architectures, the reality is that the developers themselves need advanced skills to make it work. This often leaves a skills gap between desires and practical realities. No efficient software solution has yet been proposed that allows the average higher-end developer to create code that maximizes the use of hardware facilities at minimal coding expense. Only when this problem is solved will the real benefits we're all hoping to see from multiple cores and parallel processing become a reality.