Santa Clara (CA) - In a world dominated by multi-thread desires, and often single-thread limitations, hardware advancements can make the biggest difference in performance. AMD has released a new extension for x86 that it hopes will address at least part of that. Dubbed SSE5, this newest generation adds power to the x86 by introducing not only a whole new instruction class, but also powerful multiply-accumulate instructions. Both of these advancements should deliver notable savings in compute time.
SSE history
SSE5 stands for "Streaming SIMD Extensions version 5". SIMD is a compute philosophy that differs greatly from the rest of the x86 engine. When x86 was first created, its primary goal was to take something and compute it. It operated on what is called the SISD model, which means "Single Instruction, Single Data". The computer would execute one instruction on one piece of data. SIMD extended that model by allowing a single instruction to compute on more than one piece of data at the same time. It does this in parallel, allowing 2, 4, 8 or 16 computations to be carried out where only one was possible before.
As you might guess, SIMD stands for "Single Instruction, Multiple Data". It relies on the concept of packed values: several smaller values of the same type stored side by side in one wide register. SIMD supports a wide range of data types in this packed form.
SIMD was first introduced for integers only with MMX. It was then extended to 32-bit floating point values with SSE. SSE2 brought 64-bit floating point abilities and more parallel 32-bit operations. SSE3, SSSE3 and SSE4 all brought additional and/or wider compute abilities.
The entire SIMD engine today is very wide and capable. Operands include integer values of 8, 16, 32, 64 and 128 bits. These allow for 16, 8, 4, 2 and 1 simultaneous parallel operations, respectively. Floating point operands are either 32 bits or 64 bits wide, allowing for 4 or 2 parallel operations, respectively.
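To make the lane counts concrete, here is a minimal sketch that reinterprets the same 16 bytes (one 128-bit register) under each packed layout. It is a plain Python model, not real SIMD code: the names `LAYOUTS`, `view_lanes` and `packed_add_u32` are made up for this illustration.

```python
import struct

REGISTER_BYTES = 16  # one 128-bit SSE register

# struct format codes for each packed layout (little-endian, as on x86).
# The layout names are labels chosen for this sketch only.
LAYOUTS = {
    "16 x 8-bit int": "<16B",
    "8 x 16-bit int": "<8H",
    "4 x 32-bit int": "<4I",
    "2 x 64-bit int": "<2Q",
    "4 x 32-bit float": "<4f",
    "2 x 64-bit float": "<2d",
}


def view_lanes(raw, layout):
    """Reinterpret 16 raw bytes as the packed elements of one layout."""
    if len(raw) != REGISTER_BYTES:
        raise ValueError("expected exactly 16 bytes")
    return struct.unpack(LAYOUTS[layout], raw)


def packed_add_u32(a, b):
    """Model one packed 32-bit add: four independent wrap-around sums."""
    return tuple((x + y) & 0xFFFFFFFF for x, y in zip(a, b))


if __name__ == "__main__":
    raw = bytes(range(16))
    for name in LAYOUTS:
        print(f"{name:>17}: {len(view_lanes(raw, name))} lanes")
    print(packed_add_u32((1, 2, 3, 4), (10, 20, 30, 40)))
```

The point of the model is that the register contents never change; only the interpretation chosen by the instruction does.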
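The concept of horizontal instructions was also added with SSE3. Instead of combining matching elements from two registers, a horizontal instruction operates on adjacent elements within the same register.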
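The difference between a regular (vertical) packed add and a horizontal add can be sketched as follows. This is a Python model of the lane arithmetic only; `horizontal_add` follows the pairing used by the SSE3 HADDPS instruction, and both function names are made up for this illustration.

```python
def vertical_add(a, b):
    """Regular packed add: lane i of a is added to lane i of b."""
    return tuple(x + y for x, y in zip(a, b))


def horizontal_add(a, b):
    """Model of HADDPS on four 32-bit float lanes.

    Adjacent pairs inside each operand are summed:
    (a0+a1, a2+a3, b0+b1, b2+b3).
    """
    return (a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3])


if __name__ == "__main__":
    a = (1.0, 2.0, 3.0, 4.0)
    b = (10.0, 20.0, 30.0, 40.0)
    print(vertical_add(a, b))
    print(horizontal_add(a, b))
```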
The Floating Point Unit (FPU) of the x86 architecture also allows for 80-bit floating point values and is "almost" fully IEEE-754 compliant: The FPU tries to maintain additional accuracy by not rounding values internally until data is stored. While this might actually be desirable for the computed numbers themselves, it means results do not always match those of other architectures that are fully IEEE-754 compliant. This reality has forced compiler writers to introduce flags that, on the x86, add steps that are unnecessary on other architectures: storing and re-loading values in the middle of computations to ensure rounding is correct.
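The effect of delaying rounding can be illustrated with an analogy. Python has no 80-bit type, so in this sketch a 64-bit float stands in for the FPU's wider internal format and a 32-bit float stands in for the storage format. That substitution is an assumption made purely for illustration, and the function names are hypothetical.

```python
import struct


def to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_round_each_step(values):
    """Round to the storage format after every addition
    (what a store-and-reload flag forces)."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total


def sum_round_at_store(values):
    """Keep the wider intermediate and round once at the end
    (the behavior the paragraph above attributes to the FPU)."""
    total = 0.0
    for v in values:
        total += v
    return to_f32(total)


if __name__ == "__main__":
    values = [16777216.0, 1.0, 1.0]  # 2**24, then two small increments
    print(sum_round_each_step(values))
    print(sum_round_at_store(values))
```

With these inputs the two strategies disagree: each small increment is lost when rounding happens at every step, while the wider intermediate retains both before the final rounding. The same kind of mismatch is what the compiler flags exist to prevent.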
The SIMD engines present in MMX and SSE through SSE5 serve somewhat different design goals than the FPU. The reality of computation is that results often overflow or underflow, so the exact result cannot be stored. Different overflow modes were therefore introduced, which either wrap the result around or saturate it at its maximum or minimum value when overflow or underflow occurs. There are three such modes: wrap-around, signed saturation and unsigned saturation.
For example, adding the two 8-bit values 200 + 200 gives 400. That is too big to fit in a single 8-bit destination, which can hold a maximum of 255. With unsigned saturation, the SIMD engine would kick in and store the maximum allowable result of 255. The rest of the x86 engine would handle this addition differently: it would set the carry flag and store only the low 8 bits, giving 144. Saturation allows very fast parallel computation, but the clamped result is not accurate. For operations that may saturate in this way, the next larger operand size must be used (such as 16 bits instead of 8 bits).
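The three overflow modes can be sketched for a single 8-bit lane as follows. This is a Python model of the arithmetic only; the function names are made up for this illustration, and the comments name the MMX-era instructions (PADDB, PADDUSB, PADDSB) whose behavior each one mimics.

```python
def add_wrap_u8(a, b):
    """Wrap-around (as in PADDB): keep only the low 8 bits."""
    return (a + b) & 0xFF


def add_sat_u8(a, b):
    """Unsigned saturation (as in PADDUSB): clamp to 0..255."""
    return min(a + b, 255)


def add_sat_s8(a, b):
    """Signed saturation (as in PADDSB): clamp to -128..127."""
    return max(-128, min(a + b, 127))


if __name__ == "__main__":
    print(add_wrap_u8(200, 200))    # low 8 bits of 400
    print(add_sat_u8(200, 200))     # clamped at the unsigned maximum
    print(add_sat_s8(100, 100))     # clamped at the signed maximum
    print(add_sat_s8(-100, -100))   # clamped at the signed minimum
```

A real SIMD instruction applies the same rule to every lane of the register at once, which is why clamping can be done without any per-element branching.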
SSE5 philosophy
Both AMD and Intel are looking primarily at future software needs when they consider which way to move with hardware advancements. The recognition that future software will benefit from parallel operations is an absolutely paramount realization. AMD is looking at compute-intensive, multimedia and security applications with SSE5. It is targeting wide industry adoption through many software vendors. Full tool support is expected to be available in 2008, including a fully supported GCC compiler.




