UPDATE #2: Password cracking, the new use for high-speed GPUs

  • Moscow (Russia) - It may not be the most popular thing to consider, but high-end graphics cards contain a very powerful internal computing engine, called the GPU.  This massively parallel device attacks a problem in parallel, rather than serially as most CPUs must, which means it can carry out many hundreds of calculations simultaneously.  This is how 3D graphics cards achieve their high-speed gaming abilities.  Now, a new use has been found for this robust computing engine:  password cracking.

    Elcomsoft, based out of Moscow, has filed a patent for using a GPU to crack passwords.  The company has demonstrated that by using a high-end NVIDIA GeForce 8800 Ultra (about $620), it was able to increase its password cracking prowess by a factor of 25.  Even $150 GPU cards greatly decreased compute time.  This means that a password which previously took 25 days to brute-force could be cracked by the exact same machine, with only a single 8800 Ultra added, in one day.  A password which could've taken two years previously can now be broken in only two weeks with two cards running at 100%, and in one week with four cards across two machines.
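    The speedup arithmetic above can be sketched as a quick back-of-the-envelope calculation.  This is illustrative only; `cracking_time` is a hypothetical helper built around the figures quoted in the article, not anything Elcomsoft has published:

```python
# Illustrative arithmetic only, using the speedup figures quoted above.

def cracking_time(cpu_days, speedup=25, cards=1):
    """Estimate brute-force time in days given a CPU-only baseline,
    a per-card speedup factor, and the number of cards in parallel."""
    return cpu_days / (speedup * cards)

# A password that takes 25 days on a CPU alone falls to one day with one card:
print(cracking_time(25))            # 1.0
# Two years (~730 days) falls to roughly two weeks with two cards:
print(cracking_time(730, cards=2))  # 14.6
```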

    Today's high-end graphics cards carry about 500 gigaflops of computing power per GPU.  Modern link technology, like NVIDIA's SLI or ATI's CrossFire, allows two or three of these cards to be linked together, increasing computing capacity to over 1.5 teraflops.  To put that number in perspective, the entire theoretical computational capacity of the National Center for Supercomputing Applications (NCSA) in Urbana, Illinois, is 163 teraflops, though the actual real-world number is only about 96 teraflops, and its most powerful single machine is capable of only 60 teraflops.  By comparison, for a cash outlay of less than $70,000, a person could equip themselves with enough high-end graphics cards to compute at a sustained rate in excess of 90 teraflops.  Of course this would only hold for certain tasks, but it is possible.
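    A quick check of these figures, using only the per-GPU number stated above (this is a rough sanity check, not a benchmark):

```python
# Back-of-the-envelope check on the figures quoted above.

GFLOPS_PER_GPU = 500  # per high-end GPU, as stated in the article

# Three cards linked via SLI or CrossFire:
tflops_per_rig = 3 * GFLOPS_PER_GPU / 1000
print(tflops_per_rig)  # 1.5 teraflops

# How many such GPUs would it take to reach 90 teraflops?
cards_for_90_tflops = 90 * 1000 / GFLOPS_PER_GPU
print(cards_for_90_tflops)  # 180.0 cards
```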

    The technology which drives the massively parallel computing engine inside of a GPU has been used for low-level 3D and gaming engines for some time.  Popular APIs like OpenGL and DirectX take user-written code and convert it into the GPU's internal computing language.  These outward standards hide the peculiarities of the graphics engine from programmers, so a 3D game or application, once programmed, can run on many different graphics card platforms.

    The same kind of technology is now being added to math libraries which extend the base abilities of common programming languages like C or C++.  They allow a user to create a custom program which utilizes the massively parallel computing abilities of a graphics card, but in a way that is neither overly complex to code nor specific to one machine.  One such example is NVIDIA's CUDA software development platform; another is ATI's Close-To-Metal, or CTM.  Both allow a programmer to use the compute engines of the GPU cards in the system to carry out regular data processing, not just graphics manipulation.
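    The programming model these libraries expose can be sketched in miniature: write one "kernel" function, hand a library a batch of inputs, and let it fan the work out across many processing units.  The sketch below is a CPU-side stand-in using Python's standard thread pool purely to show the shape of the idea; the real CUDA and CTM APIs are different and far lower-level:

```python
# CPU-side sketch of the data-parallel programming model that CUDA/CTM
# expose.  A thread pool stands in for the GPU's many processing units.
from concurrent.futures import ThreadPoolExecutor

def kernel(x):
    # Stand-in for the per-element work one GPU core would perform.
    return x * x

inputs = range(8)
with ThreadPoolExecutor() as pool:
    # The library dispatches one kernel invocation per input element
    # and gathers the results back in order.
    results = list(pool.map(kernel, inputs))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```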

    Some concerns will likely be raised about the strength of current cipher keys.  Whereas previously it would've taken an entity like the NCSA a few hours or days to crack a heavily encrypted password, with regular machines taking years, now basically anybody with a half dozen high-end graphics cards can do the same thing in close to the same time.  In short, what only governments used to be able to do, regular people can now do.

    UPDATE:  The advantages of the massively parallel GPU are now available as a generic computing library.  Anyone with an NVIDIA graphics card and, currently, a Linux operating system can write code which will utilize the GPU for non-graphical data processing.  The results of the computation are not sent to the graphics card's signal output, but rather to main memory.  This allows a program to send input, compute something, and receive output, all without utilizing the GPU's video buffer, per se.  Specialized adaptations of the graphics card, such as NVIDIA's Tesla, go further, providing a complete computational engine with no graphical output ability at all; the physical video adapter port is missing from these specialized cards.

    The end result is a software resource which can be integrated into normal software programming by an average developer.  The parallel computing abilities of the GPU are exposed to the developer through normal API calls.  The CUDA or CTM libraries handle the dispatch and return, and all a programmer has to worry about is feeding them the right data in the right order.  For more information, NVIDIA's CUDA website is one resource.

    UPDATE #2:  A commenter asked why the GPU is faster than the CPU.  The CPU, or central processing unit, is designed to handle a wide variety of operations; it is called a general-purpose processor because it can handle general forms of processing.  The GPU, or graphics processing unit, is a highly specialized form of processor.  It is designed for maximum parallel throughput, meaning its cores are tuned not for general processing, but for specific kinds of processing carried out on a massive scale.

    Whereas a multi-core CPU today might have two or four cores, GPUs contain scores of internal processing units.  NVIDIA's GeForce 8800 GTX has 128 internal thread processors running at 1.35 GHz.  These units are independently capable of carrying out a limited set of operations, and are designed to process data very fast for one purpose:  graphics-related algorithms.  However, because they process so much data in parallel at high speed, they can be employed on other types of workloads even where they are less efficient.
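    Password cracking fits this hardware unusually well because the keyspace partitions cleanly: each processing unit can test its own slice of candidates with no coordination.  The toy below shows that structure in pure Python against a tiny three-letter keyspace and a hypothetical MD5 target; on a GPU, each slice would map to its own thread, and thousands would run at once:

```python
# Toy brute-force search illustrating why many simple cores help: each
# slice of the keyspace is tested independently of every other slice.
import hashlib
from itertools import product

CHARSET = "abc"
TARGET = hashlib.md5(b"cab").hexdigest()  # hypothetical target hash

def search_slice(first_char):
    """Test every 3-character candidate beginning with first_char."""
    for rest in product(CHARSET, repeat=2):
        candidate = first_char + "".join(rest)
        if hashlib.md5(candidate.encode()).hexdigest() == TARGET:
            return candidate
    return None

# Each slice is independent; a GPU would launch one thread per slice.
matches = [m for c in CHARSET if (m := search_slice(c))]
print(matches)  # ['cab']
```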

    It could loosely be described as using a laser cutting machine to make end-wrenches.  While it is possible, it's not the most efficient use of the tool.  However, when the tool would otherwise sit unused, and certain applications can take advantage of it, a significant speedup can be had.