Tuesday, January 12, 2010
   

CUDA for Professionals - NVIDIA's Newest Quadro GPUs

----------------------------------------------------------------------------
Source :: www.hardwarezone.com® -> Articles @ http://www.hardwarezone.com/articles
Date :: Friday, 30th of January, 2009
URL :: http://www.hardwarezone.com/articles/view.php?cid=3&id=2774
----------------------------------------------------------------------------

By : Vincent Chang
Category : Graphics (http://www.hardwarezone.com/articles/cat.php?id=3)

Approved by : Vijay Anand
Approved on : Friday, 30th January, 2009

CUDA for Professionals 

By now, GPU computing should no longer be an unfamiliar term for readers. We have previously taken an introductory look at the shift towards parallel computing and what it has meant for GPUs, and we have also talked about NVIDIA's CUDA as used in high-performance computing. That's not all that one can do with CUDA, however.

NVIDIA itself has segmented the different uses of CUDA technology along the lines of high-performance computing, workstation and consumer graphics, which in the parlance of NVIDIA's branding correspond to its Tesla, Quadro and GeForce series respectively. Today, we'll be looking at the workstation category, specifically the newest members of NVIDIA's Quadro line of workstation graphics cards. These are the GPUs used for professional 3D and graphics applications like Maya and AutoCAD, in industries ranging from image and video editing to computer-aided design.

These are some of the most expensive graphics cards ever to make it to our lab, costing at least US$1999 for the Quadro FX 4800 and Quadro CX. The top model, the FX 5800, comes in at a whopping US$3499.


While we have dabbled now and then in workstation graphics, the cost of these cards and their specialized nature mean that this doesn't happen often. Common characteristics of these specialized cards include large frame buffers with higher precision hardware, custom firmware and drivers, along with more comprehensive technical support. They are also usually based on the same GPU architecture as the consumer graphics variants, though they will cost significantly more. 

The new Quadros, for instance, are based on NVIDIA's GTX 200 series of GPUs and have graphics processing power similar or close to that of their consumer graphics cousins (so you can play a game of Crysis after work). They are also CUDA capable, which bodes well given the increasing number of applications for CUDA in these technical fields. While we'll be assessing three different Quadro cards in this article, particular attention should be paid to the Quadro CX, which is uniquely marketed to take advantage of its CUDA capability to improve performance in Adobe Premiere Pro CS4. Before we get into the details, here's a short summary of what you can expect to find on these Quadro cards compared to a similar GTX 200 GPU, the GeForce GTX 260:-

 

The Quadro FX and CX Specifications

Model / Specs | NVIDIA Quadro FX 5800 | NVIDIA Quadro FX 4800 | NVIDIA Quadro CX | NVIDIA GeForce GTX 260
Processing Cores | 240 | 192 | 192 | 192/216
Memory Size | 4GB | 1.5GB | 1.5GB | 896MB
Memory Interface | 512-bit | 384-bit | 384-bit | 448-bit
Memory Bandwidth | 102GB/s | 76.8GB/s | 76.8GB/s | 111.9GB/s
SLI Frame Rendering Support | Yes | Yes | Yes | Yes
Molex Power Connectors | 6-pin, 8-pin | 6-pin | 6-pin | 6-pin
Rear Outputs | 2 x Dual-link DVI, 1 x DisplayPort, Stereo 3D port | 1 x Dual-link DVI, 2 x DisplayPort, Stereo 3D port | 1 x Dual-link DVI, 2 x DisplayPort, Stereo 3D port | 2 x Dual-link DVI, 9-pin mini-DIN
Maximum TDP | 189W | 150W | 150W | 182W
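
The memory figures in the table are tied together by a simple relation: peak bandwidth is the bus width in bytes multiplied by the effective memory data rate. The article does not list memory clocks, so the sketch below works backwards from the table's own numbers to the data rate each card implies. The function name and the decimal-gigabyte convention are our own assumptions for illustration, not NVIDIA specifications.

```python
# Sketch: derive the effective memory data rate implied by the spec table.
# Assumption: bandwidth (GB/s, decimal) = (bus width in bits / 8) * data rate (GT/s).

def implied_data_rate_gtps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Return the effective memory transfer rate (GT/s) implied by the table."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s / bytes_per_transfer

# (memory bandwidth in GB/s, memory interface in bits), copied from the table
cards = {
    "Quadro FX 5800": (102.0, 512),
    "Quadro FX 4800": (76.8, 384),
    "Quadro CX": (76.8, 384),
    "GeForce GTX 260": (111.9, 448),
}

for name, (bandwidth, bus_width) in cards.items():
    rate = implied_data_rate_gtps(bandwidth, bus_width)
    print(f"{name}: ~{rate:.2f} GT/s effective")
```

On these numbers the three Quadros work out to roughly 1.6 GT/s against roughly 2.0 GT/s for the GTX 260, which is how the consumer card can post more bandwidth than the FX 5800 despite its narrower bus.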

 

The Quadro FX 

Our tour of the Quadro GPUs starts with the top model, the FX 5800. Based on the same GPU architecture as the GTX 200 series and with the same stream processor count (240) as the GTX 280, it's without a doubt the most powerful single-GPU workstation card from NVIDIA, and that is reflected in its US$3499 price tag. There are some crucial differences of course, chief among them the 4GB of GDDR3 memory found on the FX 5800, four times that of the GTX 280. The memory bandwidth, however, is lower than that of a typical GTX 280, though the interface remains 512 bits wide.

Just like the GeForce GTX 280 that it is based on, the Quadro FX 5800 requires a 6-pin and an 8-pin power connector.


Four outputs on the FX 5800 include a pair of dual-link DVI, a DisplayPort and a 3D stereo port.

The maximum TDP on the FX 5800 is also lower than the GTX 280, though two power connectors are still required. Rear outputs are changed from the standard GTX 280, with the addition of a DisplayPort and a 3D stereo port for professional stereo display options (such as 3D glasses and 3D displays).

The Quadro FX 4800, meanwhile, is the GTX 260 equivalent in the Quadro family, with 192 stream processors, meaning it's akin to the original GTX 260 and not the enhanced 216-core version in the consumer segment at the moment. As with the Quadro FX 5800, its memory bandwidth is reduced from that of the consumer gaming card, though the frame buffer is increased to 1.5GB. At US$1999 it remains costly, at least until you compare it to the FX 5800.

Only a 6-pin power connector is needed for the Quadro FX 4800, with its lower TDP of 150W.


The Quadro FX 4800 comes with a slightly different set of outputs, with an extra DisplayPort replacing one of the two DVI ports on the FX 5800.

 

The Adobe Connection - Quadro CX

Now that we have seen the new Quadro FX GPUs, let's turn to the odd one out of the bunch, the Quadro CX. What exactly is the Quadro CX? Well, it's actually identical to the Quadro FX 4800, from the choice of rear outputs to the GTX 260 GPU hidden underneath. Memory bandwidth and frame buffer size are also similar. And they will both set you back by US$1999. 

We too were initially confounded by the CX until we spoke to NVIDIA and found out that the Quadro CX is the only Quadro card presently that supports and comes bundled with the CUDA-enabled RapiHD plug-in for Adobe Premiere Pro CS4. And that is basically the main difference. In short, it's a marketing (and application) distinction; there's no new or special hardware involved since CUDA is supported on all the Quadro cards seen here today.

Besides the CX branded on the card, the Quadro CX looks similar to the other two Quadro FX cards.


Just a 6-pin connector is required, as expected from a card similar to the GTX 260 and the Quadro FX 4800.


The rear outputs on the Quadro CX are identical to those found on the FX 4800.

Instead, what's new is the marketing effort by NVIDIA to make the Quadro CX the graphics card for Adobe's latest Creative Suite 4 (CS4), which includes popular applications like Photoshop, Premiere Pro and Dreamweaver. The two companies have recently teamed up to bring GPU acceleration to Adobe's software and this is the result. As NVIDIA states, the CX is 'built for Adobe professionals' and it is this particular user segment that it is targeting. The company has a dedicated website for this "Built for Adobe" message, with the Quadro CX billed as the best GPU for Adobe users. This naturally brings us to the question: what are the features that are meant to improve the Adobe experience?

It appears that these new features are found mainly in three of the many applications in Adobe CS4. The first is the familiar Adobe Photoshop, where the GPU is harnessed to enable effects like real-time rotation of the images, zooming and panning without any lag. Other 3D accelerated features include brush resizing and preview, 2D and 3D compositing, high-quality anti-aliasing, HDR tone mapping, and color conversion.

You don't really need a Quadro-class card to take advantage of the new hardware accelerated features in Adobe Photoshop CS4. All you need is an OpenGL 2.0 compatible GPU with Shader Model 3.0 and 128MB of memory. For instance, the GeForce 8800 GT card here.

Our own trial with Adobe Photoshop CS4 showed that these additional GPU accelerated effects appear to be done with OpenGL, so there's no need to get a Quadro CX if you're just looking to improve Photoshop performance. In fact, according to Adobe, any modern OpenGL 2.0 capable GPU with 128MB of memory should meet the prerequisites. To confirm this, we tried it with an ATI Radeon 4670 and a GeForce 8800 GT and the OpenGL acceleration worked just fine.
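
The prerequisites quoted above (OpenGL 2.0, Shader Model 3.0, 128MB of memory) amount to a simple threshold check. The sketch below is our own illustration of that check, not Adobe's actual detection logic; the `GpuCaps` type and both example cards are hypothetical and their values are placeholders rather than specifications of any real product.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class GpuCaps:
    """Hypothetical capability record; field names are our own."""
    name: str
    opengl_version: Tuple[int, int]  # (major, minor)
    shader_model: float
    memory_mb: int

# Minimums quoted in the article for Photoshop CS4's GPU features
MIN_OPENGL = (2, 0)
MIN_SHADER_MODEL = 3.0
MIN_MEMORY_MB = 128

def meets_photoshop_cs4_minimum(gpu: GpuCaps) -> bool:
    """True if the card clears all three quoted thresholds."""
    return (gpu.opengl_version >= MIN_OPENGL      # tuples compare major, then minor
            and gpu.shader_model >= MIN_SHADER_MODEL
            and gpu.memory_mb >= MIN_MEMORY_MB)

# Placeholder cards for illustration only (not real product specs)
examples = [
    GpuCaps("hypothetical modern card", (2, 1), 4.0, 512),
    GpuCaps("hypothetical legacy card", (1, 5), 2.0, 64),
]

for gpu in examples:
    verdict = "meets" if meets_photoshop_cs4_minimum(gpu) else "misses"
    print(f"{gpu.name}: {verdict} the quoted minimum")
```

Note that nothing in the check refers to Quadro branding or CUDA, which matches what we saw in practice.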

Another application which benefits similarly from the new GPU acceleration is Adobe After Effects, which now gets a faster and more responsive working environment. Complex effects like depth of field, bilateral blur, turbulent noise (such as flowing water or waving flags) and cartoon effects are animated in a shorter time span thanks to the GPU.

What the Quadro CX really does bring to the table is access to the RapiHD plug-in for Adobe Media Encoder by Elemental Technologies. This is the same company that released the CUDA-enabled Badaboom Media Converter that does video transcoding using the GPU. RapiHD is the full encoding plug-in for Adobe using the same technology and promises to bring about drastic speedups when it comes to media encoding. 

Elemental Technologies' RapiHD plugin for Adobe Premiere Pro CS4 only appears as a selection once you get to the encoding phase. It's also only present if you have installed the plugin and have a Quadro CX card in your system.

We'll be showing you the results of our little encoding test later on, but first let's run through how the plug-in works. Basically, those who have bought a Quadro CX will be able to download the plug-in from the vendor. You'll need to install it and it should be available once Adobe Media Encoder starts.

What we did was to create a project in Adobe Premiere Pro CS4 and make the various edits that we needed. Once we selected the Export to Media option, Adobe Media Encoder started up to handle the task, and from there you can change the default H.264 Blu-ray option to RapiHD. You may have to check your settings again to make sure that they are what you require, like the video resolution, variable bitrate, etc. But that's about it: there are no extra steps after this and the video encoding will proceed with RapiHD taking full advantage of your CUDA-enabled Quadro CX. The plug-in only works with the Quadro CX and is hence the main draw of the CX, especially now that you know the card is otherwise identical to the Quadro FX 4800.

In case you're wondering, you can't obtain this plug-in separately by any other means, nor would it work with any other Quadro graphics card. Special keys and a registration process are in place to ensure that you can only obtain the plug-in by purchasing and registering the Quadro CX. As mentioned, this is a special bundle between NVIDIA and Adobe, for which Elemental Technologies designed the plug-in. Read on for our testing results to see if CUDA and the plug-in deliver what they promise.

 

Test Setup 

Our last attempt at testing a workstation graphics card was quite some time back, when we looked at a Sapphire FireGL V7600. Obviously, this Radeon HD 2900 XT class GPU is a much older card than the new Quadros, so to keep the benchmark results comparable, we have maintained the same system configuration (with the appropriate drivers for the different GPUs) for this article. That meant a modest dual-core system with just 2GB of memory:-

  • Intel Core 2 Duo E6850 (3.00GHz)
  • MSI P35 Platinum
  • 2GB DDR2-800 Kingston HyperX memory
  • Seagate Barracuda 7200.10 SATA hard drive
  • Windows XP Professional with Service Pack 2
  • Sapphire FireGL V7600 (Catalyst 8.44)
  • Quadro FX 5800, 4800, CX (178.46 Quadro driver, with Performance Driver Installer version 2.0.0.6 for AutoCAD)
  • Zotac GeForce GTX 260 AMP! Edition (ForceWare 180.48)


This time round, we didn't manage to snag a competing ATI FirePro card (based on ATI's latest Radeon HD 4800 series) so we'll be reusing the older benchmark results from the FireGL. Hence, it's not exactly a head-to-head comparison. Rather the older comparisons are here to give you an idea of how much the new technology has progressed over the previous generation. Also, the mid-range system specs would ensure that the performance is really relying upon the GPU. A dual-core 3GHz machine is still plenty fast. 

Given the similarities between the FX 4800 and the GTX 260, we threw in an overclocked Zotac GTX 260 for comparison instead. Besides the benchmarks used previously, we also added Adobe Premiere Pro CS4 since we are interested in seeing how the RapiHD plug-in performed against the others. Here are the benchmarks used in this article:-

  • SPECviewperf 10.0
  • Cinebench 10
  • Cadalyst Labs Benchmark Test 2008
  • Futuremark 3DMark06 (ver 110) 
  • Adobe Premiere Pro CS4

     

    Results - SPECviewperf 10.0 

    If you ever wondered what a workstation-class GPU is worth, the results for SPECviewperf 10.0 speak volumes. All four workstation cards we tested scored significantly higher than the Zotac GeForce GTX 260 consumer graphics card (which is actually better spec'd than most of the Quadro cards). Even the Sapphire FireGL card, which is a couple of generations behind the GTX 200, was significantly better than the Zotac GeForce GTX 260. This goes to show the kind of difference you can get from the performance optimization done on the professional series of graphics cards (which is mostly driver tweaking to be frank, but it does wonders).

    Although the Sapphire was behind the Quadro cards in almost all the tests here, it was not as outmatched as we would have expected given the generation gap. Another interesting result was that the top Quadro FX 5800 did not have much of a performance lead over the other Quadros. With its huge 4GB frame buffer, the Quadro FX 5800 would shine best when workloads call upon this vast memory, drowning out its competition and the other Quadro cards. It's really designed for a different purpose altogether and thus won't show a stark difference in these tests.

     

    Results - CINEBENCH 10 

    CINEBENCH 10, an OpenGL rendering benchmark, put some distance between the Quadro and the other cards but again, the old FireGL card was commendable for its performance. The overclocked GeForce GTX 260 meanwhile came close but ultimately could not keep up with the professional variants.

     

    Results - Cadalyst Systems Benchmark 2008 

    In the Cadalyst Systems 2008 benchmark (version 5.0), which can be found at its website for those interested, we saw the workstation cards again leading against the consumer GPU. Despite its status as the older GPU, the FireGL continued to do decently, even if it was rather comprehensively outclassed this time round. Based on these results, owners of professional cards can be assured that their costly purchases have a much longer useful lifespan than most GPUs.

     

    Results - 3DMark06 (ver 110) 

    We tested 3DMark06 to show you how these cards performed in the typical gaming scenario. Like we said earlier, they are more than capable of running your modern game. The overclocked GeForce GTX 260 finally got ahead but it was close among the NVIDIA cards. 


     

    Results - Adobe Premiere Pro CS4 Encoding

    To test the encoding time for Adobe Premiere Pro CS4, we set up a 1-minute video clip and changed the Export Media settings to H.264 Blu-ray at 29.97 fps, with VBR (variable bitrate) and 2-pass encoding. Then we let the application do its job and recorded how long it took. On paper, it didn't matter which GPU we were using, since the default Blu-ray encoding in CS4 relies mostly on the CPU. Hence, there were only slight variations in the results for almost all the cards.

    The sole exception is none other than the Quadro CX, which was able to use the RapiHD plug-in from Elemental Technologies to tap into the 192 CUDA stream processors on the CX for the encoding. This led to a speedup of more than 50% in our test, with the CX clearly taking less time to finish. Quality settings were similar for both the RapiHD plug-in and the default Blu-ray encoder in Adobe.


    This is precisely why NVIDIA is pushing its CUDA technology: the improvements can be huge. In this case, it is a roughly 50% improvement, which could potentially save many hours on larger encoding projects when you extrapolate the above results. For all the promise NVIDIA has touted for its new Quadro cards in conjunction with CUDA and Adobe, it's this feat that is the most impressive (and unique) so far.
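
To make that extrapolation concrete, here is a minimal sketch. It reads the roughly 50% result as a 50% reduction in encoding time and assumes encoding time scales linearly with clip length. The baseline of 5 minutes of CPU encoding per minute of footage is a made-up placeholder, not a figure from our test, and the function name is our own.

```python
# Sketch: extrapolate encoding time saved on longer projects.
# Assumptions (not measured here): linear scaling with footage length,
# and a placeholder CPU baseline of 5 encode-minutes per footage-minute.

PLACEHOLDER_BASELINE_MIN_PER_MIN = 5.0  # hypothetical, for illustration only

def hours_saved(footage_minutes: float,
                baseline_min_per_footage_min: float,
                time_reduction: float = 0.5) -> float:
    """Hours saved when encode time drops by `time_reduction` (0.5 = 50%)."""
    baseline_total = footage_minutes * baseline_min_per_footage_min
    accelerated_total = baseline_total * (1.0 - time_reduction)
    return (baseline_total - accelerated_total) / 60.0

for footage in (1, 60, 600):  # clip lengths in minutes
    saved = hours_saved(footage, PLACEHOLDER_BASELINE_MIN_PER_MIN)
    print(f"{footage:>4} min of footage: about {saved:.2f} hours saved")
```

Whatever the true baseline is, the saving grows in proportion to the amount of footage, which is why the benefit matters most to those encoding long projects regularly.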

    We also changed our dual-core CPU to a quad-core CPU (Intel Core 2 Extreme QX6850 @3.0GHz) just to check that our original dual-core CPU is not a significant factor and as expected, this change didn't affect the results of the Quadro CX/RapiHD encoding.

     

    Conclusion 

    The whole concept of GPU computing is only now slowly filtering into the mainstream. It's early days yet and most consumers will never be inclined to know what their PCs are doing internally as long as it gets the job done. At the same time, developers are still trying to grasp at the potential performance windfall from utilizing GPU computing and improving their code to take advantage of the raw processing power available on their GPUs. Most current applications of GPU computing are very specialized ones that cannot be easily appreciated by the mass audience. However, though Adobe's GPU accelerated effects are not really tapping into GPU computing so much as playing on the traditional strength of the GPU, it is nevertheless a sign that the GPU is gaining more prominence besides the usual gaming and multimedia related applications. 

    These workstation products may be expensive but for the industries that require them, the time saved from using these cards quickly earns back the premium paid.

    This makes applications like the RapiHD plug-in for Adobe Premiere Pro CS4 so valuable. Not only are these applications walking advertisements for the effectiveness of GPU computing (and CUDA), they have more mainstream appeal compared to CUDA projects for high-performance computing. Unfortunately, while the RapiHD technology (in the form of the Badaboom Media Converter) is available for all CUDA-capable NVIDIA GPUs, the RapiHD plug-in for Adobe Premiere Pro CS4 is only 'bundled' with the Quadro CX. There's no reason why the plug-in won't work with any CUDA GPU so we can only hope that it will find its way into the mainstream in the future. 

    As it stands now, NVIDIA's latest Quadros are a refresh of its professional workstation range of GPUs. That means they will do great in the typical applications that they are designed for. CUDA applications are still few in number but growing, so these GPUs are an investment for that future. Adobe Premiere Pro CS4 users, however, will have no reason not to choose the Quadro CX now. It's not any more costly than the similar Quadro FX 4800 but it will lead to tremendous savings in time thanks to the RapiHD plug-in. It does seem a shame that this plug-in is not available separately in any form, though we can imagine that making it available would probably lead to competition for the Quadro CX from other, less expensive CUDA-capable consumer GPUs from NVIDIA. Those who value the plug-in for the increased productivity would not mind forking out the price of the Quadro CX and that's something NVIDIA is counting on. After what we have seen, it's something that we can all agree on.

     

    Our Ratings

    The Quadro FX 5800.


    The Quadro FX 4800.


    The Quadro CX.

     

    Testbed Configuration
    Processor Intel Core 2 Extreme QX6850
    Mainboard MSI P35 Platinum
    Memory 2 x 1GB DDR2-800 Kingston HyperX
    Harddisk Seagate 7200.10 200GB SATA
    Operating System Windows XP Professional w/SP2

    Discussion of article at http://forums.hardwarezone.com.sg/showthread.php?t=2253291